I am using flex to build a lexical analyzer. I want to handle preprocessor define statements of the form: #define identifier identifier_string. I keep a list of (identifier, identifier_string) pairs. When I reach an identifier in the file that is in the #define list, I need to switch the lexical analysis from the main file to the corresponding identifier_string.
(I am not posting the complete flex code because it is too big.)
Here's the relevant part:
{IDENTIFIER} { // search if the identifier is in list
if( p = get_identifier_string(yytext) )
{
puts("DEFINE MATCHED");
yypush_buffer_state(yy_scan_string(p));
}
else//if not in list just print the identifier
{
printf("IDENTIFIER %s\n",yytext);
}
}
<<EOF>> {
puts("EOF reached");
yypop_buffer_state();
if ( !YY_CURRENT_BUFFER )
{
yyterminate();
}
loop_detection = 0;
}
The analysis of the identifier_string works just fine. When the EOF is reached, I want to switch back to the initial buffer and resume the analysis. Instead, the scanner finishes after just printing EOF reached.
Although that approach seems logical, it won't work because yy_scan_string replaces the current buffer, and that happens before the call to yypush_buffer_state. Consequently, the original buffer is lost, and when yypop_buffer_state is called, the restored buffer state is the (now terminated) string buffer.
So you need a little hack: first, duplicate the current buffer state onto the stack, and then switch to the new string buffer:
/* Was: yypush_buffer_state(yy_scan_string(p)); */
yypush_buffer_state(YY_CURRENT_BUFFER);
yy_scan_string(p);
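To see why the order matters, here is a toy model in plain C. It is only a sketch of the stack behaviour described above, not flex's API: the names BufferStack, model_switch, model_push and model_pop are made up for illustration, and buffers are represented by plain strings.

```c
#include <stddef.h>

/* Toy model only: strings stand in for flex buffers; this is not flex's API. */
#define MAX_DEPTH 8

typedef struct {
    const char *items[MAX_DEPTH];
    int depth;               /* number of entries; the top one is "current" */
} BufferStack;

/* Models yy_scan_string switching buffers: replaces the current entry. */
static void model_switch(BufferStack *s, const char *buf)
{
    if (s->depth == 0)
        s->depth = 1;
    s->items[s->depth - 1] = buf;
}

/* Models yypush_buffer_state: the old current entry stays underneath. */
static void model_push(BufferStack *s, const char *buf)
{
    if (s->depth < MAX_DEPTH)
        s->items[s->depth++] = buf;
}

/* Models yypop_buffer_state: drops the current entry. */
static void model_pop(BufferStack *s)
{
    if (s->depth > 0)
        s->depth--;
}

static const char *model_current(const BufferStack *s)
{
    return s->depth > 0 ? s->items[s->depth - 1] : NULL;
}

/* Original order: the scan (the argument) runs first, then the push. */
const char *after_pop_original(void)
{
    BufferStack s = { {"file"}, 1 };
    model_switch(&s, "string");          /* "file" is overwritten here */
    model_push(&s, model_current(&s));   /* pushes "string" on top of "string" */
    model_pop(&s);
    return model_current(&s);            /* still "string": the file is gone */
}

/* Fixed order: duplicate the current entry, then switch the top one. */
const char *after_pop_fixed(void)
{
    BufferStack s = { {"file"}, 1 };
    model_push(&s, model_current(&s));   /* stack: file, file */
    model_switch(&s, "string");          /* stack: file, string */
    model_pop(&s);
    return model_current(&s);            /* back to "file" */
}
```

In the model, the original order leaves the string buffer on the stack after the pop, while the fixed order brings the file buffer back.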
I want to parse output from a commandline tool using the fsm programming model. What is the simplest implementation of a fsm that is possible for this task?
Basically, the core idea of a finite state machine is that the machine is in a "state" and, for every state, the behaviour of the machine is different from other states.
A simple way to do this is to have an integer variable (or an enum) which stores the status, and a switch() statement which implements, for every case, the required logic.
Suppose you have a file of the following kind:
something
begin
something
something2
end
something
and your job is to print the part between begin/end. You read the file line by line and switch state based on the content of the line:
// pseudo-C code
enum state {nothing, inblock};
enum state status;
string line;
status = nothing;
while (!eof(file)) {
readline(line);
switch(status) {
case nothing:
if (line == "begin") status=inblock;
break;
case inblock:
if (line == "end")
status=nothing;
else print(line);
break;
}
}
In this example, only the core idea is shown: a "status" of the machine and a means to change status (the "line" read from file). In real-life examples there are probably more variables to keep more information for every state and, perhaps, the "status" of the machine can be stored in a function pointer, to avoid the burden and rigidity of the switch() statement. Even so, the programming paradigm is clean and powerful.
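For reference, here is one way the pseudo-C above could look as real C. It is a minimal sketch under a few assumptions: the lines are already in memory as an array of strings, and instead of printing, the matching lines are collected into an output array so the result can be checked. The function name extract_block and its parameters are made up for this example.

```c
#include <stddef.h>
#include <string.h>

enum state { NOTHING, INBLOCK };

/* Stores pointers to the lines found between "begin" and "end" in out[]
 * and returns how many were stored (at most max_out). */
size_t extract_block(const char *lines[], size_t n,
                     const char *out[], size_t max_out)
{
    enum state status = NOTHING;
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        switch (status) {
        case NOTHING:
            if (strcmp(lines[i], "begin") == 0)
                status = INBLOCK;
            break;
        case INBLOCK:
            if (strcmp(lines[i], "end") == 0)
                status = NOTHING;
            else if (count < max_out)
                out[count++] = lines[i];
            break;
        }
    }
    return count;
}
```

Note that strings are compared with strcmp rather than ==, which in C would only compare the pointers.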
The fsm model works in C by assigning function pointers to the functions that process certain data. Good uses for fsms include parsing commandline arguments and parsing captured output. The function pointer is initially assigned to a preset starting function. The start function assigns the function pointer, which must be passed along, to the appropriate next function. That function decides the next one, and so on.
Here is a very simple implementation of a fsm:
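#include <stddef.h>
struct _fsm
{
    void (*ptr_to_fsm)(struct _fsm *fsm); /* The current state's handler. */
    char *data;
};
void start(struct _fsm *fsm);
void stop(struct _fsm *fsm);
/* run is just a wrapper for this example, so that the loop lives inside a function. */
void run(char *data)
{
    struct _fsm fsm;
    fsm.data = data;
    fsm.ptr_to_fsm = start; // There is a function called start.
    while (fsm.ptr_to_fsm != NULL)
    {
        fsm.ptr_to_fsm(&fsm);
    }
}
void start(struct _fsm *fsm)
{
    if (fsm->data == NULL)
    {
        fsm->ptr_to_fsm = stop; // There is a function called stop.
        return;
    }
    /* Check more conditions, and branch out to other functions based on the results. */
    /* Every path must assign the next function; stop is only a placeholder here. */
    fsm->ptr_to_fsm = stop;
}
void stop(struct _fsm *fsm)
{
    fsm->ptr_to_fsm = NULL; /* The while loop will terminate. */
    /* And you're done (unless you have to do any freeing). */
}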
I'm trying to add strings to a Binary Search Tree using a recursive insert method (the usual for BSTs, IIRC) so I can later print them out using recursion as well.
Trouble is, I've been getting segmentation faults I don't really understand. Related code follows (this block of code is from my main function):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
// Stores the size of the C-strings we will use;
// Standardized to 100 (assignment specifications say
// ALL strings will be no more than 100 characters long)
// Please note that I defined this as a preprocessor
// directive because using the const keyword makes it
// impossible to define the size of the C-String array
// (C doesn't allow for static array struct members whose
// size is given as a variable)
#define STRING_SIZE 100
// The flags for case sensitivity
// and an output file
int cflag = 0, oflag = 0;
// These are intended to represent the boolean
// values true and false (there's no native bool
// data type in C, apparently)
const int TRUE = 1;
const int FALSE = 0;
// Type alias for the bool type
typedef int bool;
// This is the BST struct. A BST is basically just
// a Node with at most two children (left and right)
// and a data element.
typedef struct BST
{
struct BST *left;
struct BST *right;
char *key;
int counter;
} BST;
// ----- FUNCTION PROTOTYPES -----
void insert(BST **root, char *key);
int caseSenStrCmp(char *str1, char *str2);
int caseInsenStrCmp(char *str1, char *str2);
bool existsInTree(BST *root, char *key, int cflag);
void inOrderPrint(BST *root, int oflag, FILE *outFile);
void deallocateTree(BST *root);
int main(int argc, char **argv) {
extern char *optarg;
extern int optind;
int c, err = 0;
// Holds the current line in the file/user-provided string.
char currentLine[STRING_SIZE];
// This will store the input/output file
// directories
char fileDirectory[STRING_SIZE];
static char usage[] = "Usage: %s [-c] [-o output_file_name] [input_file_name]\n";
while ((c = getopt(argc, argv, "co:")) != -1)
switch (c)
{
case 'c':
cflag = 1;
break;
case 'o':
oflag = 1;
// If an output file name
// was entered, copy it
// to fileDirectory
if (argv[optind] != NULL)
{
strcpy(fileDirectory, argv[optind]);
}
break;
case '?':
err = 1;
break;
default:
err = 1;
break;
}
if (err)
{
// Generic error message
printf("ERROR: Invalid input.\n");
fprintf(stderr, usage, argv[0]);
exit(1);
}
// --- BST SORT CODE STARTS HERE ---
printf("This is BEFORE setting root to NULL\n");
// This is the BST. As the assignment instructions
// specify, it is initially set to NULL
BST *root = NULL;
// Pointer to the mode the files
// will be opened in. Starts as
// "w" since we're opening the output file
// first
printf("This is AFTER setting root to NULL\n");
char *mode = (char*)malloc(sizeof(char*));
strcpy(mode, "w");
printf("Wrote w to mode pointer");
// Pointer to the output file
FILE *outFile;
// Attempt to open output file
outFile = fopen(fileDirectory, mode);
printf("Opened outfile \n");
// Now update mode and fileDirectory so
// we can open the INPUT file
strcpy(mode, "r");
printf("Wrote r to mode\n");
// Check if we have an input file name
// If argv[optind] isn't NULL, that means we have
// an input file name, so copy it into fileDirectory
if (argv[optind] != NULL)
{
strcpy(fileDirectory, argv[optind]);
}
printf("Wrote input file name to fileDirectory.\n");
// Pointer to the input file
FILE *inFile;
// Attempt to open the input file
//printf("%d", inFile = fopen(fileDirectory, mode));
printf("Opened input file\n");
// If the input file was opened successfully, process it
if (inFile != NULL)
{
// Process the file while EOF isn't
// returned
while (!feof(inFile))
{
// Get a single line (one string)
//fgets(currentLine, STRING_SIZE, inFile);
printf("Wrote to currentLine; it now contains: %s\n", currentLine);
// Check whether the line is an empty line
if (*currentLine != '\n')
{
// If the string isn't an empty line, call
// the insert function
printf("currentLine wasn't the NULL CHAR");
insert(&root, currentLine);
}
}
// At this point, we're done processing
// the input file, so close it
fclose(inFile);
}
// Otherwise, process user input from standard input
else
{
do
{
printf("Please enter a string (or blank line to exit): ");
// Scanf takes user's input from stdin. Note the use
// of the regex [^\n], allowing the scanf statement
// to read input until the newline character is encountered
// (which happens when the user is done writing their string
// and presses the Enter key)
scanf("%[^\n]s", currentLine);
// Call the insert function on the line
// provided
insert(&root, currentLine);
} while (caseSenStrCmp(currentLine, "") != 0);
}
// At this point, we've read all the input, so
// perform in-order traversal and print all the
// strings as per assignment specification
inOrderPrint(root, oflag, outFile);
// We're done, so reclaim the tree
deallocateTree(root);
}
// ===== AUXILIARY METHODS ======
// Creates a new branch for the BST and returns a
// pointer to it. Will be called by the insert()
// function. Intended to keep the main() function
// as clutter-free as possible.
BST* createBranch(char *keyVal)
{
// Create the new branch to be inserted into
// the tree
BST* newBranch = (BST*)malloc(sizeof(BST));
// Allocate memory for newBranch's C-string
newBranch->key = (char*)malloc(STRING_SIZE * sizeof(char));
// Copy the user-provided string into newBranch's
// key field
strcpy(newBranch->key, keyVal);
// Set newBranch's counter value to 1. This
// will be incremented if/when other instances
// of the key are inserted into the tree
newBranch->counter = 1;
// Set newBranch's child branches to null
newBranch->left = NULL;
newBranch->right = NULL;
// Return the newly created branch
return newBranch;
}
// Adds items to the BST. Includes functionality
// to verify whether an item already exists in the tree
// Note that we pass the tree's root to the insert function
// as a POINTER TO A POINTER so that changes made to it
// affect the actual memory location that was passed in
// rather than just the local pointer
void insert(BST **root, char *key)
{
printf("We made it to the insert function!");
// Check if the current branch is empty
if (*root == NULL)
{
// If it is, create a new
// branch here and insert it
// This will also initialize the
// entire tree when the first element
// is inserted (i.e. when the tree is
// empty)
*root = createBranch(key);
}
// If the tree ISN'T empty, check whether
// the element we're trying to insert
// into the tree is already in it
// If it is, don't insert anything (the
// existsInTree function takes care of
// incrementing the counter associated
// with the provided string)
if (!existsInTree(*root, key, cflag))
{
// If it isn't, check if the case sensitivity
// flag is set; if it is, perform the
// checks using case-sensitive string
// comparison function
if (cflag) {
// Is the string provided (key) is
// greater than the string stored
// at the current branch?
if (caseSenStrCmp((*root)->key, key))
{
// If so, recursively call the
// insert() function on root's
// right child (that is, insert into
// the right side of the tree)
// Note that we pass the ADDRESS
// of root's right branch, since
// the insert function takes a
// pointer to a pointer to a BST
// as an argument
insert(&((*root)->right), key);
}
// If not, the key passed in is either less than
// or equal to the current branch's key,
// so recursively call the insert()
// function on root's LEFT child (that is,
// insert into the left side of the tree)
else
{
insert(&((*root)->left), key);
}
}
// If it isn't, perform the checks using
// the case-INsensitive string comparison
// function
else {
// The logic here is exactly the same
// as the comparisons above, except
// it uses the case-insensitive comparison
// function
if (caseInsenStrCmp((*root)->key, key))
{
insert(&((*root)->right), key);
}
else
{
insert(&((*root)->left), key);
}
}
}
}
// CASE SENSITIVE STRING COMPARISON function. Returns:
// -1 if str1 is lexicographically less than str2
// 0 if str1 is lexicographically equal to str2
// 1 if str1 is lexicographically greater than str2
I'm using getopt to parse options that the user enters. I've been doing a little bit of basic debugging using printf statements just to see how far I get into the code before it crashes, and I've more or less narrowed down the cause. It seems to be this part here:
do
{
printf("Please enter a string (or blank line to exit): ");
// Scanf takes user's input from stdin. Note the use
// of the regex [^\n], allowing the scanf statement
// to read input until the newline character is encountered
// (which happens when the user is done writing their string
// and presses the Enter key)
scanf("%[^\n]s", currentLine);
// Call the insert function on the line
// provided
insert(&root, currentLine);
} while (caseSenStrCmp(currentLine, "\n") != 0);
Or rather, calls to the insert function in general, since the printf statement I put at the beginning of the insert function ("We made it to the insert function!") gets printed over and over again until the program finally crashes with a segmentation fault, which probably means the problem is infinite recursion?
If so, I don't understand why it's happening. I initialized the root node to NULL at the beginning of main, so it should go directly into the insert function's *root == NULL case, at least on its first call.
Does it maybe have something to do with the way I pass root as a pointer to a pointer (BST **root in parameter list of insert function)? Am I improperly recursing, i.e. is this statement (and others similar to it)
insert(&((*root)->right), key);
incorrect somehow? This was my first guess, but I don't see how that would cause infinite recursion - if anything, it should fail without recursing at all if that was the case? Either way, it doesn't explain why infinite recursion happens when root is NULL (i.e. on the first call to insert, wherein I pass in &root - a pointer to the root pointer - to the insert function).
I'm really stuck on this one. At first I thought it might have something to do with the way I was copying strings to currentLine, since the line
if(*currentLine != '\0')
in the while (!feof(inFile)) loop also crashes the program with a segmentation fault, but even when I commented that whole part out just to test the rest of the code I ended up with the infinite recursion problem.
Any help at all is appreciated here, I've been trying to fix this for over 5 hours to no avail. I legitimately don't know what to do.
**EDIT: Since a lot of the comments involved questions regarding the way I declared other variables and such in the rest of the code, I've decided to include the entirety of my code, at least until the insert() function, which is where the problem (presumably) is. I only omitted things to try and keep the code to a minimum - I'm sure nobody likes to read through large blocks of code.
Barmar:
Regarding fopen() and fgets(): these were commented out so that inFile would remain NULL and the relevant conditional check would fail, since that part of the code also fails with a segmentation fault
createBranch() does initialize both the left and right children of the node it creates to NULL (as can be seen above)
currentLine is declared as an array with a static size.
#coderredoc:
My understanding of it is that it reads from standard input until it encounters the newline character (i.e. user hits the enter button), which isn't recorded as part of the string
I can already see where you were going with this! My loop condition for the do/while loop was set to check for the newline character, so the loop would never have terminated. That's absolutely my fault; it was a carryover from a previous implementation of that block that I forgot to change.
I did change it after you pointed it out (see new code above), but unfortunately it didn't fix the problem (I'm guessing it's because of the infinite recursion happening inside the insert() function - once it gets called the first time, it never returns and just crashes with a segfault).
**
I managed to figure it out - turns out the problem was with the insert() function after all. I rewrote it (and the rest of the code that was relevant) to use a regular pointer rather than a pointer to a pointer:
BST* insert(BST* root, char *key)
{
// If branch is null, call createBranch
// to make one, then return it
if (root == NULL)
{
return createBranch(key);
}
// Otherwise, check whether the key
// already exists in the tree
if (!existsInTree(root, key, cflag))
{
// If it doesn't, check whether
// the case sensitivity flag is set
if (cflag)
{
// If it is, use the case-sensitive
// string comparison function to
// decide where to insert the key
if (caseSenStrCmp(root->key, key))
{
// If the key provided is greater
// than the string stored at the
// current branch, insert into
// right child
root->right = insert(root->right, key);
}
else
{
// Otherwise, insert into left child
root->left = insert(root->left, key);
}
}
// If it isn't, use the case-INsensitive string
// comparison function to decide where to insert
else
{
// Same logic as before. If the key
// provided is greater, insert into
// current branch's right child
if (caseInsenStrCmp(root->key, key))
{
root->right = insert(root->right, key);
}
// Otherwise, insert into the left child
else
{
root->left = insert(root ->left, key);
}
}
}
// Return the root pointer
return root;
}
Which immediately solved the infinite recursion/seg fault issue. It did reveal a few other minor semantic errors (most of which were probably made in frustration as I desperately tried to fix this problem without rewriting the insert function), but I've been taking care of those bit by bit.
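One thing worth noting for anyone reading this later: the comparison functions are documented as returning -1, 0 or 1, so a test like if (caseSenStrCmp(root->key, key)) is true for both "less" and "greater". Below is a minimal sketch of an insert that branches on the sign of the comparison instead. It is not my assignment code: it uses the standard strcmp (so it is case-sensitive only), and the names Node and node_insert are made up for the example.

```c
#include <stdlib.h>
#include <string.h>

typedef struct Node {
    struct Node *left, *right;
    char *key;
    int counter;
} Node;

/* Inserts key, or bumps the counter if the key is already present.
 * Returns the root of this subtree (NULL only if the subtree was
 * empty and allocation failed). */
Node *node_insert(Node *root, const char *key)
{
    if (root == NULL) {
        Node *n = malloc(sizeof *n);
        if (n == NULL)
            return NULL;
        n->key = malloc(strlen(key) + 1);
        if (n->key == NULL) {
            free(n);
            return NULL;
        }
        strcpy(n->key, key);
        n->counter = 1;
        n->left = n->right = NULL;
        return n;
    }

    int cmp = strcmp(key, root->key);
    if (cmp < 0)
        root->left = node_insert(root->left, key);    /* key sorts before root */
    else if (cmp > 0)
        root->right = node_insert(root->right, key);  /* key sorts after root */
    else
        root->counter++;                              /* duplicate key */
    return root;
}
```

With this shape the duplicate check and the descent share one comparison, so a separate search for the key before inserting is not needed.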
I've now got a new problem (albeit probably a simpler one than this) which I'll make a separate thread for, since it's not related to segmentation faults.
We need to create a binary tree that holds the content of text files. The pointers selection_a and selection_b each point to another text file in the directory.
The structure of the text files is as follows:
line: Title
line: OptionA
line: OptionB
line: Text.
The first file is given as parameter while starting the program. All files should be saved at the beginning of the program. Then the text of the first file shows, and the user can input A or B to continue. Based on the selection, the text of File Option A/B is shown and the user can decide again.
The last file of a tree contains no Options: lines 2 and 3 are "-\n".
The problem is, this code only reads all the option A files of the first tree. It doesn't read in any B-Options. In the end, the program shows a memory access error.
I think the problem is that the readingRows function has no abort condition.
current->selection_a = readingRows(input_selection_a);
current->selection_b = readingRows(input_selection_b);
I know the code may be kind of chaotic, but we are beginners in programming. Hope anybody can help us to write an abort-condition.
The function should be aborted if the content of option A (line 2) is "-\n".
Here is the whole function:
struct story_file* readingRows(FILE *current_file)
{
char *buffer = fileSize(current_file);
char *delimiter = "\n";
char *lines = strtok(buffer, delimiter);
int line_counter = 0;
struct story_file *current = malloc(sizeof(struct story_file));
while(lines != NULL)
{
if(line_counter == 0)
{
current->title = lines;
}
else if(line_counter == 1)
{
char *filename_chapter_a = lines;
FILE *input_selection_a = fopen(filename_chapter_a, "r");
if(input_selection_a)
{
current->selection_a = readingRows(input_selection_a);
}
fclose(input_selection_a);
}
else if(line_counter == 2)
{
char *filename_chapter_b = lines;
FILE *input_selection_b = fopen(filename_chapter_b, "r");
if(input_selection_b)
{
current->selection_b = readingRows(input_selection_b);
}
fclose(input_selection_b);
}
else if (line_counter >= 3)
{
current->text = lines;
}
lines = strtok(NULL, delimiter);
line_counter++;
}
return current;
}
There are two items that define a terminating recursive function:
One or more base cases
Recursive calls that move toward a base case
Your code has one base case: while (lines!=NULL) {} return current;, it breaks the while loop when lines is NULL and returns current. In other words, within any particular call to your function, it only terminates when it reaches the end of a file.
Your code moves toward that base case as long as your files do not refer to each other in a loop. We know this because you always read a line, take an action according to your if-else block, and then read the next line. So you always move toward the end of each file you read.
But as you note, the issue is that you don't have a case to handle "no Options", being when lines 2 or 3 are "-\n". So right now, even though you move through files, you are always opening files in line 2. Unless a file is malformed and does not contain a line 2, your recursive call tree never ends. So you just need to add another base case that looks at whether the beginning of lines matches "-\n", and if it does, return before the recursive call. This will end that branch of your recursive tree.
Inside of your while loop, you will need code along the lines of:
if `line_counter` is `1` or `2` (lines 2 and 3 of the file)
if `lines` starts with your terminating sequence "-" (strtok has already removed the "\n")
return current
else
`fopen` and make the recursive call
In the parent function that made the recursive call, it will move to the next line and continue as expected.
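As a rough sketch of that check in C: the helper below (the name is_end_marker is made up for this example) reports whether a line is the "no option" marker. Because the question's code splits on "\n" with strtok, the token will normally be just "-", so the helper accepts "-" with or without a trailing line ending.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true when line is "-" optionally followed by \r and/or \n. */
bool is_end_marker(const char *line)
{
    if (line == NULL || line[0] != '-')
        return false;
    /* Skip any trailing line-ending characters after the dash. */
    size_t rest = strspn(line + 1, "\r\n");
    return line[1 + rest] == '\0';
}
```

In the loop it would guard the option lines, for example by calling fopen only when !is_end_marker(lines). One variation to consider is setting the selection pointer to NULL and carrying on rather than returning, so the text line after the options is still read.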
P.S. Make sure you use free for each malloc you do.
I'm learning to use libcurl in C. To start, I'm using a randomized list of accession names to search for protein sequence files that may be found hosted here. These follow a set format: the first line is of variable length (and contains no information I'm trying to query), followed by a series of capitalized letters with a new line every sixty (60) characters (this is what I want to pull down, but reformatted to eighty (80) characters per line).
I have the call itself in a single function:
//finds and saves the fastas for each protein (assuming one exists)
void pullFasta (proteinEntry *entry, char matchType, FILE *outFile) {
//Local variables
URL_FILE *handle;
char buffer[2] = "", url[32] = "http://www.uniprot.org/uniprot/", sequence[2] = "";
//Build full URL
/*printf ("u:%s\nt:%s\n", url, entry->title); /*This line was used for debugging.*/
strcat (url, entry->title);
strcat (url, ".fasta");
//Open URL
/*printf ("u:%s\n", url); /*This line was used for debugging.*/
handle = url_fopen (url, "r");
//If there is data there
if (handle != NULL) {
//Skip the first line as it's got useless info
do {
url_fread(buffer, 1, 1, handle);
} while (buffer[0] != '\n');
//Grab the fasta data, skipping newline characters
while (!url_feof (handle)) {
url_fread(buffer, 1, 1, handle);
if (buffer[0] != '\n') {
strcat (sequence, buffer);
}
}
//Print it
printFastaEntry (entry->title, sequence, matchType, outFile);
}
url_fclose (handle);
return;
}
With proteinEntry being defined as:
//Entry for fasta formatable data
typedef struct proteinEntry {
char title[7];
struct proteinEntry *next;
} proteinEntry;
And the url_fopen, url_fclose, url_feof, url_fread, and URL_FILE code found here; they mimic the file functions for which they are named.
As you can see I've been doing some debugging with the URL generator (uniprot URLs follow the same format for different proteins), I got it working properly and can pull down the data from the site and save it to file in the proper format that I want. I set the read buffer to 1 because I wanted to get a program that was very simplistic but functional (if inelegant) before I start playing with things, so I would have a base to return to as I learned.
I've tested the url_<function> calls and they are giving no errors. So I added incremental printf calls after each line to identify exactly where the bus error is occurring and it is happening at return;.
My understanding of bus errors is that it's a memory access issue wherein I'm trying to get at memory that my program doesn't have control over. My confusion comes from the fact that this is happening at the return of a void function. There's nothing being read, written, or passed to trigger the memory error (as far as I understand it, at least).
Can anyone point me in the right direction to fix my mistake please?
EDIT: As #BLUEPIXY pointed out I had a potential url_fclose (NULL). As #deltheil pointed out I had sequence as a static array. This also made me notice I'm repeating my bad memory allocation for url, so I updated it and it now works. Thanks for your help!
If we look at e.g. http://www.uniprot.org/uniprot/Q6GZX1.fasta and skip the first line (as you do) we have:
MNAKYDTDQGVGRMLFLGTIGLAVVVGGLMAYGYYYDGKTPSSGTSFHTASPSFSSRYRY
Which is a 60-character string.
When you try to read this sequence with:
//Grab the fasta data, skipping newline characters
while (!url_feof (handle)) {
url_fread(buffer, 1, 1, handle);
if (buffer[0] != '\n') {
strcat (sequence, buffer);
}
}
The problem is sequence is not expandable and not large enough (it is a fixed length array of size 2).
So make sure to choose a large enough size to hold any sequence, or implement the ability to expand it on-the-fly.
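For the expand-on-the-fly option, a minimal sketch could look like the following. The StrBuf type and strbuf_push function are names made up for this example (they are not part of libcurl or of the url_* helper code), and the starting capacity of 64 is arbitrary.

```c
#include <stdlib.h>

typedef struct {
    char  *data;   /* NUL-terminated once non-NULL */
    size_t len;    /* characters stored, excluding the terminator */
    size_t cap;    /* bytes allocated */
} StrBuf;

/* Appends one character, doubling the allocation when it is full.
 * Returns 0 on success, -1 if realloc fails (buffer left unchanged). */
int strbuf_push(StrBuf *b, char c)
{
    if (b->len + 1 >= b->cap) {
        size_t new_cap = b->cap ? b->cap * 2 : 64;
        char *p = realloc(b->data, new_cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = new_cap;
    }
    b->data[b->len++] = c;
    b->data[b->len] = '\0';
    return 0;
}
```

In the read loop this would replace the strcat call: start from StrBuf seq = {0};, call strbuf_push(&seq, buffer[0]) for every non-newline character, and free(seq.data) once the sequence has been printed.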
I've written an interpreter for a C-like language, using Flex and Bison for the scanner/parser. It's working fine when executing full program files.
Now I'm trying to implement a REPL in the interpreter for interactive use. I want it to work like the command line interpreters in Ruby or ML:
Show a prompt
Accept one or more statements on the line
If the expression is incomplete
display a continuation prompt
allow the user to continue entering lines
When the line ends with a complete expression
echo the result of evaluating the last expression
show the main prompt
My grammar starts with a top_level production, which represents a single statement in the language. The lexer is configured for interactive mode on stdin. I am using the same scanner and grammar in both full-file and REPL modes, because there's no semantic difference in the two interfaces.
My main evaluation loop is structured like this.
while (!interpreter.done) {
if (interpreter.repl)
printf(prompt);
int status = yyparse(interpreter);
if (status) {
if (interpreter.error)
report_error(interpreter);
}
else {
if (interpreter.repl)
puts(interpreter.result);
}
}
This works fine except for the prompt and echo logic. If the user enters multiple statements on a line, this loop prints out superfluous prompts and expressions. And if the expression continues on multiple lines, this code doesn't print out continuation prompts. These problems occur because the granularity of the prompt/echo logic is a top_level statement in the grammar, but the line-reading logic is deep in the lexer.
What's the best way to restructure the evaluation loop to handle the REPL prompting and echoing? That is:
how can I display one prompt per line
how can I display the continuation prompt at the right time
how can I tell when a complete expression is the last one on a line
(I'd rather not change the scanner language to pass newline tokens, since that will severely alter the grammar. Modifying YY_INPUT and adding a few actions to the Bison grammar would be fine. Also, I'm using the stock Flex 2.5.35 and Bison 2.3 that ship with Xcode.)
After looking at how languages like Python and SML/NJ handle their REPLs, I got a nice one working in my interpreter. Instead of having the prompt/echo logic in the outermost parser driver loop, I put it in the innermost lexer input routine. Actions in the parser and lexer set flags that control the prompting done by the input routine.
I'm using a reentrant scanner, so yyextra contains the state passed between the layers of the interpreter. It looks roughly like this:
typedef struct Interpreter {
char* ps1; // prompt to start statement
char* ps2; // prompt to continue statement
char* echo; // result of last statement to display
BOOL eof; // set by the EOF action in the parser
char* error; // set by the error action in the parser
BOOL completeLine; // managed by yyread
BOOL atStart; // true before scanner sees printable chars on line
// ... and various other fields needed by the interpreter
} Interpreter;
The lexer input routine:
size_t yyread(FILE* file, char* buf, size_t max, Interpreter* interpreter)
{
// Interactive input is signaled by yyin==NULL.
if (file == NULL) {
if (interpreter->completeLine) {
if (interpreter->atStart && interpreter->echo != NULL) {
fputs(interpreter->echo, stdout);
fputs("\n", stdout);
free(interpreter->echo);
interpreter->echo = NULL;
}
fputs(interpreter->atStart ? interpreter->ps1 : interpreter->ps2, stdout);
fflush(stdout);
}
char ibuf[max+1]; // fgets needs an extra byte for \0
size_t len = 0;
if (fgets(ibuf, max+1, stdin)) {
len = strlen(ibuf);
memcpy(buf, ibuf, len);
// Show the prompt next time if we've read a full line.
interpreter->completeLine = (ibuf[len-1] == '\n');
}
else if (ferror(stdin)) {
// TODO: propagate error value
}
return len;
}
else { // not interactive
size_t len = fread(buf, 1, max, file);
if (len == 0 && ferror(file)) {
// TODO: propagate error value
}
return len;
}
}
The top level interpreter loop becomes:
while (!interpreter->eof) {
interpreter->atStart = YES;
int status = yyparse(interpreter);
if (status) {
if (interpreter->error)
report_error(interpreter);
}
else {
exec_statement(interpreter);
if (interactive)
interpreter->echo = result_string(interpreter);
}
}
The Flex file gets these new definitions:
%option extra-type="Interpreter*"
#define YY_INPUT(buf, result, max_size) result = yyread(yyin, buf, max_size, yyextra)
#define YY_USER_ACTION if (!isspace(*yytext)) { yyextra->atStart = NO; }
The YY_USER_ACTION handles the tricky interplay between tokens in the language grammar and lines of input. My language is like C and ML in that a special character (';') is required to end a statement. In the input stream, that character can either be followed by a newline character to signal end-of-line, or it can be followed by characters that are part of a new statement. The input routine needs to show the main prompt if the only characters scanned since the last end-of-statement are newlines or other whitespace; otherwise it should show the continuation prompt.
I too am working on such an interpreter, but I haven't gotten to the point of making a REPL yet, so my discussion might be somewhat vague.
Is it acceptable if given a sequence of statements on a single line, only the result of the last expression is printed? Because you can re-factor your top level grammar rule like so:
top_level = top_level statement | statement ;
The output of your top_level then could be a linked list of statements, and interpreter.result would be the evaluation of the tail of this list.