Related
Our organization has a required coding rule (without any explanation) that:
if … else if constructs should be terminated with an else clause
Example 1:
if ( x < 0 )
{
x = 0;
} /* else not needed */
Example 2:
if ( x < 0 )
{
x = 0;
}
else if ( y < 0 )
{
x = 3;
}
else /* this else clause is required, even if the */
{ /* programmer expects this will never be reached */
/* no change in value of x */
}
What edge case is this designed to handle?
What also concerns me about the reason is that Example 1 does not need an else but Example 2 does. If the reason is re-usability and extensibility, I think else should be used in both cases.
As mentioned in another answer, this is from the MISRA-C coding guidelines. The purpose is defensive programming, a concept which is often used in mission-critical programming.
That is, every if - else if must end with an else, and every switch must end with a default.
There are two reasons for this:
Self-documenting code. If you write an else but leave it empty it means: "I have definitely considered the scenario when neither if nor else if are true".
Not writing an else there means: "either I considered the scenario where neither if nor else if are true, or I completely forgot to consider it and there's potentially a fat bug right here in my code".
Stop runaway code. In mission-critical software, you need to write robust programs that account even for the highly unlikely. So you could see code like
if (mybool == TRUE)
{
}
else if (mybool == FALSE)
{
}
else
{
// handle error
}
This code will be completely alien to PC programmers and computer scientists, but it makes perfect sense in mission-critical software, because it catches the case where the "mybool" has gone corrupt, for whatever reason.
Historically, you would fear corruption of the RAM memory because of EMI/noise. This is not much of an issue today. Far more likely, memory corruption occurs because of bugs elsewhere in the code: pointers to wrong locations, array-out-of-bounds bugs, stack overflow, runaway code etc.
So most of the time, code like this comes back to slap yourself in the face when you have written bugs during the implementation stage. Meaning it could also be used as a debug technique: the program you are writing tells you when you have written bugs.
EDIT
Regarding why else is not needed after every single if:
An if-else or if-else if-else completely covers all possible values that a variable can have. But a plain if statement is not necessarily there to cover all possible values, it has a much broader usage. Most often you just wish to check a certain condition and if it is not met, then do nothing. Then it is simply not meaningful to write defensive programming to cover the else case.
Plus it would clutter up the code completely if you wrote an empty else after each and every if.
MISRA-C:2012 15.7 gives no rationale why else is not needed, it just states:
Note: a final else statement is not required for a simple if
statement.
Your company followed MISRA coding guidance. There are a few versions of these guidelines that contain this rule, but from MISRA-C:2004†:
Rule 14.10 (required): All if … else if constructs shall be terminated
with an else clause.
This rule applies whenever an if statement is followed by one or more
else if statements; the final else if shall be followed by an else
statement. In the case of a simple if statement then the else
statement need not be included. The requirement for a final else
statement is defensive programming. The else statement shall either
take appropriate action or contain a suitable comment as to why no
action is taken. This is consistent with the requirement to have a
final default clause in a switch statement. For example this code
is a simple if statement:
if ( x < 0 )
{
log_error(3);
x = 0;
} /* else not needed */
whereas the following code demonstrates an if, else if construct
if ( x < 0 )
{
log_error(3);
x = 0;
}
else if ( y < 0 )
{
x = 3;
}
else /* this else clause is required, even if the */
{ /* programmer expects this will never be reached */
/* no change in value of x */
}
In MISRA-C:2012, which supersedes the 2004 version and is the current recommendation for new projects, the same rule exists but is numbered 15.7.
Example 1:
in a single if statement programmer may need to check n number of conditions and performs single operation.
if(condition_1 || condition_2 || ... condition_n)
{
//operation_1
}
In a regular usage performing a operation is not needed all the time when if is used.
Example 2:
Here programmer checks n number of conditions and performing multiple operations. In regular usage if..else if is like switch you may need to perform a operation like default. So usage else is needed as per misra standard
if(condition_1 || condition_2 || ... condition_n)
{
//operation_1
}
else if(condition_1 || condition_2 || ... condition_n)
{
//operation_2
}
....
else
{
//default cause
}
† Current and past versions of these publications are available for purchase via the MISRA webstore (via).
This is the equivalent of requiring a default case in every switch.
This extra else will Decrease code coverage of your program.
In my experience with porting linux kernel , or android code to different platform many time we do something wrong and in logcat we see some error like
if ( x < 0 )
{
x = 0;
}
else if ( y < 0 )
{
x = 3;
}
else /* this else clause is required, even if the */
{ /* programmer expects this will never be reached */
/* no change in value of x */
printk(" \n [function or module name]: this should never happen \n");
/* It is always good to mention function/module name with the
logs. If you end up with "this should never happen" message
and the same message is used in many places in the software
it will be hard to track/debug.
*/
}
Only a brief explanation, since I did this all about 5 years ago.
There is (with most languages) no syntactic requirement to include "null" else statement (and unnecessary {..}), and in "simple little programs" there is no need. But real programmers don't write "simple little programs", and, just as importantly, they don't write programs that will be used once and then discarded.
When one write an if/else:
if(something)
doSomething;
else
doSomethingElse;
it all seems simple and one hardly sees even the point of adding {..}.
But some day, a few months from now, some other programmer (you would never make such a mistake!) will need to "enhance" the program and will add a statement.
if(something)
doSomething;
else
doSomethingIForgot;
doSomethingElse;
Suddenly doSomethingElse kinda forgets that it's supposed to be in the else leg.
So you're a good little programmer and you always use {..}. But you write:
if(something) {
if(anotherThing) {
doSomething;
}
}
All's well and good until that new kid makes a midnight modification:
if(something) {
if(!notMyThing) {
if(anotherThing) {
doSomething;
}
else {
dontDoAnything; // Because it's not my thing.
}}
}
Yes, it's improperly formatted, but so is half the code in the project, and the "auto formatter" gets bollixed up by all the #ifdef statements. And, of course, the real code is far more complicated than this toy example.
Unfortunately (or not), I've been out of this sort of thing for a few years now, so I don't have a fresh "real" example in mind -- the above is (obviously) contrived and a bit hokey.
This, is done to make the code more readable, for later references and to make it clear, to a later reviewer, that the remaining cases handled by the last else, are do nothing cases, so that they are not overlooked somehow at first sight.
This is a good programming practice, which makes code reusable and extend-able.
I would like to add to – and partly contradict – the previous answers. While it is certainly common to use if-else if in a switch-like manner that should cover the full range of thinkable values for an expression, it is by no means guaranteed that any range of possible conditions is fully covered. The same can be said about the switch construct itself, hence the requirement to use a default clause, which catches all remaining values and can, if not otherwise required anyway, be used as an assertion safeguard.
The question itself features a good counter-example: The second condition does not relate to x at all (which is the reason why I often prefer the more flexible if-based variant over the switch-based variant). From the example it is obvious that if condition A is met, x should be set to a certain value. Should A not be met, then condition B is tested. If it is met, then x should receive another value. If neither A nor B are met, then x should remain unchanged.
Here we can see that an empty else branch should be used to comment on the programmer's intention for the reader.
On the other hand, I cannot see why there must be an else clause especially for the latest and innermost if statement. In C, there is no such thing as an 'else if'. There is only if and else. Instead, the construct should formally be indented this way (and I should have put the opening curly braces on their own lines, but I don't like that):
if (A) {
// do something
}
else {
if (B) {
// do something else (no pun intended)
}
else {
// don't do anything here
}
}
Should any standard happen to require curly braces around every branch, then it would contradict itself if it mentioned "if ... else if constructs" at the same time.
Anyone can imagine the ugliness of deeply nested if else trees, see here on a side note. Now imagine that this construct can be arbitrarily extended anywhere. Then asking for an else clause in the end, but not anywhere else, becomes absurd.
if (A) {
if (B) {
// do something
}
// you could to something here
}
else {
// or here
if (B) { // or C?
// do something else (no pun intended)
}
else {
// don't do anything here, if you don't want to
}
// what if I wanted to do something here? I need brackets for that.
}
In the end, it comes down for them to defining precisely what is meant with an "if ... else if construct"
The basic reason is probably code coverage and the implicit else: how will the code behave if the condition is not true? For genuine testing, you need some way to see that you have tested with the condition false. If every test case you have goes through the if clause, your code could have problems in the real world because of a condition that you did not test.
However, some conditions may properly be like Example 1, like on a tax return: "If the result is less than 0, enter 0." You still need to have a test where the condition is false.
Logically any test implies two branches. What do you do if it is true, and what do you do if it is false.
For those cases where either branch has no functionality, it is reasonable to add a comment about why it doesn't need to have functionality.
This may be of benefit for the next maintenance programmer to come along. They should not have to search too far to decide if the code is correct. You can kind of Prehunt the Elephant.
Personally, it helps me as it forces me to look at the else case, and evaluate it. It may be an impossible condition, in which case i may throw an exception as the contract is violated. It may be benign, in which case a comment may be enough.
Your mileage may vary.
Most the time when you just have a single if statement, it's probably one of reasons such as:
Function guard checks
Initialization option
Optional processing branch
Example
void print (char * text)
{
if (text == null) return; // guard check
printf(text);
}
But when you do if .. else if, it's probably one of reasons such as:
Dynamic switch-case
Processing fork
Handling a processing parameter
And in case your if .. else if covers all possibilities, in that case your last if (...) is not needed, you can just remove it, because at that point the only possible values are the ones covered by that condition.
Example
int absolute_value (int n)
{
if (n == 0)
{
return 0;
}
else if (n > 0)
{
return n;
}
else /* if (n < 0) */ // redundant check
{
return (n * (-1));
}
}
And in most of these reasons, it's possible something doesn't fit into any of the categories in your if .. else if, thus the need to handle them in a final else clause, handling can be done through business-level procedure, user notification, internal error mechanism, ..etc.
Example
#DEFINE SQRT_TWO 1.41421356237309504880
#DEFINE SQRT_THREE 1.73205080756887729352
#DEFINE SQRT_FIVE 2.23606797749978969641
double square_root (int n)
{
if (n > 5) return sqrt((double)n);
else if (n == 5) return SQRT_FIVE;
else if (n == 4) return 2.0;
else if (n == 3) return SQRT_THREE;
else if (n == 2) return SQRT_TWO;
else if (n == 1) return 1.0;
else if (n == 0) return 0.0;
else return sqrt(-1); // error handling
}
This final else clause is quite similar to few other things in languages such as Java and C++, such as:
default case in a switch statement
catch(...) that comes after all specific catch blocks
finally in a try-catch clause
Our software was not mission critical, yet we also decided to use this rule because of defensive programming.
We added a throw exception to the theoretically unreachable code (switch + if-else). And it saved us many times as the software failed fast e.g. when a new type has been added and we forgot to change one-or-two if-else or switch. As a bonus it made super easy to find the issue.
Well, my example involves undefined behavior, but sometimes some people try to be fancy and fails hard, take a look:
int a = 0;
bool b = true;
uint8_t* bPtr = (uint8_t*)&b;
*bPtr = 0xCC;
if(b == true)
{
a += 3;
}
else if(b == false)
{
a += 5;
}
else
{
exit(3);
}
You probably would never expect to have bool which is not true nor false, however it may happen. Personally I believe this is problem caused by person who decides to do something fancy, but additional else statement can prevent any further issues.
I'm currently working with PHP. Creating a registration form and a login form. I am just purely using if and else. No else if or anything that is unnecessary.
If user clicks submits button -> it goes to the next if statement... if username is less than than 'X' amount of characters then alert. If successful then check password length and so on.
No need for extra code such as an else if that could dismiss reliability for server load time to check all the extra code.
As this question on boolean if/else if was closed as a duplicate. As well, there are many bad answers here as it relates to safety-critical.
For a boolean, there are only two cases. In the boolean instance, following the MISRA recommendation blindly maybe bad. The code,
if ( x == FALSE ) {
// Normal action
} else if (x == TRUE ) {
// Fail safe
}
Should just be refactored to,
if ( x == FALSE ) {
// Normal action
} else {
// Fail safe
}
Adding another else increases cyclometric complexity and makes it far harder to test all branches. Some code maybe 'safety related'; Ie, not a direct control function that can cause an unsafe event. In this code, it is often better to have full testability without instrumentation.
For truly safety functional code, it might make sense to separate the cases to detect a fault in this code and have it reported. Although I think logging 'x' on the failure would handle both. For the other cases, it will make the system harder to test and could result in lower availability depending on what the second 'error handling' action is (see other answers where exit() is called).
For non-booleans, there may be ranges that are nonsensical. Ie, they maybe some analog variable going to a DAC. In these cases, the if(x > 2) a; else if(x < -2) b; else c; makes sense for cases where deadband should not have been sent, etc. However, these type of cases do not exist for a boolean.
I am trying to resolve warning issues which is shown as below :
warning: suggest braces around empty body in an 'if' statement
Relevant code:
cdc(.....)
{
//some statements
ENTER_FUNC(CDC_TRKEY_FC,cdcType_t); //Showing warning in this line
if(something)
{
if(..)
{
}
else
{
}
}
else
{
}
}
If I remove ; and adding the braces as below
ENTER_FUNC(CDC_TRKEY_FC,cdcType_t)
{
}
the warning is gone.
What does exactly it means? Is it behaving like an if statement?
Sorry, its confidential code, so I cant share entirely.
If this is your code
if (/* condition */);
/* other code */
Then the other code will ALWAYS be executed.
You probably want the other code to only be executed if the condition is true.
In order to achieve that, you mainly have to delete the ;.
It is widely considered to be best practice to be somewhat generous with the {}, i.e.
if (/* condition */)
{
/* other code */
}
The fact that the warning does not occur after deleting the ; in line
ENTER_FUNC(CDC_TRKEY_FC,cdcType_t); and replacing it with {}
can be explained if it is actually a macro which essentially expands (together with the ; which is NOT part of the macro) to the if();, which earlier versions of your question were mentioning.
The replacement with {} then does exactly what the compiler wanted.
The ENTER_FUNC() is probably meant to be used like
ENTER_FUNC(CDC_TRKEY_FC,cdcType_t) /* delete this ; */
{ /* new {, followed by rest of your function code */
if(something)
{
if(..)
{
}
else
{
}
}
else
{
}
} /* new */
Please excuse that this answer more or less assumes that you made a mistake in your code. Compare the contribution by Scheff, which assumes (also plausibly) that actually you were acting to a more complex design and fully intentionally.
The statement
if (cond) ; else do_something();
or even
if (cond) ; do_something();
might be intended. May be, the ; after if (cond) is a placeholder for something which shall be added later.
Inserting comments
if (cond) /** #todo */ ; else do_something();
or
if (cond) /** #todo */ ; /* and then always */ do_something();
would make it clear to the human reader but not for the compiler which ignores comments completely.
However, the compiler authors suspected high chance that the semicolon was unintendedly set (and can easily be overlooked). Hence, they spent a warning about this and gave a hint how to make the intention clear if there is one:
Use { } instead ; for intendedly empty then-body to come around this warning.
Sample:
#include <stdio.h>
int main()
{
int cond = 1;
if (cond) /** #todo */ ; else printf("cond not met.\n");
if (cond) /** #todo */ ; printf("cond checked.\n");
return 0;
}
Output:
cond checked.
Life demo on ideone
The compiler used on ideone is stated as gcc 6.3.
I must admit that I didn't get the diagnostics of OP.
After the question was edited, the answer does not seem to match the question anymore. Hence, a little update:
The OP states that the
warning: suggest braces around empty body in an 'if' statement
appears for this line of code:
ENTER_FUNC(CDC_TRKEY_FC,cdcType_t); //Showing warning in this line
It seems that the OP was not aware that ENTER_FUNC is (very likely) a macro with an if statement in its replacement text (something like #define ENTER_FUNC(A,B) if (...)). (This is the most imaginable scenario to get this warning for this code.)
Unfortunately, the OP is not willing to show how ENTER_FUNC is defined, nor to prepare an MCVE with the same behavior.
However, the technique to hide an if in a macro is even more questionable – I wouldn't recommend to do so. Imagine the following situation:
cdc(.....)
{
//some statements
ENTER_FUNC(CDC_TRKEY_FC,cdcType_t) // This time, the author forgot the ; or {}
if(something)
{
if(..)
{
}
else
{
}
}
else
{
}
}
The if(something) statement becomes now the body of the hidden if of the ENTER_FUNC() macro which is probably not intended but a bug. The application may now behave wrong in certain situations. By simply looking at the source code, this is probably hard to catch. Only, by single-step debugging and a bit luck, the error can be found.
(Another option would be to expand all macros and check the C code after replacement. C compilers provide usually a pre-process-only option which makes the result of pre-processing visible to human eyes. E.g. gcc -E)
So, the author of ENTER_FUNC built a macro which
causes a compiler warning if macro is used properly
where the warning goes away if macros is used wrong.
IMHO, this is a not-so-lucky design.
I am writing a compiler in C, and I use bison for the grammar and flex for the tokens. To improve the quality of error messages, some common errors need to appear in the grammar. This has the side effect, however, of bison thinking that an invalid input is actually valid.
For example, consider this grammar:
program
: command ';' program
| command ';'
| command {yyerror("Missing ;.");} // wrong input
;
command
: INC
| DEC
;
where INC and DEC are tokens and program is the initial symbol. In this case, INC; is a valid program, but INC is not, and an error message is generated. The function yyparse(), however, returns 0 as if the program were correct.
Looking at the bison manual, I found the macro YYERROR, which should behave as if the parser itself found an error. But even if I add YYERROR after the call to yyerror(), the function yyparse() still returns 0. I could use YYABORT instead, but that would stop on the first error, which is terrible and not what I want.
Is there anyway to make yyparse() return 1 without stopping on the first error?
Since you intend to recover from syntax errors, you're not going to be able to use the return code from yyparse to signal that one or more errors occurred. Instead, you'll have to track that information yourself.
The traditional way to do that would be to use a global error count (just showing the crucial pieces):
%{
int parse_error_count = 0;
%}
%%
program: statement { yyerror("Missing semicolon");
++parse_error_count; }
%%
int parse_interface() {
parse_error_count = 0;
int status = yyparse();
if (status) return status; /* Might have run out of memory */
if (parse_error_count) return 3; /* yyparse returns 0, 1 or 2 */
return 0;
}
A more modern solution is to define an additional "out" parameter to yyparse:
%parse-param { int* error_count }
%%
program: statement { yyerror("Missing semicolon");
++*error_count; }
%%
int main() {
int error_count = 0;
int status = yyparse(&error_count);
if (status || error_count) { /* handle error */ }
Finally, in case you really need to export the symbol yyparse from your bison-generated code, you can do the following ugly hack:
%code top {
#define yyparse internal_yyparse
}
%parse-param { int* error_count }
%%
program: statement { yyerror("Missing semicolon");
++*error_count; }
%%
#undef yyparse
int yyparse() {
int error_count = 0;
int status = internal_yyparse(&error_count);
// Whatever you want to do as a summary
return status ? status : error_count ? 1 : 0;
}
yyerror() just prints an error message. It doesn't alter what yyparse() returns.
What you're attempting is not a good idea. You'll enormously expand the grammar and you run a major risk of making it ambiguous. All you need to do is remove the production that calls yyerror(). That input will produce a syntax error anyway, and that will cause yyparse() not to return 0. You're keeping a dog and barking yourself. What you should be checking for is semantic errors that the parser can't see.
If you really want to improve the error messages, there's enough information in the parse tables and state information to tell you what the expected next token was. However in most cases it's such a large set it's pointless to print it. But programmers are used to sorting out 'syntax error'. Don't sweat it. Writing compilers is hard enough already.
NB You should make your grammar left-recursive to avoid excessive stack usage: for example, program : program ';' command.
I'm working with a large SDK codebase glommed together from various sources of varying quality / competence / sanity from Linus Torvalds to unidentified Elbonian code slaves.
There are an assortment of styles of code, some clearly better than others, and it's proving an interesting opportunity to expand my knowledge / despair for the future of humanity in alternate measures.
I've just come across a pile of functions which repeatedly use a slightly odd (to me) style, namely:
void do_thing(foo)
{
do {
if(this_works(foo) != success)
break;
return(yeah_cool);
} while (0);
return(failure_shame_death);
}
There's nothing complicated being done in this code (I haven't cut 10,000 lines of wizardry out for this post), they could just as easily do:
if(this_works(foo) == success)
return(yeah_cool);
else
return(failure_shame_death);
Which would seem somehow nicer / neater / more intuitive / easier to read.
So I'm now wondering if there is some (good) reason for doing it the other way, or is it just the way they always do it in the Elbonian Code Mines?
Edit: As per the "possible duplicate" links, this code is not pre-processed in any sort of macro, it is just in the normal code. I can believe it might be due to a coding style rule about error checking, as per this answer.
Another guess: maybe you didn't quote the original code correctly? I have seen the same pattern used by people who want to avoid goto: they use a do-while(0) loop which at the end returns a success value. They can also break out of the loop for the error handling:
int doXandY() {
do {
if (!x()) {
break;
}
if (!y()) {
break;
}
return 0;
} while( 0 );
/* Error handling code goes here. */
globalErrorFlag = 12345;
return -1;
}
In your example there's not much point to it because the loop is very short (i.e. just one error case) and the error handling code is just a return, but I suspect that in the real code it can be more complex.
Some people use the do{} while(0); construct with break; inside the loop to be compliant in some way with MISRA rule 14.7. This rule says that there can be only single enter and exit point in the function. This rule is also required by safety norm ISO26262. Please find an example function:
int32_t MODULE_some_function(bool first_condition,bool second_condition)
{
int32_t ret = -1 ;
do
{
if(first_condition)
{
ret = 0 ;
break ;
}
/* some code here */
if(second_condition)
{
ret = 0 ;
break ;
}
/* some code here */
} while(0) ;
return ret ;
}
Please note however that such a construct as I show above violates different MISRA rule which is rule 14.6. Writing such a code you are going to be compliant with one MISRA rule, and as far as I know people use such a construct as workaround against using multiple returns from function.
In my opinion practical usage of the do{}while(0); construct truely exist in the way you should construct some types of macros.Please check below question, it was very helpful for me :
Why use apparently meaningless do-while and if-else statements in macros?
It's worth notice also that in some cases do{}while(0); construct is going to be completely optimized away if you compile your code with proper optimization option.
Hm, the code might be preprocessed somehow. The do { } while(0) is a trick used in preprocessor macros; you can define them like this:
#define some_macro(a) do { whatever(); } while(0)
The advantage being that you can use them anywhere, because it is allowed to put a semicolon after the while(0), like in your code above.
The reason for this is that if you write
#define some_macro(a) { whatever(); }
if (some_condition)
some_macro(123);
else
printf("this can cause problems\n");
Since there is an extra semicolon before the else statement, this code is invalid. The do { ... } while(0) will work anywhere.
do {...} while(0) arranged with "break" is some kind of "RAII for Plain C".
Here, "break" is treated as abnormal scope exit (kind of "Plain C exceptions"), so you can be sure that there is only one place to deallocate a resource: after a "while(0)". It seems slightly unusual, but actually it's very common idiom in the world of plain C.
I would guess that this code was originally written with gotos for error handling:
void do_thing(foo)
{
if(this_works(foo) != success)
goto error;
return(yeah_cool);
error:
return(failure_shame_death);
}
But at some point an edict came down from on high "thou shalt not use goto", so someone did a semi-automatic translation from goto style to loop-break style (perhaps with simple script). Probably when the code was merged/moved from one project to another.
I've written an interpreter for a C-like language, using Flex and Bison for the scanner/parser. It's working fine when executing full program files.
Now I'm trying implement a REPL in the interpreter for interactive use. I want it to work like the command line interpreters in Ruby or ML:
Show a prompt
Accept one or more statements on the line
If the expression is incomplete
display a continuation prompt
allow the user to continue entering lines
When the line ends with a complete expression
echo the result of evaluating the last expression
show the main prompt
My grammar starts with a top_level production, which represents a single statement in the language. The lexer is configured for interactive mode on stdin. I am using the same scanner and grammar in both full-file and REPL modes, because there's no semantic difference in the two interfaces.
My main evaluation loop is structured like this.
while (!interpreter.done) {
if (interpreter.repl)
printf(prompt);
int status = yyparse(interpreter);
if (status) {
if (interpreter.error)
report_error(interpreter);
}
else {
if (interpreter.repl)
puts(interpreter.result);
}
}
This works fine except for the prompt and echo logic. If the user enters multiple statements on a line, this loop prints out superfluous prompts and expressions. And if the expression continues on multiple lines, this code doesn't print out continuation prompts. These problems occur because the granularity of the prompt/echo logic is a top_level statement in the grammar, but the line-reading logic is deep in the lexer.
What's the best way to restructure the evaluation loop to handle the REPL prompting and echoing? That is:
how can I display one prompt per line
how can I display the continuation prompt at the right time
how can I tell when a complete expression is the last one on a line
(I'd rather not change the scanner language to pass newline tokens, since that will severely alter the grammar. Modifying YY_INPUT and adding a few actions to the Bison grammar would be fine. Also, I'm using the stock Flex 2.5.35 and Bison 2.3 that ship with Xcode.)
After looking at how languages like Python and SML/NJ handle their REPLs, I got a nice one working in my interpreter. Instead of having the prompt/echo logic in the outermost parser driver loop, I put it in the innermost lexer input routine. Actions in the parser and lexer set flags that control the prompting by input routine.
I'm using a reentrant scanner, so yyextra contains the state passed between the layers of the interpreter. It looks roughly like this:
typedef struct Interpreter {
char* ps1; // prompt to start statement
char* ps2; // prompt to continue statement
char* echo; // result of last statement to display
BOOL eof; // set by the EOF action in the parser
char* error; // set by the error action in the parser
BOOL completeLine // managed by yyread
BOOL atStart; // true before scanner sees printable chars on line
// ... and various other fields needed by the interpreter
} Interpreter;
The lexer input routine:
size_t yyread(FILE* file, char* buf, size_t max, Interpreter* interpreter)
{
// Interactive input is signaled by yyin==NULL.
if (file == NULL) {
if (interpreter->completeLine) {
if (interpreter->atStart && interpreter->echo != NULL) {
fputs(interpreter->echo, stdout);
fputs("\n", stdout);
free(interpreter->echo);
interpreter->echo = NULL;
}
fputs(interpreter->atStart ? interpreter->ps1 : interpreter->ps2, stdout);
fflush(stdout);
}
char ibuf[max+1]; // fgets needs an extra byte for \0
size_t len = 0;
if (fgets(ibuf, max+1, stdin)) {
len = strlen(ibuf);
memcpy(buf, ibuf, len);
// Show the prompt next time if we've read a full line.
interpreter->completeLine = (ibuf[len-1] == '\n');
}
else if (ferror(stdin)) {
// TODO: propagate error value
}
return len;
}
else { // not interactive
size_t len = fread(buf, 1, max, file);
if (len == 0 && ferror(file)) {
// TODO: propagate error value
}
return len;
}
}
The top level interpreter loop becomes:
while (!interpreter->eof) {
interpreter->atStart = YES;
int status = yyparse(interpreter);
if (status) {
if (interpreter->error)
report_error(interpreter);
}
else {
exec_statement(interpreter);
if (interactive)
interpreter->echo = result_string(interpreter);
}
}
The Flex file gets these new definitions:
%option extra-type="Interpreter*"
#define YY_INPUT(buf, result, max_size) result = yyread(yyin, buf, max_size, yyextra)
#define YY_USER_ACTION if (!isspace(*yytext)) { yyextra->atStart = NO; }
The YY_USER_ACTION handles the tricky interplay between tokens in the language grammar and lines of input. My language is like C and ML in that a special character (';') is required to end a statement. In the input stream, that character can either be followed by a newline character to signal end-of-line, or it can be followed by characters that are part of a new statement. The input routine needs to show the main prompt if the only characters scanned since the last end-of-statement are newlines or other whitespace; otherwise it should show the continuation prompt.
I too am working on such an interpreter, I haven't gotten to the point of making a REPL yet, so my discussion might be somewhat vague.
Is it acceptable if given a sequence of statements on a single line, only the result of the last expression is printed? Because you can re-factor your top level grammar rule like so:
top_level = top_level statement | statement ;
The output of your top_level then could be a linked list of statements, and interpreter.result would be the evaluation of the tail of this list.