Most of us have probably hit the bug where you add another statement under an 'if' and notice that it runs regardless of the condition, only to find out sooner or later, in frustration, that the braces around the 'if' body were missing.
Example:
if (condition) {
statement;
}
Compared to:
if (condition)
statement;
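A minimal sketch of the bug described above, in C. The function names are made up for illustration; each function returns how many of its two "statements" actually ran.

```c
/* Without braces only the first statement is governed by the if. */
int run_without_braces(int condition)
{
    int executed = 0;
    if (condition)
        executed++;   /* governed by the if */
        executed++;   /* NOT governed by the if, despite the indentation */
    return executed;
}

/* With braces both statements form one compound statement. */
int run_with_braces(int condition)
{
    int executed = 0;
    if (condition) {
        executed++;
        executed++;
    }
    return executed;
}
```

With a false condition the first version still executes one statement, while the second executes none. Some compilers can warn about the misleading indentation.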
Please note that I wouldn't want a discussion on what coding standard is better, as we all have our opinion about that and it's very much dependent on context. Instead, I would be interested in the advantages from a programming language point of view and from these advantages, why it was decided not to be strictly enforced. Does it make for a simpler grammar? Compiler implementation?
What is the advantage of leaving it optional, rather than strictly enforcing it? What does the standard say about it?
The C Standard says that an if (expression) must be followed by a statement. It does not require the statement to be a compound statement.
In the "Spirit of C" (see the C Rationale document), one of the phrases is:
Trust the programmer.
MISRA-C on the other hand has a required rule for if and loops that the statement has to be a compound statement.
Some coding styles allow a single statement for if only if the statement is put in the same line as the if. For example:
if (condition) statement;
Note that C and C++ are not the only languages in which the {} are optional in if. For example, in Java they are also optional, but in Perl they are required.
Because:
that would be excessive meddling — the programmer chooses coding style, not the language;
it would further complicate the grammar; easier and more robust to just say that an if (condition) is followed by a statement and let the recursion handle everything.
Personally I find the distrust of this construct to be largely without merit for any vaguely competent programmer; I've not once encountered this "bug" where I've forgotten how to write C++ just because I chose to omit a brace pair. If you come across it, I suggest you treat it as a cue to pay more attention to what you're doing, rather than masking it with "tricks" and "style guides".
Sometimes, not having to write the braces makes the code more readable imho:
if (x < 0) throw std::invalid_argument("meh");
if (a != b) return 0;
if (i == k) continue;
In theory, you could design a programming language to require braces except for control flow statements, but this safety would probably be outweighed by the additional language complexity.
C/C++ also doesn't mandate strict adherence to K&R style. For people who prefer Allman style, the extra braces are completely unnecessary:
if (condition)
statement;
if (condition)
{
statement;
statement;
}
With the introduction of lambdas and braced initialization, we are seeing an increasing number of braces in our code and I think Allman style actually manages it better.
You misunderstand it slightly. The braces are neither optional nor compulsory. There was no decision about whether to allow them or to enforce them at all. The braces are not part of the if statement (or any similar construct). They belong to the statement that follows the condition. Simple statements will not have them but compound statements will (by definition, a compound statement is a list of statements enclosed in braces).
This situation, and how the compiler handles it, comes directly from this syntax definition (which is actually nothing C-specific; most other languages of the era that I can think of, such as Algol and Pascal, do the very same, even if they use keywords instead of braces). Basically, any language that has no specific ending keyword (like if and end if, or if and fi) has to handle it this way.
When you do include the braces around a single statement in an if or any other similar construct, you don't use an optional form of if allowing braces. You just happen to provide it with a compound statement (hence, enclosed in braces) consisting only of a single statement inside, and this is perfectly legal. But it's the same for the compiler, it doesn't handle it any different.
Actually, but this is a question of taste, of course, many programmers prefer to omit them because it reduces clutter.
A scope is merely a compound statement. It can be placed anywhere:
int foo() {
int a = 0;
a = 1; // this is a simple statement.
{ // this is inside a scope. It is a compound statement. Note that I don't need an if.
a = 2;
a++;
}
return a;
}
Being able to put any statement (compound or not) after an if, while or other constructs makes the syntax much simpler to implement.
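To illustrate why this keeps the parser simple, here is a rough sketch of a recursive-descent recognizer in C. The grammar is a toy one invented for this example, with one character per token ('i' for if, 'c' for a condition, 's' for a simple statement); it is not how any real C compiler is written.

```c
#include <stddef.h>

/* Toy grammar (all token letters are illustrative assumptions):
 *   statement := 'i' '(' 'c' ')' statement     if
 *              | '{' statement* '}'            compound statement
 *              | 's' ';'                       simple statement
 * Returns a pointer just past the parsed statement, or NULL on error. */
static const char *parse_statement(const char *p)
{
    if (p == NULL)
        return NULL;
    if (p[0] == 'i') {
        if (p[1] != '(' || p[2] != 'c' || p[3] != ')')
            return NULL;
        return parse_statement(p + 4);   /* the body is just a statement */
    }
    if (p[0] == '{') {
        p++;
        while (p != NULL && *p != '}' && *p != '\0')
            p = parse_statement(p);
        return (p != NULL && *p == '}') ? p + 1 : NULL;
    }
    if (p[0] == 's' && p[1] == ';')
        return p + 2;
    return NULL;
}

/* Returns 1 if the whole input is exactly one statement, 0 otherwise. */
int is_statement(const char *input)
{
    const char *end = parse_statement(input);
    return end != NULL && *end == '\0';
}
```

Note that the if branch has no knowledge of braces at all: it parses a condition and then asks for a statement, and the compound-statement branch handles the braces wherever they occur.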
C was created in a time when computers had 40×25 character graphics.
If you indent the conditional statement and omit the braces, you save one line (opening brace on the same line) or two lines (opening brace on the following line):
if (condition)
statement;
if (condition) {
statement;
}
if (condition)
{
statement;
}
So saving a line was important. It allowed more source code to be shown on one screen.
I have the following fragment in my Bison file that describes a simple "while" loop as a condition followed by a sequence of statements. The list of statements is large and includes BREAK and CONTINUE. The latter two can be used only within a loop.
%start statements
%%
statements: | statement statements
statement: loop | BREAK | CONTINUE | WRITE | ...
loop: WHILE condition statements ENDWHILE
condition: ...
%%
I can add a C variable, set it upon entering the loop, reset it upon exiting, and check at BREAK or CONTINUE, but this solution does not look elegant:
loop: WHILE {loop++;} condition statements {loop--;} ENDWHILE
statement: loop | BREAK {if (!loop) yyerror();} ...
Is there a way to prevent the two statements from outside a loop using only Bison rules?
P.S. What I mean is "Is there an EASY way..," without fully duplicating the grammar.
Sure. You just need three different statement non-terminals, one which matches all statements; one which matches everything but continue (for switch blocks), and one which matches everything but break and continue. Of course, this distinction needs to trickle down through your rules. You'll also need three versions of each type of compound statement: loops, conditionals, switch, braced blocks, and so on. Oh, and don't forget that statements can be labelled, so there are some more non-terminals to duplicate.
But yeah, it can certainly be done. The question is, is it worth going to all that trouble. Or, to put it another way, what do you get out of it?
To start with, the end user finds that where they used to have a pretty informative error message about continue statements outside a loop, they now just get a generic Syntax Error. Now, you can fix that with some more grammar modifications, by actually providing productions which match the invalid statements, and then present a meaningful error message. But that's almost exactly the same code already rejected as inelegant.
Other than that, does it in any way reduce parser complexity? It lets you assume that a break statement is legally placed, but you still have to figure out where the break statement's destination is. And other than that, there aren't really a lot of evident advantages, IMHO.
But if you want to do it, go for it.
Once you've done that, you could try modifying your grammar so that break, continue, goto and return cannot be followed by an unlabelled statement. That sounds like a good idea, and some languages do it. It can certainly be done in the grammar. (But before you get too enthusiastic, remember that some programmers do deliberately create dead code during debugging sessions, and they won't thank you for making it impossible.)
There is a BNF extension, used in the ECMAscript standard, amongst others, which parameterizes non-terminals with a list of features, each of which can be present or not. These parameters can then be used in productions, either as conditions or to be passed through to non-terminals on the right-hand side. This could be used to generate three versions of statement, using the features [continue] and [break], which would be used as gates on those respective statement syntaxes, and also passed through to the compound statement non-terminals.
I don't know of a parser generator capable of handling such parameterised rules, so I can't offer it as a concrete suggestion, but this question is one of the use cases which motivated parameterised non-terminals. (In fact, I believe it's one of the uses, but I might be remembering that wrong.)
With an ECMAScript-style formalism, this grammatical restriction could be written without duplicating rules. The duplication would still be there, under the surface, since the parser generator would have to macro expand the templated rules into their various boolean possibilities. But the grammar is a lot more readable and the size of the state machine is not so important these days.
I have no doubt that it would be a useful feature, but I also suspect that it would be overused, with the result that the quality of error messages would be reduced.
As a general rule, compilers should be optimised for correct inputs, with the additional goal of producing helpful error messages for invalid input. Complicating the grammar even a little to make easily described errors into syntax errors does not help with either of these goals. If it's possible to write a few lines of code to produce the correct error message for a detected problem, instead of emitting a generic syntax error, I would definitely do that.
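As a rough illustration of the "few lines of code" alternative, the check can be done after parsing by walking the statement list with a loop-depth counter. The node layout and names below are assumptions made up for this sketch, not part of Bison or of the grammar in the question.

```c
#include <stddef.h>

/* Hypothetical AST node kinds for the statements named in the question. */
enum node_kind { NODE_WHILE, NODE_BREAK, NODE_CONTINUE, NODE_WRITE };

struct node {
    enum node_kind kind;
    const struct node *body;  /* first statement of a loop body, else NULL */
    const struct node *next;  /* next statement in the same list */
};

/* Counts break/continue statements that are not inside any loop. */
int count_misplaced(const struct node *list, int loop_depth)
{
    int errors = 0;
    for (const struct node *n = list; n != NULL; n = n->next) {
        switch (n->kind) {
        case NODE_WHILE:
            errors += count_misplaced(n->body, loop_depth + 1);
            break;
        case NODE_BREAK:
        case NODE_CONTINUE:
            if (loop_depth == 0)
                errors++;   /* a real compiler would emit a specific message here */
            break;
        case NODE_WRITE:
            break;
        }
    }
    return errors;
}
```

The grammar stays unchanged, and the place where the error is counted is also the natural place to print a message such as "continue outside of a loop".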
There are (many) other use cases for the ECMAScript BNF extensions. For example they make it much easier to describe a syntax whose naive grammar requires two or three lookahead tokens.
The following example is an illegal C program, which is confusing; it shows that a declaration is not a statement in the C language.
int main() {
if (1) int x;
}
I've read the C specification (N2176) and I know that C distinguishes declarations from statements in its syntax. I told my teacher, who teaches a compiler course, and he did not seem to believe it; I could not convince him until I showed him the specification.
So I am also really confused. Why is C designed like this? Why is a declaration not a statement in C? How can I convince someone of the reasons for this design?
Because there is no apparent grammatical or technical semantic reason that a declaration cannot appear wherever a statement may appear, this appears to be largely due to history and lack of utility.
Considering the Grammar
Statements enter the C grammar in the function-definition rule, in which a compound-statement appears. A compound-statement allows a statement. Then inspecting the replacements for a statement reveals the places where a statement may appear but a declaration may not:
In a labeled-statement, after a label followed by :.
In a selection-statement, after the ) of an if or a switch or after an else.
In an iteration-statement, after the ) of a while or for or after a do.
The following is not a formal analysis of the grammar, but it appears the places where a statement may appear but a declaration may not are quite limited: After the : that ends a label, after a keyword (else or do) or after the closing ) for a ( that immediately follows a keyword (if, switch, while, or for). These seem to me like unambiguous points in the grammar, where it should be as easy to distinguish declarations and statements as it is to do so after a ; in a compound-statement.
Therefore, I do not think there is a grammatical reason not to allow a declaration to appear anywhere a statement may appear (or, equivalently, to define a declaration as a kind of statement).
Considering the Semantics
Now consider the semantic effects of allowing declarations in the places where currently a statement may appear but a declaration may not.
In the case of a labeled-statement, where we desired to have label: declaration, we can use label: ; declaration, where we have inserted a null statement after the :. The result is a defined code sequence with semantic effect equivalent to what we would desire to have by allowing a declaration immediately after the label.
In the other cases, where we desire to have declaration, we can use { declaration }. Again, the result is a defined code sequence with semantic effect equivalent to what we would desire to have by allowing a bare declaration. That effect is minimal; any expressions in the declaration (in array declarators or initializers) will be evaluated, but anything that is declared goes out of scope immediately. Note that even if the scope were not ended by the closing }, it is ended by the fact that the C standard defines each of these places to be a block. (C 2018 6.8.4 3 says the substatement of a selection-statement is a block, and 6.8.5 5 says the loop body of an iteration-statement is a block.)
Nonetheless, this shows there is no technical semantic impediment to allowing a declaration wherever a statement may appear.
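The two workarounds described above can be sketched as follows. The helper next_value and the function names are invented for this example; its only purpose is to make the evaluation of the initializer observable.

```c
static int calls = 0;

/* Hypothetical helper with a side effect, so evaluation can be observed. */
static int next_value(void)
{
    return ++calls;
}

/* label: ; declaration, with a null statement inserted after the label. */
int labeled_declaration(void)
{
    goto start;
start: ;
    int x = 41;
    return x + 1;
}

/* { declaration } used where a bare declaration is not accepted. */
int declaration_in_if(void)
{
    calls = 0;
    if (1) { int unused = next_value(); (void)unused; }
    return calls;   /* the initializer ran once; unused is already out of scope */
}
```

Both functions compile as ordinary C, which matches the point that the desired effect is already expressible without treating a declaration as a statement.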
Conclusion
Since the grammar and semantics of C apparently do not preclude allowing a declaration to be a type of statement, we are left with reasons of history and utility. In C as described in the first edition of Kernighan and Ritchie’s The C Programming Language, the locations of declarations were limited. Inside functions, they could only appear at the start of a compound statement. A declaration could not follow a statement. As we see from modern C, there was no grammatical or semantic reason for this limitation; we can allow declarations anywhere within a compound statement. So it seems simply that, around 1978, work on the language had not progressed that far.
Similarly, it seems that current work on C has not gone to the point of allowing a declaration to appear anywhere a statement may appear, as if it were a type of statement, even though there may be no technical impediment. However, in this case, there is less motivation for loosening the rules. Of the above cases, the only one that is of much use is allowing a declaration in a labeled statement. And, as its desired effect is easily accomplished by inserting a null statement, there is likely insufficient motivation to change compilers and to advocate for the changes in the C committee.
That's because a declaration doesn't instruct the compiler to do anything, it's purely informative for the compiler; at least by the standard. Compilers may do something if they see a declaration, the standard does not forbid it either but it doesn't require them to do anything if they only see a declaration and whatever it declares is never used within any statement.
Consider this code:
int main ( )
{
int x;
printf("Hello World!\n");
return 0;
}
What do you think int x; will do? You are declaring that x is of type int but you are never using x anywhere in the rest of the code. The compiler doesn't even have to reserve any memory on the stack for it. It may do so but it isn't required to do so.
The standard allows the compiler to create exactly the same code as if you had written:
int main ( )
{
printf("Hello World!\n");
return 0;
}
There is simply nothing a compiler must do if you let it know the type of a variable. This variable doesn't have to exist anywhere at all unless it is ever used by a statement.
C is not an interpreted language where every piece of code instructs the interpreter to directly do something. C is a compiled language which means you tell the compiler to generate CPU code for you that performs the actions you described in a predefined language. So there is no one-to-one relationship between the code you write and the CPU code the compiler generates.
You may write
int x = a / 8;
but the CPU code that the compiler generates may be equivalent to
int x = a >> 3;
For an unsigned or non-negative a that is exactly the same thing (for a negative signed a the two can differ, because division truncates toward zero), and if shifting is faster than division (and you can bet it is), the compiler does not have to generate a division just because you told it to do so. What you told the compiler is "I want x to be one eighth of a" and the compiler will be like "okay, I'll generate code that makes this happen" but how the compiler is making it happen is up to the compiler.
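A small sketch in C of when the two forms agree. The function names are made up for illustration. For unsigned values, dividing by 8 and shifting right by 3 give the same result; for negative signed values, division truncates toward zero, so a compiler has to do a little more than a plain shift.

```c
unsigned divide_by_eight(unsigned a) { return a / 8; }
unsigned shift_by_three(unsigned a)  { return a >> 3; }

/* Returns 1 if both forms agree for every value in [0, limit].
 * limit must be smaller than UINT_MAX so that the loop terminates. */
int forms_agree(unsigned limit)
{
    for (unsigned a = 0; a <= limit; a++)
        if (divide_by_eight(a) != shift_by_three(a))
            return 0;
    return 1;
}

/* Signed division truncates toward zero: -9 / 8 is -1. */
int signed_divide_by_eight(int a) { return a / 8; }
```

The result of right-shifting a negative signed value is implementation-defined in C, which is one more reason the substitution is left to the compiler rather than written by hand.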
Thus the compiler only needs to translate statements to CPU code. Strictly speaking, it only needs to translate statements that have an effect, but finding that out may require expensive analysis, so it is no standard violation to translate all statements to code, even those that do nothing. A declaration on its own never has an effect; it just lets the compiler know the type of a variable or function, which may become important in statements later on, but only if the variable/function is ever actually used.
If it were valid, what would you like this program to do ? :
#include <stdio.h>
int main (int argc, char **argv)
{
if (argc > 1) int x=42;
printf("%d\n", x);
return 0;
}
I am reading The C Programming Language (K&R) and noticed C allows the use of while loops preceding a single statement to function without any braces; why did the creators of C decide to support this? I presume this introduces some extra complexity for the compiler, is the desire for single statement while loops so common (for readability, perhaps?) it was worth whatever trade-off was required to allow them?
It doesn't add any special complexity to the compiler, and it's not just while loops. All of the control structures (if, for, while, etc.) govern a "statement", where a block is just a special case of a statement (called a "compound-statement") containing 0 or more declarations or statements. There isn't any specific use case or rationale for applying this rule to while, but none is really needed, other than maybe simplicity or consistency.
In a C switch-case flow control, it's required to put curly braces { } after a case if variables are being defined in that block.
Is it bad practice to put curly braces after every case, regardless of variable declaration?
For example:
switch(i) {
case 1: {
int j = 4;
...code...
} break;
case 2: { //No variable being declared! Brace OK?
...code...
} break;
}
It's certainly not invalid to use braces in every case block, and it's not necessarily bad style either. If you have some case blocks with braces due to variable declarations, adding braces to the others can make the coding style more consistent.
That being said, it's probably not a good idea to declare variables inside case blocks in straight C. While that might be allowed by your compiler, there's probably a cleaner solution. Mutually-exclusive case blocks may be able to share several common temporary variables, or you may find that your case blocks would work better as helper functions.
Braces may be used in every case statement without any speed penalty, due to the way compilers optimize code. So it's just the style and the preference of the coder.
The most common style is not to use braces, though during active development, using them in every case can make it easier to add code later.
It's just aesthetics: a 'case' does not govern a single statement; it works as a label, and execution simply continues through the code that follows it. So blocks are not needed, but they are not invalid either.
In 'case's that declare variables, braces are used to create a scope for those variables, and it makes good sense to use them. Some compilers on different platforms behave differently if they are not included.
I consider it bad style to use braces in each case. Cases are labels in C, akin to goto labels. And in the current C language, you're free to declare variables in each case (or anywhere you like) without introducing new blocks, though some people (myself included) also consider that bad style.
Generally it is bad practice to jump over the initialization of a variable, be it with goto or switch. This is what happens when you don't have the blocks per case.
There is even a case in C99 where jumping over the initialization is illegal, namely variable length arrays. They must be "constructed" similarly as non-PODs in C++, their initialization is necessary for the access of the variable later. So in this case you must use the block statement.
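A minimal sketch of the block-per-case form with a variable length array. The function and variable names are made up for illustration, and it assumes a compiler that supports VLAs and a positive n. Without the braces around case 1, the jump to case 2 would enter the scope of the VLA, which the language does not allow.

```c
/* Hypothetical example: sums 0..n-1 through a scratch VLA when mode == 1. */
int sum_with_scratch(int mode, int n)
{
    int total = 0;
    switch (mode) {
    case 1: {
        int scratch[n];              /* VLA: its scope ends with this block */
        for (int i = 0; i < n; i++)
            scratch[i] = i;
        for (int i = 0; i < n; i++)
            total += scratch[i];
        break;
    }
    case 2:                          /* not inside the VLA's scope */
        total = -1;
        break;
    }
    return total;
}
```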
Just to add a minor point: many editors and IDEs allow blocks to be collapsed and/or auto-indented, and several allow you to jump to the matching brace. I personally don't know of any that allow you to jump from a break to the matching case statement.
When debugging or refactoring other people's code (or even your own after a few months) that contains complex case statements, the ability both to collapse sections of the code and to jump to matching cases is invaluable, especially if the code contains indentation variations.
That said it is almost always good advice to avoid complex case statements like the plague.
Having been writing Java code for many years, I was amazed when I saw this C++ statement:
int a,b;
int c = (a=1, b=a+2, b*3);
My question is: Is this a choice of coding style, or does it have a real benefit? (I am looking for a practical use case)
I think the compiler will see it the same as the following:
int a=1, b=a+2;
int c = b*3;
(What's the official name for this? I assume it's standard C/C++ syntax.)
It's the comma operator, used twice. You are correct about the result, and I don't see much point in using it that way.
Looks like an obscure use of a , (comma) operator.
It's not a representative way of doing things in C++.
The only "good-style" use for the comma operator might be in a for statement that has multiple loop variables, used something like this:
// Copy from source buffer to destination buffer until we see a zero
for (char *src = source, *dst = dest; *src != 0; ++src, ++dst) {
*dst = *src;
}
I put "good-style" in scare quotes because there is almost always a better way than to use the comma operator.
Another context where I've seen this used is with the ternary operator, when you want to have multiple side effects, e.g.,
bool didStuff = DoWeNeedToDoStuff() ? (Foo(), Bar(), Baz(), true) : false;
Again, there are better ways to express this kind of thing. These idioms are holdovers from the days when we could only see 24 lines of text on our monitors, and squeezing a lot of stuff into each line had some practical importance.
Dunno its name, but it seems to be missing from the Job Security Coding Guidelines!
Seriously: C++ allows you to do a lot of things in many contexts, even when they are not necessarily sound. With great power comes great responsibility...
This is called 'obfuscated C'. It is legal, but intended to confuse the reader. And it seems to have worked. Unless you're trying to be obscure it's best avoided.
Hotei
Your sample code uses two features of C expressions that are not very well known to beginners (but not really hidden either):
the comma operator: a normal binary operator whose role is to return the last of its two operands. If the operands are expressions, they are evaluated from left to right.
assignment as an operator that returns a value. C assignment is not a statement as in other languages; it returns the value that has been assigned.
Most use cases of both these features involve some form of obfuscation, but there are some legitimate ones. The point is that you can use them anywhere you can provide an expression: inside an if or a while conditional, in a for loop iteration block, in function call parameters (if using the comma operator you must use parentheses to avoid confusion with actual function parameters), in macro parameters, etc.
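A short sketch in C of the places mentioned above where these expressions can appear. All function names are invented for this example.

```c
int identity(int v) { return v; }

/* Comma operator inside a function call: the extra parentheses make it
 * a single argument whose value is the right-hand operand. */
int comma_as_argument(void)
{
    int a = 5, b = 7;
    return identity((a, b));
}

/* Assignment used as a value inside a larger expression. */
int assignment_value(void)
{
    int x;
    int y = (x = 4) + 1;   /* (x = 4) has the value 4, so y is 5 */
    return x + y;
}

/* Comma operator in a loop condition: update n, then test it. */
int count_halvings(int n)
{
    int steps = 0;
    while (n /= 2, n > 0)
        steps++;
    return steps;
}
```

Compilers may warn that the left operand of a comma expression has no effect when it is a plain variable, as in comma_as_argument; the code is still valid.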
The most usual use of comma is probably in loop control, when you want to change two variables at once, or store some value before performing loop test, or loop iteration.
For example a reverse function can be written as below, thanks to comma operator:
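/* SWAP is not shown in the original; assumed here to exchange two ints. */
#define SWAP(a, b) do { int tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)
void reverse(int * d, int len){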
int i, j;
for (i = 0, j = len - 1 ; i < j ; i++, j--){
SWAP(d[i], d[j]);
}
}
Another legitimate (not obfuscated, really) use of the comma operator I have in mind is a DEBUG macro I found in some project, defined as:
#if defined(DEBUGMODE)
#define DEBUG(x) printf x
#else
#define DEBUG(x) x
#endif
You use it like:
DEBUG(("my debug message with some value=%d\n", d));
If DEBUGMODE is on then you'll get a printf, if not the wrapper function will not be called but the expression between parenthesis is still valid C. The point is that any side effect of printing code will apply both in release code and debug code, like those introduced by:
DEBUG(("my debug message with some value=%d\n", d++));
With the above macro d will always be incremented regardless of debug or release mode.
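A self-contained sketch of this behaviour, repeating the macro with the conditional written as #if defined(...). The function name is made up for illustration. Whether or not DEBUGMODE is defined when compiling, the function returns 1, because d++ is evaluated in both expansions.

```c
#include <stdio.h>

#if defined(DEBUGMODE)
#define DEBUG(x) printf x
#else
#define DEBUG(x) x
#endif

int increments_in_both_modes(void)
{
    int d = 0;
    /* Debug build:   printf("...", d++);
     * Release build: ("...", d++);  a comma expression that still runs d++ */
    DEBUG(("my debug message with some value=%d\n", d++));
    return d;
}
```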
There is probably some other rare cases where comma and assignment values are useful and code is easier to write when you use them.
I agree that assignment operator is a great source of errors because it can easily be confused with == in a conditional.
I agree that as comma is also used with a different meaning in other contexts (function calls, initialisation lists, declaration lists) it was not a very good choice for an operator. But basically it's not worse than using < and > for template parameters in C++ and it exists in C from much older days.
It's strictly coding style and won't make any difference in your program, especially since any decent C++ compiler will optimize it to
int a=1;
int b=3;
int c=9;
The math won't even be performed during assignment at runtime. (and some of the variables may even be eliminated entirely).
As to choice of coding style, I prefer the second example. Most of the time, less nesting is better, and you won't need the extra parenthesis. Since the use of commas exhibited will be known to virtually all C++ programmers, you have some choice of style. Otherwise, I would say put each assignment on its own line.
Is this a choice of coding style, or does it have a real benefit? (I am looking for a practical use case)
It's both a choice of coding style and it has a real benefit.
It's clearly a different coding style as compared to your equivalent example.
The benefit is that I already know I would never want to employ the person who wrote it, not as a programmer anyway.
A use case: Bob comes to me with a piece of code containing that line. I have him transferred to marketing.
You have found a hideous abuse of the comma operator written by a programmer who probably wishes that C++ had multiple assignment. It doesn't. I'm reminded of the old saw that you can write FORTRAN in any language. Evidently you can try to write Dijkstra's language of guarded commands in C++.
To answer your question, it is purely a matter of (bad) style, and the compiler doesn't care—the compiler will generate exactly the same code as from something a C++ programmer would consider sane and sensible.
You can see this for yourself if you make two little example functions and compile both with the -S option.
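For instance, the two little functions could look like the sketch below. The function names are made up for illustration. Compiling the file with the -S option of gcc or clang writes the generated assembly to a file, so the two function bodies can be compared.

```c
/* The comma-operator form from the question. */
int with_commas(void)
{
    int a, b;
    int c = (a = 1, b = a + 2, b * 3);
    return c;
}

/* The conventional form. */
int without_commas(void)
{
    int a = 1, b = a + 2;
    int c = b * 3;
    return c;
}
```

Both functions return 9; how similar the generated code looks will depend on the compiler and the optimization level used.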