How are if statements in C syntactically unambiguous?


I don't know a whole lot about C, but I understand the basics and as far as I can tell:
int main() {
if (1 == 1) printf("Hello World!\n");
return 0;
}
and
int main() {
if (1 == 1)
printf("Hello World!\n");
return 0;
}
and
int main() {
if (1 == 1) {
printf("Hello World!\n");
}
return 0;
}
are all precisely syntactically equivalent. The statement is true; the string is printed; the braces are (apparently) optional.
Sometimes, especially here on SO, I see something like the following:
int main() {
if (1 == 1)
printf("one is one\n");
printf("is this inside the if statement??/who kn0WS\n");
return 0;
}
By the power vested in CodeGolf, I have been led to believe that C is whitespace-agnostic; the lexical analyser breaks the tokens up into their component parts and strips whitespace outside strings.
(I mean, the whole reason for the semicolons-on-every-statement-thing is so the parser can strip \n, \t, literal spaces and still know where each statement ends, right??)
So how is it possible to unambiguously parse the previous bit of code (or perhaps someone can come up with a better example of what I mean), if whitespace is to be disregarded?
If C programmers want to write in whitespace-dependent Pythonic syntax, why do they write C, and why is it taught wherever C is taught that it's okay to write lexically ambiguous (both to me, a programmer, and the computer) statements like this?

if (1 == 1)
printf("one is one\n");
printf("is this inside the if statement??/who kn0WS\n");
The second printf() is not part of the if statement; it executes unconditionally.
The reason is that the previous line ends with a semicolon, which ends the single statement that forms the if body.
(I mean, the whole reason for the semicolons-on-every-statement-thing
is so the parser can strip \n, \t, literal spaces and still know where
each statement ends, right??)
So how is it possible to unambiguously parse the previous bit of code
(or perhaps someone can come up with a better example of what I mean),
if whitespace is to be disregarded?
Parsing example:
if (1 == 1) // if - ( and ) - statements (or block) follow, skip all whitespace
// no { found -> single statement, scan until ; (outside quotes / comments)
printf("one is one\n"); // ; encountered, end of if-block
Without braces, only one statement belongs to the if-block.
But, as said already, it's a good habit to use braces. If you later add a statement (a quick temporary printf() for example), it will always be inside the block.
Special case:
int i = 0;
while(i++ < 10);
printf("%d", i);
Here printf() executes only once, after the loop has finished (it prints 11). Note the ; at the end of while(): it is an empty statement that forms the entire loop body.
In case of an empty statement, it's better to use:
while(i++ < 10)
;
to make the intention clear (or, as an alternative, an empty block {} can be used as well).

In C, an if statement takes exactly one statement after the parenthesized condition, regardless of indentation. Normally this statement is indented for clarity, but C ignores indentation. In any case, there is no ambiguity in any of your examples.
What is ambiguous, in C and in many other languages, is the "dangling else". For instance, suppose you have a nested if statement with a single else after the second one. It could group as:
if (expr)
    if (expr)
        statement
    else
        statement
Or it could group as:
if (expr)
    if (expr)
        statement
else
    statement
The only difference between these two is how they're indented, which C ignores. In this case, the ambiguity is resolved by using the first interpretation, i.e., the else statement binds to the nearest preceding if statement. To achieve the second interpretation, curly braces are needed:
if (expr) {
    if (expr)
        statement
}
else
    statement
However, even in the first case, it's a good idea to include the curly braces, even though they aren't required:
if (expr) {
    if (expr)
        statement
    else
        statement
}

tl;dr The only ambiguity is in how difficult it is for a human to read. From the compiler's perspective, the syntax is perfectly unambiguous.
There are only two (compilable and syntactically acceptable) possibilities after an if statement:
Braces, as in
if(x) {
DoFoo();
}
// or
if(x) { DoFoo(); }
In this case, whatever is in the {...} will execute if the condition is met.
No braces, as in
if(x)
DoFoo();
// or
if(x) DoFoo();
In this case, only the next statement will execute if the condition is met.
You are correct that C is whitespace-agnostic. As a result, omitting the braces can lead to some tricky bugs. For example, in this code, DoBar() will execute whether or not the condition is met:
if(x)
    DoFoo();
    DoBar();
Inconsistent use of braces can also easily result in invalid code. For example, this looks valid (at a glance) from a human perspective, but it's not:
if(x)
    DoFoo();
    DoBar();
else
    DoBaz();
None of the examples you posted are ambiguous from the compiler's perspective, but the no-braces versions are confusing from a human perspective. Leaving out the braces frequently leads to hard-to-find bugs.

Without braces, the if body is just the next statement after the if. The whitespace does not matter.
It is good practice, and makes life easier, to always use braces, along with good indentation. The code is then easy to read and does not lead to errors when people add or remove statements after the if.

Readability is the only ambiguity in the statement:
if (1 == 1)
printf("one is one\n");
printf("is this inside the if statement??/who kn0WS\n");
The first statement following the if (...) executes only when the condition evaluates to true (nonzero).
Braces {...} help remove readability ambiguities,
if (1 == 1)
{
printf("one is one\n");
}
printf("is this inside the if statement??/who kn0WS\n");
but the syntax rules are still the same.
Opinions vary, but I always choose to use braces.
Not using them is fine at the time code is written. But down the road, you just know someone will come along and add another statement under the first and expect it to be executed.

In general, for an if statement or a loop whose body is a single statement, the curly brackets are optional; if you have two or more statements, you must add them.
For example:
for(i = 0; i < 2; i++)
    for(j = 0; j < 4; j++)
        if(...)
            printf(..);
        else
            printf(..);
is equivalent to:
for(i = 0; i < 2; i++)
{
    for(j = 0; j < 4; j++)
    {
        if(...)
        {
            printf(..);
        }
        else
        {
            printf(..);
        }
    }
}
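As you may notice, this is mostly a matter of indentation: each for loop's body is a single statement (the inner for, or the if ... else, which counts as one statement), so no braces are required. Personally, I don't use curly brackets around a single statement, since leaving them out makes the code shorter and cleaner.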

Another reason to use braces is that a simple typo can bite you hard:
#include <stdio.h>
int main(void) {
if (0 == 1)
printf("zero is one\n"),
printf("is this inside the if statement?? /who kn0WS\n");
return 0;
}
Look carefully...


Iterative Statements

So, I have been studying iterative statements for a report. While reading, I came across the history of definite iteration and eventually learned about the for loop. We know that the syntax of the for loop in C, C++, and Java is
for (expression1; expression2; expression3)
statement
And it says here that we can omit any of the expressions, and that it is legal to have a for loop that looks like this:
for (;;)
My question is: how does that work? I can't find any more resources on this.

A for loop declared as:
for (init-statement; condition; iteration-expression) body;
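is equivalent to the following, apart from two details: a continue in the body of the for loop still runs iteration-expression, and a variable declared in init-statement is scoped to the loop: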
init-statement;
while (condition) {
body;
iteration-expression;
}
It's easy to see how init-statement or iteration-expression could be omitted. If the condition is omitted, it is assumed to be true.
A reasonable resource that explains this is the Explanation section of the for loop documentation at cppreference.com.

Basically, for (;;) is legal, but you need to put something in the body of that for loop that leaves it (such as break, return, goto, or a call to exit()); otherwise the loop will never stop.
int counter = 0;
int limit = 5;
for (;;) {
if (counter > limit) break;
counter++;
}

Can someone explain how this works?

I saw this code:
if (cond) {
perror("an error occurred"), exit(1);
}
Why would you do that? Why not just:
if (cond) {
perror("an error occurred");
exit(1);
}

In your example it serves no purpose at all. It is occasionally useful when written as
if(cond)
    perror("an error occurred"), exit(1) ;
-- then you don't need curly braces. But it's an invitation to disaster.
The comma operator lets you put two or more expressions in a position where the grammar allows only one. In your case, there is no need to use it; in other cases, such as in a while loop, it may be useful:
while (a = b, c < d)
...
where whether the while loop continues is governed solely by the last expression.

Legitimate cases of the comma operator are rare, but they do exist. One example is when you want to have something happen inside of a conditional evaluation. For instance:
std::wstring example;
auto it = example.begin();
while (it = std::find(it, example.end(), L'\\'), it != example.end())
{
    // Do something to each backslash in `example`
    ++it; // step past this backslash so the next std::find makes progress
}
It can also be used in places where you can only place a single expression, but want two things to happen. For instance, the following loop increments x and decrements y in the for loop's third component:
int x = 0;
int y = some_number;
for(; x < y; ++x, --y)
{
// Do something which uses a converging x and y
}
Don't go looking for uses of it, but if it is appropriate, don't be afraid to use it, and don't be thrown for a loop if you see someone else using it. If you have two things which have no reason not to be separate statements, make them separate statements instead of using the comma operator.

The main use of the comma operator is obfuscation; it permits doing two
things where the reader only expects one. One of the most frequent
uses, adding side effects to a condition, falls under this
category. There are a few cases which might be considered valid,
however:
The one which was used to present it in K&R: incrementing two
variables in a for loop. In modern code, this might occur in a
function like std::transform, or std::copy, where an output iterator
is incremented simultaneously with the input iterator. (More often, of
course, these functions will contain a while loop, with the
incrementations in separate statements at the end of the loop. In such
cases, there's no point in using a comma rather than two statements.)
Another case which comes to mind is data validation of input parameters
in an initializer list:
MyClass::MyClass( T const& param )
: member( (validate( param ), param) )
{
}
(This assumes that validate( param ) will throw an exception if
something is wrong.) This use isn't particularly attractive, especially
as it needs the extra parentheses, but there aren't many alternatives.
Finally, I've sometimes seen the convention:
ScopedLock( myMutex ), protectedFunction();
, which avoids having to invent a name for the ScopedLock. To tell
the truth, I don't like it, but I have seen it used, and the alternative
of adding extra braces to ensure that the ScopedLock is immediately
destructed isn't very pretty either.

This can be better understood by taking some examples:
First:
Consider an expression:
x = ++j;
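But if, for the time being, we need to assign a temporary debug value, we can write:
x = DEBUG_VALUE, ++j;
Because = binds more tightly than the comma operator, this assigns DEBUG_VALUE to x and still increments j.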
Second:
The comma operator is frequently used in for loops, e.g.:
for(i = 0, j = 10; i < N; j--, i++)
// ^ ^ here we can't use ;
Third:
One more example (some may find this one interesting). Let x = n / d; we want to print x - 1 if the remainder is zero and x + 1 otherwise:
for 16 / 4, x = 4 and the remainder is zero, so print x - 1;
for 16 / 5, x = 3 and the remainder is 1, so print x + 1.
It can also be done in a single step:
if(x = n / d, n % d) // == x = n / d; if(n % d)
printf("Remainder not zero, x + 1 = %d", (x + 1));
else
printf("Remainder is zero, x - 1 = %d", (x - 1));
PS: It is also worth knowing that the , operator can occasionally be disastrous. For example, in the question Strtok usage, code not working, the OP forgot to write the function name: instead of tokens = strtok(NULL, ",'");, he wrote tokens = (NULL, ",'");. He got no compilation error because that is a valid expression, equivalent to tokens = ",'";, and it caused an infinite loop in his program.

The comma operator lets you group several expressions where only one is expected.
For example, it can be useful in some cases:
// In a loop
while ( a--, a < d ) ...
But in your case there is no reason to use it. It is only confusing.
In your case, it is just to avoid curly braces :
if(cond)
perror("an error occurred"), exit(1);
// =>
if (cond)
{
perror("an error occurred");
exit(1);
}

There appear to be few practical uses of operator,().
Bjarne Stroustrup, The Design and Evolution of C++
Most of the common uses of the comma operator are listed in the Wikipedia article Comma_operator#Uses.
One interesting use I found is in boost::assign, which judiciously overloads the operator so that it behaves like a comma-separated list of values pushed onto the end of a vector object:
#include <vector>
#include <boost/assign/std/vector.hpp> // for 'operator+=()'
using namespace std;
using namespace boost::assign; // bring 'operator+=()' into scope
int main()
{
    vector<int> values;
    values += 1,2,3,4,5,6,7,8,9; // insert values at the end of the container
}
Unfortunately, the above usage, which was popular for prototyping, now looks archaic since compilers support C++11 uniform initialization (for example, vector<int> values{1, 2, 3};).
So that leaves us back to
There appear to be few practical uses of operator,().
Bjarne Stroustrup, The Design and Evolution of C++

In your case, the comma operator serves no purpose. It is sometimes used to avoid curly braces, but the writer has already put them in, so here it only adds confusion.

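It could be useful with the ternary (conditional) operator if you want to evaluate two or more expressions when the condition is true or false. Keep in mind that, because the comma operator evaluates left to right, the value of the parenthesized expression is that of the rightmost expression. Also, in C both operands of ?: must have compatible types (or both be void), so when one side is a void call such as exit(1), cast the other side to void.
For instance:
a < b ? (void)(x = 5, b = 6, d = i) : exit(1);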

The boost::assign overloads the comma operator heavily to achieve this kind of syntax:
vector<int> v;
v += 1,2,3,4,5,6,7,8,9;

Adding a `;` at the end of while loop

#include<stdio.h>
int main()
{
int i=5;
while(i--)
{
printf("%d\n",i);
};
printf("Out of loop\n");
return 0;
}
This code works fine! Does ; at the end of a loop not mean anything?
So even if we add it, we don't have any problems? Just curious! I think adding ; at the end of a loop doesn't make any sense.

It "works", but it's an empty statement so it changes the structure of your program.
Consider this:
if (foo())
    while (bar())
    {
        foo_some_more();
    }
else
{
    do_something_about_it();
}
The above works since the while is a single statement, and thus the else is still able to "find" the if properly. If you add a semi-colon after the while-loop's closing brace, you break that structure and it will no longer compile.
So, although empty statements can look harmless, they are not, and they should really be avoided. Plus, they add pointless confusion to the code.

The ; here is considered as an empty statement or null statement.
Sample
int i = 0;;
Word of caution: don't treat this as a rule of thumb. There are cases where a ; after a closing brace looks the same but is not a null statement; it is part of the syntax. Example: the ; that ends a do ... while loop, as in do { ... } while (cond);

A superfluous ; is an empty statement and is redundant in this particular instance.
One instance where you're required to use an empty statement is a final case label in a switch that does nothing:
switch (expression)
{
case 1:
/*Some statements here or empty*/
case 2:
; /*A statement is required between here and the closing brace
An empty statement will suffice.*/
}
You ought to avoid superfluous empty statements: they can introduce bugs (particularly if you wrap your code in if statements during a refactoring effort), and older compilers will complain. Excess semicolons in macros are particularly pernicious.

These are perfectly valid. They are called empty statements, and they do nothing (the { } below is likewise an empty compound statement).
int main()
{
;
;
;
;
{
}
return 0;
}

As stated by various users above, the relevant semi-colon is an empty statement and has no effect.
This however is a slightly different story:
#include <stdio.h>
int main()
{
int i=5;
while(i--); //<--- Put the semi-colon here and see what you get.
{
printf("%d\n",i);
}
printf("Out of loop\n");
return 0;
}

#include<stdio.h>
int main()
{
int i=5;
while(i--)
{
printf("%d\n",i);
}/*blank statement*/;
printf("Out of loop\n");
return 0;
}
It's just a blank statement.
The closing brace already marks the end of the block, so the ; that follows is treated as a separate blank (empty) statement.

assignment works as a condition

Consider the following code:
int i;
while(i=0)
printf("Hello");
Generally speaking, i=0 is an assignment and not a condition for while to check.
But the GCC compiler lets it go with a warning and even evaluates it correctly (it does not execute the print statement).
Why? I usually add parentheses around the assignment for the truth value, but my juniors feel that I am wrong and that there is no real reason for the parentheses here!
EDIT: Narrowing down my actual doubt, please consider the following test case:
int callme(){
return 0;
}
int main(int argc,char*argv[]){
int c;
while(c = callme()){
printf("Calling...\n");
}
return 0;
}

The expression i = 0 does 2 things:
Has the side effect of storing 0 in i
Yields the value 0
I usually would do with parenthesis for the truth value but my juniors
feel that i am wrong and there is no real reason for the parenthesis
in this
It's usually a hint to the compiler meaning "I actually want this, I didn't forget a =, shut up".
For your specific case there's no reason to write if (i = 0): you already know what if (0) does. But it's pretty useful when used as:
if ((i = some_function()))
...

i=0 is always an assignment (unless you have it as part of int i = 0; where it is an initialization). But any non-void expression may appear inside the condition of a while loop and if it evaluates to non-zero, the body of the loop will be executed, and if it is zero, the body of the loop will not be executed.
The notation:
while (i = 0)
printf("Hello\n");
is always equivalent to:
i = 0;
There is very little justification for writing the loop at all.
People do write other expressions:
while (c = getchar())
...process EOF or a non-null character...
But that's usually a bug. It is more likely that you should be writing:
while ((c = getchar()) != EOF)
...process a character - possibly null...
or even:
while ((c = getchar()) != EOF && c != '\0')
...process a non-null character...
The first getchar() loop gets a warning from GCC (with -Wall); the latter two do not, because of the explicit test of the value from the assignment.
The people who write a condition like this:
while ((c = getchar()))
really annoy me. It avoids the warning from GCC, but it is not (IMNSHO) a good way of coding.

When you use an assignment operator such as
a=0;
You assign the value to 'a', and the whole expression also evaluates to 0.
To test your question, I tried these lines of codes:
int a;
printf("%d", a=0);
and these lines displayed 0.
Then, I tested another set of codes:
int b;
printf("%d", b=15);
Here, the lines displayed 15.
So, if you do:
while(a=0)
{
printf("zero");
}
The expression (a=0) evaluates to 0, which is false, so nothing is displayed.
But if you do:
while(a=15)
{
printf("fifteen");
}
The "fifteen" will be displayed endlessly, because the expression (a=15) evaluates to 15, which is nonzero and therefore true. :)

As cnicutar said above, the assignment also yields the value zero.
Some additional info:
It is a common coding mistake to omit one '=', so that the comparison becomes an assignment.
An easy way to guard against this is to put the constant on the left of the comparison, as below; then, if an '=' is missed, the compiler gives an error, because 0 = i is not a valid assignment:
while(0 == i)
{
    printf("Hello");
}

Use of null statement in C

What are typical uses of null statement
;
in C ?
I know that it is basically used to skip expression where it is expected by the compiler, but here I'm interested only in real-world examples of such use cases.

It's typically the side-effect of a code block that was stripped by the preprocessor, like
#if DEBUG
#define ASSERT(_x) Assert(_x)
#else
#define ASSERT(_x)
#endif
ASSERT(test); // Results in null statement in non-debug builds
That, or in loops where your condition already contains whatever needs to be done in each iteration.

while (*(dst++) = *(src++))
;

After a label at the end of a function (or more precisely, at the end of any block), e.g.
void foo(void)
{
// ...
exit:
;
}

while (somethingWithSideEffects()) ;

I have used it, albeit rarely, in a possibly unusual situation (and one that some/many people would find wrong). I have had to sometimes write a very complex if condition without an else clause where the if condition has to be negated. Obviously it can be something like this:
if ( !( overly complex condition ) )
{
do stuff
}
It sometimes makes more sense (to me at least) to think of it in terms of positive logic. In other words, if the overly complex condition holds true, I don't want the code to run. So I have instead written it as:
if ( overly complex condition )
; // do nothing
else
{
do stuff
}

Example:
while (!kbhit())
;
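Should be self-explanatory (note that kbhit() is a non-standard function from <conio.h>, available mainly with DOS/Windows compilers).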

Unit tests for a compliant compiler.

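I can think of scanf validation. When the input does not match the format, scanf leaves the offending characters in the input stream, so every later call fails on them again and a reading loop gets stuck. To prevent that, the characters up to the end of the line must be removed:
if( scanf("%d",&integer) == 0 )
{
    int ch;
    while( (ch = getchar()) != '\n' && ch != EOF ) ; // also stop at EOF so this cannot loop forever
    // ....
}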

A somewhat unusual use -- but for which I really appreciate the existence of the null statement -- is when I have two conditions and two actions which I find I can most naturally express like this:
if(condition1)
/* do nothing */ ;
else if(condition2)
do_something;
else do_something_else;
Often condition1 tests that everything is okay, but if it's not, condition2 distinguishes between two different exception actions do_something and do_something_else.
This isn't the only way to express such a thing, of course. It would be possible to repeat condition1:
if(!condition1 && condition2)
do_something;
else if(!condition1)
do_something_else;
But that seems inferior, because it repeats condition1. Or it would be possible to use nested if statements:
if(!condition1) {
if(condition2)
do_something;
else do_something_else;
}
But of course nested if statements are notoriously prone to overcomplication and obfuscation, too. So I often prefer the first version, with the null statement.

The only uses I can think of are:
1- At the end of a loop, where the operations are already encoded in the loop condition, e.g. while(a[i--]);
2- After a label where no operation needs to be done, e.g. Label: ;

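I was wondering how to write a null expression in the inline if (?:) and came up with this. It compiles and works with my compiler:
condition ? x = 1 : "do nothing";
Fun stuff. (Strictly speaking, mixing an int and a string literal as the operands of ?: is a constraint violation in ISO C, so compilers may warn about or reject it; condition ? (void)(x = 1) : (void)0; is the portable form.)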

It's more of a null expression rather than a null statement, but it's often found in for loops.
for (;;) // Loop "forever"
for (int i=10; i--; ) // 9..0
etc

The only place I use null statements is when a case begins with a declaration:
switch(x) {
case 5: ;
int y = makeValue(z);
...
break;
...
}
This will not compile if you remove the null statement that the case begins with. The reason is that, in C before C23, a label must be followed by a statement, and a declaration is not a statement.