Interpretation of instructions without effect - c

How can we interpret the following program and its success?(Its obvious that there must not be any error message). I mean how does compiler interpret lines 2 and 3 inside main?
#include <stdio.h>
int main()
{
int a,b;
a; //(2)
b; //(3)
return 0;
}

Your
a;
is just an expression statement. As always in C, the full expression in expression statement is evaluated and its result is immediately discarded.
For example, this
a = 2 + 3;
is an expression statement containing full expression a = 2 + 3. That expression evaluates to 5 and also has a side-effect of writing 5 into a. The result is evaluated and discarded.
Expression statement
a;
is treated in the same way, except that is has no side-effects. Since you forgot to initialize your variables, evaluation of the above expression can formally lead to undefined behavior.
Obviously, practical compilers will simply skip such expression statements entirely, since they have no observable behavior.

That's why you should use some compilation warning flags!
-Wall would trigger a "statement with no effect" warning.
If you want to see what the compilation produces, compile using -S.
Try it with your code, with/without -O (optimization) flag...

This is just like you try something like this:
#include <stdio.h>
int main(void){
1;
2;
return 0;
}
As we can see we have here two expressions followed by semicolon (1; and 2;). It is a well formed statement according to the rules of the language.
There is nothing wrong with it, it is just useless.
But if you try to use though statements (a or b) the behavior will be undefined.
Of course that, the compiler will interpret it as a statement with no effect
L.E:
If you run this:
#include <stdio.h>
int main(void){
int a;
int b;
printf("A = %d\n",a);
printf("B = %d\n",b);
if (a < b){
printf("TRUE");
}else{
printf("FALSE");
}
return 0;
}
You wil get:
A = 0
B = 0
FALSE
Because a and b are set to 0;

Sentences in C wich are not control structures (if, switch, for, while, do while) or control statements (break, continue, goto, return) are expressions.
Every expression has a resulting value.
An expression is evaluated for its side effects (change the value of an object, write a file, read volatile objects, and functions doing some of these things).
The final result of such an expression is always discarded.
For example, the function printf() returns an int value, that in general is not used. However this value is produced, and then discarded.
However the function printf() produces side effects, so it has to be processed.
If a sentence has no side effects, then the compiler is free to discard it at all.
I think that for a compiler will not be so hard to check if a sentence has not any side effects. So, what you can expect in this case is that the compiler will choose to do nothing.
Moreover, this will not affect the observable behaviour of the program, so there is no difference in what is obtained in the resulting execution of the program. However, of course, the program will run faster if any computation is ignored at all by the compiler.
Also, note that in some cases the floating point environment can set flags, which are considered side-effects.
The Standard C (C11) says, as part of paragraph 5.1.2.3p.4:
An actual implementation need not evaluate part of an expression if it
can deduce that its value is not used and that no needed side effects
are produced [...]
CONCLUSION: One has to read the documentation of the particular compiler that oneself is using.

Related

Recursive function is giving output when there is no return value for f(0) [duplicate]

Why does following code has a correct output? int GGT has no return statement, but the code does work anyway? There are no global variables set.
#include <stdio.h>
#include <stdlib.h>
int GGT(int, int);
void main() {
int x1, x2;
printf("Bitte geben Sie zwei Zahlen ein: \n");
scanf("%d", &x1);
scanf("%d", &x2);
printf("GGT ist: %d\n", GGT(x1, x2));
system("Pause");
}
int GGT(int x1, int x2) {
while(x1 != x2) {
if(x1 > x2) {
/*return*/ x1 = x1 - x2;
}
else {
/*return*/ x2 = x2 - x1;
}
}
}
For x86 at least, the return value of this function should be in eax register. Anything that was there will be considered to be the return value by the caller.
Because eax is used as return register, it is often used as "scratch" register by callee, because it does not need to be preserved. This means that it's very possible that it will be used as any of local variables. Because both of them are equal at the end, it's more probable that the correct value will be left in eax.
It should not work and certainly do not work on all compilers and target OS, even if it works on yours.
The likely explanation is that a function returning int always return something, and it's usually the content of a register. It probably happens that the register used for return value is in your case the same used to compute the last expression before returning from the function (on x86 targets, certainly eax).
This being said, an optimizing compiler detecting that there is no return is allowed to completely remove the code of this function. Henceforth the effect you see (may) disappear when activating higher optimizations levels.
I tested it with gcc:
gcc without optimization:
inputs 10, 20 -> result is 10
gcc -O1
inputs 10, 20 -> result is 1
gcc -O2
inputs 10, 20 -> result is 0
If a function is defined to return a value but does not, and the calling function attempts to use the return value, you invoke undefined behavior.
This is spelled out in section 6.9.1p12 of the C standard:
If the } that terminates a function is reached, and the value of the
function call is used by the caller, the behavior is undefined.
You got "lucky" in this case that the program appeared to work properly, but there's no guarantee of that. Had you compiled with different optimization settings or a different compiler altogether you could end up with different results.
So the moral of the story: if the function says it returns a value, always return a value.
On x86 the return value is stored in EAX register, which "accidentally" is also used by this compiler to store the result of arithmetic operations (or at least subtraction). You can check this by looking at assembly generated by your compiler. I agree with kriss - you can't assume this will always be the case, so it's better to explicitly specify the return value.
Official verbiage:
6.9.1 Function definitions
...
12 If the } that terminates a function is reached, and the value of the function call is used by
the caller, the behavior is undefined.
where "undefined benavior" means:
1 undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data,
for which this International Standard imposes no requirements
2 NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable
results, to behaving during translation or program execution in a documented manner characteristic of the
environment (with or without the issuance of a diagnostic message), to terminating a translation or
execution (with the issuance of a diagnostic message).
3 EXAMPLE An example of undefined behavior is the behavior on integer overflow.
C 2011 Online Draft
Believe it or not, a function typed something other than void is not required to have a return statement. It's not enforced by the language grammar, and there is no constraint that a non-void function must contain a return statement. The only constraints are that, if it is present, a return statement in a void function not return the value of any expression, and that a return statement in a non-void function must return the value of an expression.
Why is that the case?
C initially did not have a void data type, and there was no way to specify a subroutine that was only executed for its side effects and didn't return a value. There was no good way to return "nothing", so the return statement was not required. C also at this time had "implicit int" declarations - you could define a function body without a type, and the compiler would assume it was typed int:
foo( a, b ) // old style parameter declarations
int a; // still legal, but no longer really used
char *b; // for very good reasons
{
// do something interesting with a and b
}
foo is implicitly typed to return int, even if no value is explicitly being returned. This is okay as long as the caller doesn't try to use foo's non-existent return value. So a convention sort of developed where "procedures" (functions executed solely for side effects) were not explicitly typed, and had no return statement.
Because legacy code is forever, the behavior with respect to return statements hasn't been changed in more recent versions of the C standard, even though implicit int declarations were removed in C99.
GCC pastes "ret" instruction in such case, while clang pastes "ud2" and app crashes at run-time.

Why are statements with no effect considered legal in C?

Pardon if this question is naive. Consider the following program:
#include <stdio.h>
int main() {
int i = 1;
i = i + 2;
5;
i;
printf("i: %d\n", i);
}
In the above example, the statements 5; and i; seem totally superfluous, yet the code compiles without warnings or errors by default (however, gcc does throw a warning: statement with no effect [-Wunused-value] warning when ran with -Wall). They have no effect on the rest of the program, so why are they considered valid statements in the first place? Does the compiler simply ignore them? Are there any benefits to allowing such statements?
One benefit to allowing such statements is from code that's created by macros or other programs, rather than being written by humans.
As an example, imagine a function int do_stuff(void) that is supposed to return 0 on success or -1 on failure. It could be that support for "stuff" is optional, and so you could have a header file that does
#if STUFF_SUPPORTED
#define do_stuff() really_do_stuff()
#else
#define do_stuff() (-1)
#endif
Now imagine some code that wants to do stuff if possible, but may or may not really care whether it succeeds or fails:
void func1(void) {
if (do_stuff() == -1) {
printf("stuff did not work\n");
}
}
void func2(void) {
do_stuff(); // don't care if it works or not
more_stuff();
}
When STUFF_SUPPORTED is 0, the preprocessor will expand the call in func2 to a statement that just reads
(-1);
and so the compiler pass will see just the sort of "superfluous" statement that seems to bother you. Yet what else can one do? If you #define do_stuff() // nothing, then the code in func1 will break. (And you'll still have an empty statement in func2 that just reads ;, which is perhaps even more superfluous.) On the other hand, if you have to actually define a do_stuff() function that returns -1, you may incur the cost of a function call for no good reason.
Simple Statements in C are terminated by semicolon.
Simple Statements in C are expressions. An expression is a combination of variables, constants and operators. Every expression results in some value of a certain type that can be assigned to a variable.
Having said that some "smart compilers" might discard 5; and i; statements.
Statements with no effect are permitted because it would be more difficult to ban them than to permit them. This was more relevant when C was first designed and compilers were smaller and simpler.
An expression statement consists of an expression followed by a semicolon. Its behavior is to evaluate the expression and discard the result (if any). Normally the purpose is that the evaluation of the expression has side effects, but it's not always easy or even possible to determine whether a given expression has side effects.
For example, a function call is an expression, so a function call followed by a semicolon is a statement. Does this statement have any side effects?
some_function();
It's impossible to tell without seeing the implementation of some_function.
How about this?
obj;
Probably not -- but if obj is defined as volatile, then it does.
Permitting any expression to be made into an expression-statement by adding a semicolon makes the language definition simpler. Requiring the expression to have side effects would add complexity to the language definition and to the compiler. C is built on a consistent set of rules (function calls are expressions, assignments are expressions, an expression followed by a semicolon is a statement) and lets programmers do what they want without preventing them from doing things that may or may not make sense.
The statements you listed with no effect are examples of an expression statement, whose syntax is given in section 6.8.3p1 of the C standard as follows:
expression-statement:
expressionopt ;
All of section 6.5 is dedicated to the definition of an expression, but loosely speaking an expression consists of constants and identifiers linked with operators. Notably, an expression may or may not contain an assignment operator and it may or may not contain a function call.
So any expression followed by a semicolon qualifies as an expression statement. In fact, each of these lines from your code is an example of an expression statement:
i = i + 2;
5;
i;
printf("i: %d\n", i);
Some operators contain side effects such as the set of assignment operators and the pre/post increment/decrement operators, and the function call operator () may have a side effect depending on what the function in question does. There is no requirement however that one of the operators must have a side effect.
Here's another example:
atoi("1");
This is calling a function and discarding the result, just like the call printf in your example but the unlike printf the function call itself does not have a side effect.
Sometimes such a statements are very handy:
int foo(int x, int y, int z)
{
(void)y; //prevents warning
(void)z;
return x*x;
}
Or when reference manual tells us to just read the registers to archive something - for example to clear or set some flag (very common situation in the uC world)
#define SREG ((volatile uint32_t *)0x4000000)
#define DREG ((volatile uint32_t *)0x4004000)
void readSREG(void)
{
*SREG; //we read it here
*DREG; // and here
}
https://godbolt.org/z/6wjh_5

Where can I find weird, specific C syntax rules?

I will take an exam and my teacher asks weird C syntax rules. Like:
int q=5;
for(q=-2;q=-5;q+=3) { //assignment in condition part??
printf("%d",q); //prints -5
break;
}
Or
int d[][3][2]={4,5,6,7,8,9,10,11,12,13,14,15,16};
int i=-1;
int j;
j=d[i++][++i][++i];
printf("%d",j); //prints 4?? why j=d[0][0][0] ?
Or
extern int a;
int main() {
do {
do {
printf("%o",a); //prints 12
} while(!1);
} while(0);
return 0;
}
int a=10;
I could not find it rules any site or book. Really absurd and uncommon. Where can I find?
To me it seems that your teacher is asking questions which invole undefined behavior.
If you tell him that this is incorrect, you're directly confronting him.
However, you could do the following:
Compile the code on different platforms
Compile the code with different compilers
Compile the code with different versions of the same compiler
Build a matrix with the results. You'll find out that they differ
Show the results to your teacher ans ask him to explain why that happens
That way you do not say that he's wrong, you're just showing some facts and you're showing that you're willing to learn and work.
Do that a long before the exam so that the teacher can look into it and think about his questions so that he can change the exam in time.
I could not find it rules any site or book. Where can I find?
See Where do I find the current C or C++ standard documents?. If you have a good library at university, they should own a copy.
Concerning for(q=-2;q=-5;q+=3) {, all you need to do is to break this down into its components. q=-2 is ran first, then q=-5 is tested, and if that is not 0 (which it isn't since it's an expression with value -5), then the loop body runs once. Then break forces a premature exit from an otherwise infinite loop. The expression then q+=3 is never reached.
The behaviour of d[i++][++i][++i] is undefined. Tell your teacher that, tactfully.
The "%o" format denotes octal output. a is set to 10 in decimal which is 12 in octal. Your code would be clearer if you had written:
int a=012; // octal constant.
The online version of the C language standard has what you need (and is what I will be referring to in this answer); just bear in mind is is a language definition and not a tutorial, and as such may not be easy to read for someone who doesn't have a lot of experience yet.
Having said that, your teacher is throwing you a few foul balls. For example:
j=d[i++][++i][++i];
This statement results in undefined behavior for several reasons. The first several paragraphs of section 6.5 of the document linked above explain the problem, but in a nutshell:
Except in a few situations, C does not guarantee left-to-right evaluation of expressions; neither does it guarantee that side effects are applied immediately after evaluation;
Attempting to modify the value of an object more than once between sequence points1, or modifying and then trying to use the value of an object without an intervening sequence point, results in undefined behavior.
Basically, don't write anything of the form:
x = x++;
x++ * x++;
a[i] = i++;
a[i++] = i;
C does not guarantee that each ++i and i++ is evaluated from left to right, and it does not guarantee that the side effect of each evaluation is applied immediately. So the result of j[i++][++i][++i] is not well-defined, and the result will not be consistent over different programs, or even different builds of the same program2.
AND, on top of that, i++ evaluates to the current value of i; so clearly, your teacher's intent was for j[i++][++i][++i] to evaluate to j[-1][1][2], which would also result in undefined behavior since you're attempting to index outside of the array bounds.
This is why I hate, hate, hate it when teachers throw this kind of code at their students - not only is it needlessly confusing, not only does it encourage bad practice, but more often than not it's just plain wrong.
As for the other questions:
for(q=-2;q=-5;q+=3) { //assignment in condition part??
See sections 6.5.16 and 6.8.5.3. In short, an assignment expression has a value (the value of the left operand after any type conversions), and it can appear as part of a controlling expression in a for loop. As long as the result of the assignment is non-zero (as in the case above), the loop will execute.
printf("%o",a); //prints 12
See section 7.21.6.1. The o conversion specifier tells printf to format the integer value as octal: 1010 == 128
A sequence point is a point in a programs execution where an expression has been fully evaluated and any side effects have been applied. Sequence points occur at the ends of statements, between the evaluation of a function's parameters and the function call, after evaluating the left operand of the &&, ||, and ?: operators, and a few other places. See Annex C for the complete list.
Or even different runs of the same build, although in practice you won't see values change from run to run unless you're doing something really hinky.

Call by Need and Standard C Output?

The following code is here with keep the C Language Syntax:
#include <stdio.h>
int func(int a, int b){
if (b==0)
return 0;
else return func(a,b);
}
int main(){
printf("%d \n", func(func(1,1),func(0,0)));
return 0;
}
What is the output of this code at 1) run with standard C, 2) with any
language that has call by need property, Then:
in (1) the programs loop into infinite call and in (2) we have ouptut zero !! this is an example solved by TA in programming language course, any idea to
describe it for me? thanks
1) In C (which uses strict evaluation semantics) we get infinite recursion because in strict evaluation arguments are evaluated before a function is called. So in f(f(1,1), f(0,0)) f(1,1) and f(0,0) are evaluated before the outer f (which one of the two arguments is evaluated first is unspecified in C, but that does not matter). And since f(1,1) causes infinite recursion, we get infinite recursion.
2) In a language using non-strict evaluation (be it call-by-name or call-by-need) arguments are substituted into the function body unevaluated and are only evaluated when and if they're needed. So the outer call to f is evaluated first as such:
if (f(0, 0) == 0)
return 0;
else return f(f(1,1), f(0,0));
So when evaluating the if, we need to evaluate f(0,0), which simply evaluates to 0. So we go into the then-branch of the if and never execute the else-branch. Since all calls to f are only used in the else-branch, they're never needed and thus never evaluated. So there's no recursion, infinite or otherwise, and we just get 0.
With C, in general, it is not defined the order of arguments a and b evaluation with a function like int func(int a, int b)
Obviously evaluating func(1,1) is problematic and the code suffers from that regardless if func(1,1) is evaluated before/after/simultaneous with func(0,0)
Analysis of func(a,b) based on need may conclude that if b==0, no need to call func() and then replace with 0.
printf("%d \n", func(func(1,1),func(0,0)));
// functionally then becomes
printf("%d \n", func(func(1,1),0));
Applied again and
// functionally then becomes
printf("%d \n", 0);
Of course this conclusion is not certain as the analysis of b != 0 and else return func(a,b); leads to infinite recursion. Such code may have a useful desired side-effect (e.g. stack-overflow and system reset.) So the analysis may need to be conservative and not assume func(1,1) will ever return and not optimize out the call even if it optimized out the func(0,0) call.
To address the first part,
The C Draft, 6.5.2.2->10(Function calls) says
The order of evaluation of ... the actual arguments... is unspecified.
and for such reason, something such as
printf("%d%d",i++,++i);
has undefined behaviour, because
both ++i and i++ has side-effects, ie incrementing the value of i by one.
The comma inside printf is just a separator and NOT a [ sequence point ].
Even though function call itself is a sequence point, for the above reason, the order in which two modifications of i take place is not defined.
In your case
func(func(1,1),func(0,0))
though, the arguments for outer func ie func(1,1) or func(0,0) have no bearing on each other contrary to the case shown above. Any order of evaluation of these arguments eventually leads to infinite recursion and so the program crashes due to depleted memory.

variable 1 = ({statement 1;statement 2;}) construct in C

main()
{
int a=10,b=30,c=0;
if( c =({a+b;b-a;}))
{
printf("%d",c);
}
}
why the construct ({;}) is legal in C and why it returns the last statement value as the result of the expression ( why it work similar to comma operator)?
It is not legal standard C99, but it is a very useful GCC extension, called statement-exprs (a parenthesized brace compound statement ending by some expression).
IIRC, some other compilers support that extension, e.g. Clang/LLVM
Statement expressions are much more useful when they contain control flow changes and side-effects, like:
c = 2*({while (a>0) a--,b--; a+b;});
However, in your particular case, you could just use the comma operator
if (c=(a+b,b-a))
Since a+b does not have any side effect, I guess that an optimizing compiler could handle that as
if (c=b-a)
GCC provides other useful extensions, notably local labels using __label__ and label as values with computed gotos (very useful in threaded interpreters ...). I don't know exactly why they have not been standardized. I wish that they would.
main()
{
int a=10,b=30,c=0;
if( c =({a+b;b-a;}))
{
printf("%d",c);
}
}
Here,{a+b;b-a;} is one scope.In this you have written 2 statements.This is actually treated as
{
c=a+b;
c=b-a;
}
Initially c value is 40 because of a+b. Again c is modified by b-a. To prove this consider following three cases..
(1).
if(c=({(a=a+b);;}))
{
printf("%d\n",c);
printf("%d\n",a);
}
Here o/p is c=40 and a=40;Because at end of scope (i.e) in last statement is dummy (;).
so,c=a+b is o/p.
(2)
if(c=({(a=a+b);b-a;}))
{
printf("%d\n",c);
printf("%d\n",a);
}
Here o/p is c=-10, a=40. Because last statement is b-a. this value is assigned to c.
(3) main()
{
int a=10,b=30,c=0;
if(c=({(a=a+b);0;}))
{
printf("%d\n",c);
printf("%d\n",a);
}
printf("%d\n",c);
}
Here o/p is c=0 only.If is not executed ,Because of last statement is 0;
C follows procedure oriented.And associativity of () is left to right.

Resources