What is the need for the conditional operator? Functionally it is redundant, since it implements an if-else construct. If the conditional operator is more efficient than the equivalent if-else assignment, why can't if-else be interpreted more efficiently by the compiler?
In C, the real utility of it is that it's an expression instead of a statement; that is, you can have it on the right-hand side (RHS) of a statement. So you can write certain things more concisely.
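For example (a small sketch with illustrative names, not taken from the answers here), the conditional can sit on the right-hand side of an assignment or inside a call argument, where an if statement could not:

#include <stdio.h>

/* Illustrative helper: picks the larger value and reports its sign. */
static int max_of(int a, int b)
{
    int m = (a > b) ? a : b;                                  /* RHS of an assignment */
    printf("%s\n", (m >= 0) ? "non-negative" : "negative");   /* inside a call argument */
    return m;
}

int main(void)
{
    printf("%d\n", max_of(3, -7));   /* prints "non-negative", then 3 */
    return 0;
}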
Some of the other answers given are great. But I am surprised that no one mentioned that it can be used to help enforce const correctness in a compact way.
Something like this:
const int n = (x != 0) ? 10 : 20;
so basically n is a const whose initial value depends on a condition. The easiest alternative is to make n non-const, which would allow an ordinary if to initialize it. But if you want it to be const, that cannot be done with an ordinary if. The best substitute would be a helper function like this:
int f(int x) {
    if (x != 0) { return 10; } else { return 20; }
}
const int n = f(x);
but the ternary if version is far more compact and arguably more readable.
The ternary operator is a syntactic and readability convenience, not a performance shortcut. People are split on the merits of it for conditionals of varying complexity, but for short conditions, it can be useful to have a one-line expression.
Moreover, since it's an expression, as Charlie Martin wrote, it can appear on the right-hand side of a statement in C. This is valuable for being concise.
It's crucial for code obfuscation, like this:
Look-> See?!
No
:(
Oh, well
);
Compactness and the ability to inline an if-then-else construct into an expression.
There are a lot of things in C that aren't technically needed because they can be more or less easily implemented in terms of other things. Here is an incomplete list:
while
for
functions
structs
Imagine what your code would look like without these and you may find your answer. The ternary operator is a form of "syntactic sugar" that if used with care and skill makes writing and understanding code easier.
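For instance, here is a minimal sketch (my own, not from the answer) of a while loop spelled out with if and goto, which is roughly what it desugars to:

/* Equivalent to: while (n > 0) { n--; }  -- written without `while` */
static int count_down(int n)
{
loop:
    if (n > 0) {
        n--;
        goto loop;
    }
    return n;   /* 0 for any non-negative input */
}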
Sometimes the ternary operator is the best way to get the job done, in particular when you want the result of the ternary to be an l-value.
This is not a good example, but I'm drawing a blank on something better. One thing is certain: it is not often that you really need to use the ternary, although I still use it quite a bit.
const char* appTitle = amDebugging ? "DEBUG App 1.0" : "App v 1.0";
One thing I would warn against, though, is stringing ternaries together. They become a real problem at maintenance time:
int myVal = aIsTrue ? aVal : bIsTrue ? bVal : cIsTrue ? cVal : dVal;
EDIT: Here's a potentially better example. You can use the ternary operator to assign references & const values where you would otherwise need to write a function to handle it:
int getMyValue()
{
    if( myCondition )
        return 42;
    else
        return 314;
}
const int myValue = getMyValue();
...could become:
const int myValue = myCondition ? 42 : 314;
Which is better is a debatable question that I will choose not to debate.
Since no one has mentioned this yet, about the only way to get smart printf statements is to use the ternary operator:
printf("%d item%s", count, count > 1 ? "s\n" : "\n");
Caveat: There are some differences in operator precedence when you move from C to C++, and you may be surprised by the subtle bugs that arise from them.
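A sketch of the classic difference (variable names are mine): the same tokens parse differently in the two languages.

int a = 0, b = 0, ok = 1;

ok ? a : b = 1;
/* C:   parsed as (ok ? a : b) = 1  -> "lvalue required as left operand of assignment"
   C++: parsed as ok ? a : (b = 1)  -> compiles, and assigns 1 to b                    */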
The fact that the ternary operator is an expression, not a statement, allows it to be used in macro expansions for function-like macros that are used as part of an expression. Const may not have been part of original C, but the macro pre-processor goes way back.
One place where I've seen it used is in an array package that used macros for bound-checked array accesses. The syntax for a checked reference was something like aref(arrayname, type, index), where arrayname was actually a pointer to a struct that included the array bounds and an unsigned char array for the data, type was the actual type of the data, and index was the index. The expansion of this was quite hairy (and I'm not going to do it from memory), but it used some ternary operators to do the bound checking.
You can't do this as a function call in C because of the need for polymorphism of the returned object. So a macro was needed to do the type casting in the expression.
In C++ you could do this as a templated overloaded function call (probably for operator[]), but C doesn't have such features.
Edit: Here's the example I was talking about, from the Berkeley CAD array package (glu 1.4 edition). The documentation of the array_fetch usage is:
type
array_fetch(type, array, position)
typeof type;
array_t *array;
int position;
Fetch an element from an array. A runtime error occurs on an attempt to reference outside the bounds of the array. There is no type-checking that the value at the given position is actually of the type used when dereferencing the array.
and here is the macro definition of array_fetch (note the use of the ternary operator and the comma sequencing operator to execute all the subexpressions with the right values in the right order as part of a single expression):
#define array_fetch(type, a, i) \
(array_global_index = (i), \
(array_global_index >= (a)->num) ? array_abort((a),1) : 0,\
*((type *) ((a)->space + array_global_index * (a)->obj_size)))
The expansion for array_insert (which grows the array if necessary, like a C++ vector) is even hairier, involving multiple nested ternary operators.
It's syntactic sugar and a handy shorthand for brief if/else blocks that only contain one statement. Functionally, both constructs should perform identically.
As dwn said, performance was one of its benefits during the rise of complex processors. The MSDN blog post Non-classical processor behavior: How doing something can be faster than not doing it gives an example that clearly shows the difference between the ternary (conditional) operator and an if/else statement.
Given the following code:
#include <windows.h>
#include <stdlib.h>
#include <stdlib.h>
#include <stdio.h>
int array[10000];
int countthem(int boundary)
{
int count = 0;
for (int i = 0; i < 10000; i++) {
if (array[i] < boundary) count++;
}
return count;
}
int __cdecl wmain(int, wchar_t **)
{
for (int i = 0; i < 10000; i++) array[i] = rand() % 10;
for (int boundary = 0; boundary <= 10; boundary++) {
LARGE_INTEGER liStart, liEnd;
QueryPerformanceCounter(&liStart);
int count = 0;
for (int iterations = 0; iterations < 100; iterations++) {
count += countthem(boundary);
}
QueryPerformanceCounter(&liEnd);
printf("count=%7d, time = %I64d\n",
count, liEnd.QuadPart - liStart.QuadPart);
}
return 0;
}
The costs for different boundary values are very different and look weird (see the original material). But if we change:
if (array[i] < boundary) count++;
to
count += (array[i] < boundary) ? 1 : 0;
The execution time is now independent of the boundary value, since:
the optimizer was able to remove the branch from the ternary expression.
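Roughly speaking (my own illustration, not taken from the blog), the ternary form can be lowered to straight arithmetic because a comparison in C already evaluates to 0 or 1:

/* Branch-free sketch of the inner loop: the comparison yields 0 or 1,
   so the conditional folds into the addition and no branch is needed. */
static int countthem_branchless(const int *arr, int n, int boundary)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        count += (arr[i] < boundary);   /* same result as the ?: version */
    return count;
}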
But on my desktop (Intel i5 CPU, Windows 10, VS2015), my test results are quite different from the MSDN blog's.
In debug mode, the if/else version costs:
count= 0, time = 6434
count= 100000, time = 7652
count= 200800, time = 10124
count= 300200, time = 12820
count= 403100, time = 15566
count= 497400, time = 16911
count= 602900, time = 15999
count= 700700, time = 12997
count= 797500, time = 11465
count= 902500, time = 7619
count=1000000, time = 6429
and the ternary operator version costs:
count= 0, time = 7045
count= 100000, time = 10194
count= 200800, time = 12080
count= 300200, time = 15007
count= 403100, time = 18519
count= 497400, time = 20957
count= 602900, time = 17851
count= 700700, time = 14593
count= 797500, time = 12390
count= 902500, time = 9283
count=1000000, time = 7020
In release mode, the if/else version costs:
count= 0, time = 7
count= 100000, time = 9
count= 200800, time = 9
count= 300200, time = 9
count= 403100, time = 9
count= 497400, time = 8
count= 602900, time = 7
count= 700700, time = 7
count= 797500, time = 10
count= 902500, time = 7
count=1000000, time = 7
and the ternary operator version costs:
count= 0, time = 16
count= 100000, time = 17
count= 200800, time = 18
count= 300200, time = 16
count= 403100, time = 22
count= 497400, time = 16
count= 602900, time = 16
count= 700700, time = 15
count= 797500, time = 15
count= 902500, time = 16
count=1000000, time = 16
The ternary operator is slower than the if/else statement on my machine!
So, depending on the compiler's optimization techniques, the ternary operator and if/else may behave very differently.
Some of the more obscure operators in C exist solely because they allow implementation of various function-like macros as a single expression that returns a result. I would say that this is the main reason why the ?: and , operators are allowed to exist, even though their functionality is otherwise redundant.
Let's say we wish to implement a function-like macro that returns the largest of two parameters. It would then be called, for example, as:
int x = LARGEST(1,2);
The only way to implement this as a function-like macro would be
#define LARGEST(x,y) ((x) > (y) ? (x) : (y))
It wouldn't be possible with an if ... else statement, since it does not return a result value. Note)
The other purpose of ?: is that in some cases it actually increases readability. Most often if...else is more readable, but not always. Take, for example, long, repetitive switch statements:
switch(something)
{
case A:
    if (x == A)
    {
        array[i] = x;
    }
    else
    {
        array[i] = y;
    }
    break;
case B:
    if (x == B)
    {
        array[i] = x;
    }
    else
    {
        array[i] = y;
    }
    break;
...
}
This can be replaced with the far more readable
switch(something)
{
case A: array[i] = (x == A) ? x : y; break;
case B: array[i] = (x == B) ? x : y; break;
...
}
Please note that ?: never results in faster code than if-else. That's a strange myth created by confused beginners. In optimized code, ?: gives identical performance to if-else in the vast majority of cases.
If anything, ?: can be slower than if-else, because it comes with mandatory implicit type promotions, even of the operand which is not going to be used. But ?: can never be faster than if-else.
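A small sketch of the promotion point (my illustration; as noted above, the practical difference usually vanishes in optimized code):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t  small = 1;
    uint32_t big   = 70000;
    int cond = 1;

    /* The usual arithmetic conversions are applied to BOTH the second and
       third operands, so the whole ?: expression has type uint32_t even
       when `small` is the operand actually selected. */
    printf("%zu\n", sizeof(cond ? small : big));   /* sizeof(uint32_t), typically 4 */

    /* An if-else assignment lets each branch keep its own natural type. */
    return 0;
}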
Note) Now of course someone will argue and wonder why not use a function. Indeed if you can use a function, it is always preferable over a function-like macro. But sometimes you can't use functions. Suppose for example that x in the example above is declared at file scope. The initializer must then be a constant expression, so it cannot contain a function call. Other practical examples of where you have to use function-like macros involve type safe programming with _Generic or "X macros".
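For example (a sketch with made-up names; largest_fn stands for a hypothetical function version of the macro):

#define LARGEST(x, y) ((x) > (y) ? (x) : (y))

/* Hypothetical configuration constants, purely for illustration. */
#define LIMIT_A 100
#define LIMIT_B 250

/* At file scope the initializer must be a constant expression,
   so the macro works where a function call would not. */
static int x = LARGEST(LIMIT_A, LIMIT_B);          /* OK */
/* static int y = largest_fn(LIMIT_A, LIMIT_B); */ /* error: not a constant expression */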
ternary = simple form of if-else. It is available mostly for readability.
It is the same as how

if (0)
    do();

is equivalent to

if (0)
{
    do();
}
Related
I saw this code:
if (cond) {
perror("an error occurred"), exit(1);
}
Why would you do that? Why not just:
if (cond) {
perror("an error occurred");
exit(1);
}
In your example it serves no purpose at all. It is on occasion useful when written as
if (cond)
    perror("an error occurred"), exit(1);
-- then you don't need curly braces. But it's an invitation to disaster.
The comma operator is there to put two or more expressions in a position where the grammar only allows one. In your case, there is no need to use it; in other cases, such as in a while loop, it may be useful:
while (a = b, c < d)
...
where the actual "evaluation" of the while loop is governed solely on the last expression.
Legitimate cases of the comma operator are rare, but they do exist. One example is when you want to have something happen inside of a conditional evaluation. For instance:
std::wstring example;
auto it = example.begin();
while (it = std::find(it, example.end(), L'\\'), it != example.end())
{
// Do something to each backslash in `example`
}
It can also be used in places where you can only place a single expression, but want two things to happen. For instance, the following loop increments x and decrements y in the for loop's third component:
int x = 0;
int y = some_number;
for(; x < y; ++x, --y)
{
// Do something which uses a converging x and y
}
Don't go looking for uses of it, but if it is appropriate, don't be afraid to use it, and don't be thrown for a loop if you see someone else using it. If you have two things which have no reason not to be separate statements, make them separate statements instead of using the comma operator.
The main use of the comma operator is obfuscation; it permits doing two things where the reader only expects one. One of the most frequent uses, adding side effects to a condition, falls under this category. There are a few cases which might be considered valid, however:
The one which was used to present it in K&R: incrementing two variables in a for loop. In modern code, this might occur in a function like std::transform or std::copy, where an output iterator is incremented simultaneously with the input iterator. (More often, of course, these functions will contain a while loop, with the increments in separate statements at the end of the loop. In such cases, there's no point in using a comma rather than two statements.)
Another case which comes to mind is data validation of input parameters in an initializer list:
MyClass::MyClass( T const& param )
: member( (validate( param ), param) )
{
}
(This assumes that validate( param ) will throw an exception if something is wrong.) This use isn't particularly attractive, especially as it needs the extra parentheses, but there aren't many alternatives.
Finally, I've sometimes seen the convention:
ScopedLock( myMutex ), protectedFunction();
which avoids having to invent a name for the ScopedLock. To tell the truth, I don't like it, but I have seen it used, and the alternative of adding extra braces to ensure that the ScopedLock is destroyed immediately isn't very pretty either.
This can be better understood by taking some examples:
First:
Consider an expression:
x = ++j;
But if, for the time being, we need to assign a temporary debug value, then we can write:
x = DEBUG_VALUE, ++j;
Second:
The comma operator is frequently used in for loops, e.g.:
for(i = 0, j = 10; i < N; j--, i++)   /* note: we can't use ; in place of the commas here */
Third:
One more example (one may actually find doing this interesting): suppose we want to compute x = n / d and then, if the remainder n % d is not zero, print x + 1; if it is zero, print x - 1. For instance, x = 16 / 4 leaves a zero remainder, so we would print x - 1, while x = 16 / 5 leaves a non-zero remainder, so we would print x + 1.
This can be done in a single step:
if(x = n / d, n % d) // == x = n / d; if(n % d)
printf("Remainder not zero, x + 1 = %d", (x + 1));
else
printf("Remainder is zero, x - 1 = %d", (x - 1));
PS: It may also be interesting to know that sometimes it is disastrous to use the , operator. For example, in the question Strtok usage, code not working, the OP mistakenly omitted the function name: instead of writing tokens = strtok(NULL, ",'");, he wrote tokens = (NULL, ",'");. He got no compilation error, because it is a valid expression equivalent to tokens = ",'";, and it caused an infinite loop in his program.
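What the typo boils down to (a sketch): without the function name, the parentheses merely group a comma expression, which evaluates to its right operand.

char *tokens;
tokens = (NULL, ",'");   /* NULL is evaluated and discarded; this is the same
                            as tokens = ",'"; and it compiles without error */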
The comma operator allows grouping two or more expressions where only one is expected.
For example, it can be useful in some cases:
// In a loop
while ( a--, a < d ) ...
But in your case there is no reason to use it. It will just be confusing... that's it...
In your case, it is merely a way to avoid curly braces:
if (cond)
    perror("an error occurred"), exit(1);

// =>

if (cond)
{
    perror("an error occurred");
    exit(1);
}
A link to a comma operator documentation.
There appear to be few practical uses of operator,().
Bjarne Stroustrup, The Design and Evolution of C++
Most of the common uses of the comma operator can be found in the Wikipedia article Comma_operator#Uses.
One interesting usage I have found is in boost::assign, which judiciously overloads the operator so that a comma-separated list of values can be pushed onto the end of a vector object:
#include <boost/assign/std/vector.hpp> // for 'operator+=()'
using namespace std;
using namespace boost::assign; // bring 'operator+=()' into scope
{
vector<int> values;
values += 1,2,3,4,5,6,7,8,9; // insert values at the end of the container
}
Unfortunately, the above usage, which was popular for prototyping, will look archaic once compilers start supporting uniform initialization.
So that leaves us back to
There appear to be few practical uses of operator,().
Bjarne Stroustrup, The Design and Evolution of C++
In your case, the comma operator is useless: it could have been used to avoid curly braces, but the writer has already put them there. Therefore it's useless and may be confusing.
It could be useful with the ternary operator if you want to execute two or more instructions when the condition is true or false. But keep in mind that the return value will be the rightmost expression, due to the comma operator's left-to-right evaluation rule (I mean inside the parentheses).
For instance:
a<b?(x=5,b=6,d=i):exit(1);
The boost::assign overloads the comma operator heavily to achieve this kind of syntax:
vector<int> v;
v += 1,2,3,4,5,6,7,8,9;
#include<stdio.h>
int hailstone(int n){
int length=1;
while(1<n){(n%2)?n=3*n+1:n/=2;length++;}//error
//while(1<n){(n%2)?n=3*n+1:(n/=2);length++;} right
return length;
}
int main()
{
int n;
scanf("%d",&n);
printf("%d\n",hailstone(n));
}
There is an error on line four; the gcc compiler says "lvalue required as left operand of assignment", but if I add parentheses, as on line five, it is fine, and I do not know why.
Despite shoving all the logic into one line, the problem is actually in one expression (pardon me for adding spaces):
(n%2) ? n=3*n+1 : n/=2
The precedence rules for C expressions work like this: assignment (and that includes compound assignment) binds less tightly than ?:. So the compiler has to interpret what you wrote as:
((n%2) ? n=3*n+1 : n) /=2
Since ?: doesn't produce an lvalue (something that can appear on the left of an assignment), you get an error. Your use of parentheses forces the precedence to match what you wanted.
But an even better way to write that is not to be "clever", thinking there is some "elegance" in using as many tokens as possible in a single expression. Here's another version, which is far more readable and easier to verify as correct:
if (n%2) {
    n = 3*n+1;
} else {
    n /= 2;
}
And it's no less efficient than using a conditional expression.
Here the compiler parses your statement as
((n%2) ? n=3*n+1 : n) /= 2
taking the whole conditional expression as the left operand of /= and 2 as the right operand. Since that left operand is not an lvalue (it is not a variable), the compiler throws an error. You can tell the compiler the precedence you intended by adding parentheses.
(If this is a duplicate please point me to an answer)
I have two scenarios where a loop is checking a complex expression over and over again (a complex expression would consist of math operations and retrieving data):
for (int i = 0; i < expression; i++) {
// stuff
}
for (int i = 0; i < someNumber; i++) {
if (i == expression) break;
}
I'm wondering if it's more efficient to pre-calculate the expression and check against a known value like so
int known = expression;
for (int i = 0; i < known; i++) {
// stuff
}
for (int i = 0; i < someNumber; i++) {
if (i == known) break;
}
or if it's done by the compiler automatically.
For reference, I'm running the loop ~700 000 000 times and the expression is something like structure->arr[i] % n or sqrt(a * n + b)
Is it even worth it?
If the compiler is able to detect that calculating expression will give the same result every time, it will only do the calculation once.
The tricky part is: "If the compiler is able to ...."
Compilers are very smart and will probably be successful in most cases. But why take the chance?
Just write that extra line to do the calculation before the loop as you did in your second example.
By doing that you send a clear message to the compiler that the expression is constant within the loops. Further, it may also help your co-workers understand the code more easily.
That said... you yourself must be sure that expression is in fact the same every time. Let's look at your example:
the expression is something like structure->arr[i] % n or sqrt(a * n + b)
Now the first one, i.e. structure->arr[i] % n, depends on the loop variable i, so it would be a big mistake to move that code outside the loop.
The second (i.e. sqrt(a * n + b)) looks better, provided that a, n, and b don't change inside the loop.
I don't quite get the following part of 5.1.2.3/3:
An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).
Suppose I have the following code:
char data[size];
int i;
int found;

/* initialize data to some values in here */

found = 0;
for( i = 0; i < size; i++ ) {
    if( data[i] == 0 ) {
        found = 1;
        /* no break in here */
    }
}
/* i no longer used, do something with "found" here */
Note that found starts with 0 and can either remain unchanged or turn into 1. It cannot turn into 1 and then into something else. So the following code would yield the same result (except for i value which is not used after the loop anyway):
char data[size];
int i;
int found;

/* initialize data to some values in here */

found = 0;
for( i = 0; i < size; i++ ) {
    if( data[i] == 0 ) {
        found = 1;
        break;
    }
}
/* i no longer used, do something with "found" here */
Now what does the Standard say about "need not evaluate part of an expression" with regard to found = 1 and the loop-control expressions which follow the first iteration in which control gets inside the if?
Clearly, if found is used somewhere after this code, the compiler must emit code that traverses the array and conditionally evaluates the found = 1 expression.
Is the implementation required to evaluate found = 1 once for every zero found in the array, or can it instead evaluate it no more than once and so effectively emit the code for the second snippet when compiling the first snippet?
can it instead evaluate it no more than once and so effectively emit the code for the second snippet when compiling the first snippet?
Yes, a compiler has the right to perform that optimization. It seems like a pretty aggressive optimization but it would be legal.
It might be interesting to look at an example that more closely matches the spirit of the text:
An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).
Suppose we have:
int x = pureFunction(y) * otherPureFunction(z);
Suppose the compiler knows that both functions are int-returning "pure" functions; that is, they have no side effects and their result depends solely on the arguments. Suppose the compiler also believes that otherPureFunction is an extremely expensive operation. A compiler could choose to implement the code as though you had written:
int temp = pureFunction(y);
int x = temp == 0 ? 0 : temp * otherPureFunction(z);
That is, determine that under some conditions it is unnecessary to compute otherPureFunction() because the result of the multiplication is already known once the left operand is known to be zero. No needed side effects will be elided because there are no side effects.
Yes, it may perform this optimization, since there are no I/O operations, reads from volatile locations or externally visible writes to memory omitted by the optimized code, so the behavior is preserved.
As an example of this kind of optimization, GCC will compile
#include <string.h>

void noop(const char *s)
{
    for (size_t i = 0; i < strlen(s); i++) {
    }
}
to a completely empty function:
noop:
.LFB33:
.cfi_startproc
rep ret
.cfi_endproc
It is allowed to do so because the Standard guarantees the behavior of strlen, the compiler knows that it has no externally visible effect on s or any other piece of memory, and it can deduce that the whole function has no behavior. (Amazingly, this simple optimization brings the complexity down from quadratic to constant.)
For years, I've gotten into the habit of not using the value of a for loop iterator after the loop exits. I could have sworn that I did this because it used to produce a compiler warning, but after I was challenged in a recent code review, I was proven wrong.
For example, I always did this (NOTE: our code standards prohibit the use of the "break" keyword):
int i, result;
bool done = false;

for (i=0; i<10 && !done; i++) {
    if (some_condition) {
        result = i;
        done = true;
    }
}
// Value of i may be undefined here
Now, obviously the result variable could be removed, if I can rely on the value of i. I thought that because of compiler optimization, you could not rely on the value of the loop iterator. Am I just remembering a phantom teaching? Or is this the standard (specifically regarding GNU C)?
There is nothing wrong in C89, C99, or C11 with accessing the iteration variable after the for statement.
int i;
for (i = 0; i < 10; i++) {
/* Some code */
}
printf("%d\n", i); // No magic, the value is 10
From C99 onward, you can also use a declaration as the first clause of the for statement, and in that case, of course, the declared variable cannot be used after the for statement.
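For example (a sketch of the C99 form):

for (int i = 0; i < 10; i++) {
    /* Some code */
}
/* printf("%d\n", i);   error: 'i' is not in scope here */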
Different languages have different rules. In Pascal, the compiler is allowed to optimize away storing the loop index after the final increment, so it might be the first loop-terminating value or it might be the last valid value.
There are plenty of use cases where the for loop is used for nothing else but advancing the iterator. This can be seen in some implementations of strlen (though admittedly there are other ways to write strlen), and in other sorts of functions whose goal is to find a certain limit:
/*find the index of the first element which is odd*/
for (ii = 0; ii < nelem && arry[ii] % 2 == 0; ii++);
As mentioned, the point of confusion may come from constructs where the iterator itself is defined within the for statement.
In general, for statements are very powerful, and it's unfortunate that they're rarely utilized to their full potential.
For example, a different version of the same loop can be written as follows (though it wouldn't demonstrate the safety of using the iterator):
#include <stdio.h>
int main(void)
{
int cur, ii = 0, nelem, arry [] = { 1, 2, 4, 6, 8, 8, 3, 42, 45, 67 };
int sum = 0;
nelem = sizeof(arry) / sizeof(int);
/* Look mom! no curly braces! */
for (
ii = 0;
ii < nelem && ((cur = arry[ii]) %2 == 0 ||
((printf("Found odd number: %d\n", cur)||1)));
ii++, sum += cur
);
printf("Sum of all numbers is %d\n", sum);
return 0;
}
In this particular case, it seems like a lot of work for this specific problem, but it can be very handy for some things.
Even though the value of that for loop's control variable is well defined, you might have been told to avoid using the control variable after the loop because of how the scoping of that variable is handled, and especially because that handling has changed over the history of C++ (I know this question is tagged "C", but I think the rationale for avoiding the loop control variable after the loop may have its origins in this C++ history).
For example, consider the following code:
int more_work_to_do(void)
{
    return 1;
}

int some_condition(void)
{
    return 1;
}

int foo()
{
    int i = 100;
    while (more_work_to_do()) {
        int done = 0;
        for (int i = 0; i < 10 && !done; i++) {
            if (some_condition()) {
                done = 1;
            }
        }
        if (done) return i; // which `i`?
    }
    return 1;
}
Under some old rules of scoping for the i declared in the for loop, the value returned on the statement marked with the comment "which i" would be determined by the for loop (VC++ 6 uses these rules). Under the newer, standard rules for scoping that variable, the value returned will be the i declared at the start of the function.
While I can't possibly know how your habit came to be, I can tell you how my habit of doing the same arose. It was from seeing code like this:
for (i=0u; (i<someLimit) && (found != TRUE); i++)
{
if (someCondition) found = TRUE;
}
foundIndex = i-1;
Basically, code like this is written when the break keyword is disallowed by some coding rules, e.g. based on MISRA. If you don't break out of the loop though, the loop will usually leave you with an "i" which is off by one from what you care for.
Sometimes, you can even find this:
for (i=0u; (i<someLimit) && (found != TRUE); i++)
{
if (someCondition) found = TRUE;
}
foundIndex = i;
This is just semantically wrong and can be found when the "forbid break keyword rule" is introduced into an existing code base which is not sufficiently covered by unit tests. May sound surprising, but it's all out there...