Prefix and postfix operators necessity - c

What is the necessity of both prefix and postfix increment operators? Is not one enough?
To the point, there exists like a similar while/do-while necessity problem, yet, there is not so much confusion (in understanding and usage) in having them both. But with having both prefix and postfix (like priority of these operators, their association, usage, working).
And do anyone been through a situation where you said "Hey, I am going to use postfix increment. Its useful here."

POSTFIX and PREFIX are not the same. POSTFIX increments/decrements only after the current statement/instruction is over. Whereas PREFIX increments/decrements and then executes the current step. Example, To run a loop n times,
while(n--)
{ }
works perfectly. But,
while(--n)
{
}
will run only n-1 times
Or for example:
x = n--; different then x = --n; (in second form value of x and n will be same). Off-course we can do same thing with binary operator - in multiple steps.
Point is suppose if there is only post -- then we have to write x = --n in two steps.
There can be other better reasons, But this is one I suppose a benefit to keep both prefix and postfix operator.

[edit to answer OP's first part]
Clearly i++ and ++i both affect i the same but return different values. The operations are different. Thus much code takes advantage of these differences.
The most obvious need to have both operators is the 40 year code base for C. Once a feature in a language is used extensively, very difficult to remove.
Certainly a new language could be defined with only one or none. But will it play in Peoria? We could get rid of the - operator too, just use a + -b, but I think it is a tough sell.
Need both?
The prefix operator is easy to mimic with alternate code for ++i is pretty much the same as i += 1. Other than operator precedence, which parens solves, I see no difference.
The postfix operator is cumbersome to mimic - as in this failed attempt if(i++) vs. if(i += 1).
If C of the future moved to depreciate one of these, I suspect it would be to depreciate the prefix operator for its functionality, as discussed above, is easier to replace.
Forward looking thought: the >> and << operators were appropriated in C++ to do something quite different from integer bit shifting. Maybe the ++pre and post++ will generate expanded meaning in another language.
[Original follows]
Answer to the trailing OP question "do anyone been through a situation where you saidd "Hey, I am going to use postfix increment. Its useful here"?
Various array processing, like with char[], benefit. Array indexing, starting at 0, lends itself to a postfix increment. For after fetching/setting the array element, the only thing to do with the index before the next array access is to increment the index. Might as well do so immediately.
With prefix increment, one may need to have one type of fetch for the 0th element and another type of fetch for the rest.
size_t j = 0;
for (size_t i = 0, (ch = inbuffer[i]) != '\0'; i++) {
if (condition(ch)) {
outbuffer[j++] = ch; // prefer this over below
}
}
outbuffer[j] = '\0';
vs.
for (size_t i = 0, (ch = inbuffer[i]) != '\0'; ++i) {
if (condition(ch)) {
outbuffer[j] = ch;
++j;
}
}
outbuffer[j] = '\0';

I think the only fair answer to which one to keep would be to do away with them both.
If, for example, you were to do away with postfix operators, then where code was once compactly expressed using n++, you would now have to refer to (++n - 1), or you would have to rearrange other terms.
If you broke the increment or decrement out onto its own line before or after the expression which referred to n, above, then it's not really relevant which you use, but in that case you could just as easily use neither, and replace that line with n = n + 1;
So perhaps the real issue, here, is expressions with side effects. If you like compact code then you'll see that both pre and post are necessary for different situations. Otherwise there doesn't seem to be much point in keeping either of them.
Example usage of each:
char array[10];
char * const end = array + sizeof(array) / sizeof(*array);
char *p = end;
int i = 0;
/* set array to { 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 } */
while (p > array)
*--p = i++;
p = array;
i = 0;
/* set array to { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 } */
while (p < end)
*p++ = i++;

They are necessary because they are already used in lots of code, so if they were removed then lots of code would fail to compile.
As to why they ever existed in the first place, older compilers could generate more efficient code for ++i and i++ than they could for i+=1 and (i+=1)-1. For newer compilers this is generally not an issue.
The postfix version is something of an anomaly, as nowhere else in C is there an operator that modifies its operand but evaluates to the prior value of its operand.
One could certainly get by using only one or other of prefix or postfix increment operators. It would be a little more difficult to get by using only one or other of while or do while, as the difference between them is greater than the difference between prefix and postfix increment in my view.
And one could of course get by without using either prefix or postfix increment, or while or do while. But where do you draw the line between what's needless cruft and what's useful abstraction?

Here's a quickie example that uses both; an array-based stack, where the stack grows towards 0:
#define STACKSIZE ...
typedef ... T;
T stack[STACKSIZE];
size_t stackptr = STACKSIZE;
// push operation
if ( stackptr )
stack[ --stackptr ] = value;
// pop operation
if ( stackptr < STACKSIZE )
value = stack[ stackptr++ ];
Now we could accomplish the exact same thing without the ++ and -- operators, but it wouldn't scan as cleanly.

As for any other obscure mechanism in the C language, there are various historical reasons for it. In ancient times when dinosaurs walked the earth, compilers would make more efficient code out of i++ than i+=1. In some cases, compilers would generate less efficient code for i++ than for ++i, because i++ needed to save away the value to increment later. Unless you have a dinosaur compiler, none of this matters the slightest in terms of efficiency.
As for any other obscure mechanism in the C language, if it exists, people will start to use it. I'll use the common expression *p++ as an example (it means: p is a pointer, take the contents of p, use that as the result of the expression, then increment the pointer). It must use postfix and never prefix, or it would mean something completely different.
Some dinosaur once started writing needlessly complex expressions such as the *p++ and because they did, it has became common and today we regard such code as something trivial. Not because it is, but because we are so used at reading it.
But in modern programming, there is absolutely no reason to ever write *p++. For example, if we look at the implementation of the memcpy function, which has these prerequisites:
void* memcpy (void* restrict s1, const void* restrict s2, size_t n)
{
uint8_t* p1 = (uint8_t*)s1;
const uint8_t* p2 = (const uint8_t*)s2;
Then one popular way to implement the actual copying is:
while(n--)
{
*p1++ = *p2++;
}
Now some people will cheer, because we used so few lines of code. But few lines of code is not necessarily a measure of good code. Often it is the opposite: consider replacing it with a single line while(n--)*p1++=*p2++; and you see why this is true.
I don't think either case is very readable, you have to be a somewhat experienced C programmer to grasp it without scratching your head for five minutes. And you could write the same code like this:
while(n != 0)
{
*p1 = *p2;
p1++;
p2++;
n--;
}
Far clearer, and most importantly it yields exactly the same machine code as the first example.
And now see what happened: because we decided not to write obscure code with lots of operands in one expression, we might as well have used ++p1 and ++p2. It would give the same machine code. Prefix or postfix does not matter. But in the first example with obscure code, *++p1 = *++p2 would have completely changed the meaning.
To sum it up:
There exist prefix and postfix increment operators for historical reasons.
In modern programming, having two different such operators is completely superfluous, unless you write obscure code with several operators in the same expression.
If you write obscure code, will find ways to motivate the use of both prefix and postfix. However, all such code can always be rewritten.
You can use this as a quality measure of your code: if you ever find yourself writing code where it matters whether you are using prefix or postfix, you are writing bad code. Stop it, rewrite the code.

Prefix operator first increments value then its uses in the expression. Postfix operator,first uses the value in the expression and increments the value
The basic use of prefix/postfix operators are assembler replaces it with single increment/decrement instruction. If we use arithmetic operators instead of increment or decrement operators, assembler replaces it with two or three instructions. that's why we use increment/decrement operators.

You don't need both.
It is useful for implementing a stack, so it exists in some machine languages. From there it has been inherited indirectly to C (In which this redundancy is still somewhat useful, and some C programmers seems to like the idea of combining two unrelated operations in a single expression), and from C to any other C-like lagnuages.

Related

Can someone explain how this works? [duplicate]

I saw this code:
if (cond) {
perror("an error occurred"), exit(1);
}
Why would you do that? Why not just:
if (cond) {
perror("an error occurred");
exit(1);
}
In your example it serves no reason at all. It is on occasion useful when written as
if(cond)
perror("an error occured"), exit(1) ;
-- then you don't need curly braces. But it's an invitation to disaster.
The comma operator is to put two or more expressions in a position where the reference only allows one. In your case, there is no need to use it; in other cases, such as in a while loop, it may be useful:
while (a = b, c < d)
...
where the actual "evaluation" of the while loop is governed solely on the last expression.
Legitimate cases of the comma operator are rare, but they do exist. One example is when you want to have something happen inside of a conditional evaluation. For instance:
std::wstring example;
auto it = example.begin();
while (it = std::find(it, example.end(), L'\\'), it != example.end())
{
// Do something to each backslash in `example`
}
It can also be used in places where you can only place a single expression, but want two things to happen. For instance, the following loop increments x and decrements y in the for loop's third component:
int x = 0;
int y = some_number;
for(; x < y; ++x, --y)
{
// Do something which uses a converging x and y
}
Don't go looking for uses of it, but if it is appropriate, don't be afraid to use it, and don't be thrown for a loop if you see someone else using it. If you have two things which have no reason not to be separate statements, make them separate statements instead of using the comma operator.
The main use of the comma operator is obfuscation; it permits doing two
things where the reader only expects one. One of the most frequent
uses—adding side effects to a condition, falls under this
category. There are a few cases which might be considered valid,
however:
The one which was used to present it in K&R: incrementing two
variables in a for loop. In modern code, this might occur in a
function like std::transform, or std::copy, where an output iterator
is incremented symultaneously with the input iterator. (More often, of
course, these functions will contain a while loop, with the
incrementations in separate statements at the end of the loop. In such
cases, there's no point in using a comma rather than two statements.)
Another case which comes to mind is data validation of input parameters
in an initializer list:
MyClass::MyClass( T const& param )
: member( (validate( param ), param) )
{
}
(This assumes that validate( param ) will throw an exception if
something is wrong.) This use isn't particularly attractive, especially
as it needs the extra parentheses, but there aren't many alternatives.
Finally, I've sometimes seen the convention:
ScopedLock( myMutex ), protectedFunction();
, which avoids having to invent a name for the ScopedLock. To tell
the truth, I don't like it, but I have seen it used, and the alternative
of adding extra braces to ensure that the ScopedLock is immediately
destructed isn't very pretty either.
This can be better understood by taking some examples:
First:
Consider an expression:
x = ++j;
But for time being, if we need to assign a temporarily debug value, then we can write.
x = DEBUG_VALUE, ++j;
Second:
Comma , operators are frequently used in for() -loop e.g.:
for(i = 0, j = 10; i < N; j--, i++)
// ^ ^ here we can't use ;
Third:
One more example(actually one may find doing this interesting):
if (x = 16 / 4), if remainder is zero then print x = x - 1;
if (x = 16 / 5), if remainder is zero then print x = x + 1;
It can also be done in a single step;
if(x = n / d, n % d) // == x = n / d; if(n % d)
printf("Remainder not zero, x + 1 = %d", (x + 1));
else
printf("Remainder is zero, x - 1 = %d", (x - 1));
PS: It may also be interesting to know that sometimes it is disastrous to use , operator. For example in the question Strtok usage, code not working, by mistake, OP forgot to write name of the function and instead of writing tokens = strtok(NULL, ",'");, he wrote tokens = (NULL, ",'"); and he was not getting compilation error --but its a valid expression that tokens = ",'"; caused an infinite loop in his program.
The comma operator allows grouping expression where one is expected.
For example it can be useful in some case :
// In a loop
while ( a--, a < d ) ...
But in you case there is no reason to use it. It will be confusing... that's it...
In your case, it is just to avoid curly braces :
if(cond)
perror("an error occurred"), exit(1);
// =>
if (cond)
{
perror("an error occurred");
exit(1);
}
A link to a comma operator documentation.
There appear to be few practical uses of operator,().
Bjarne Stroustrup, The Design and Evolution of C++
Most of the oft usage of comma can be found out in the wikipedia article Comma_operator#Uses.
One interesting usage I have found out when using the boost::assign, where it had judiciously overloaded the operator to make it behave as a comma separated list of values which can be pushed to the end of a vector object
#include <boost/assign/std/vector.hpp> // for 'operator+=()'
using namespace std;
using namespace boost::assign; // bring 'operator+=()' into scope
{
vector<int> values;
values += 1,2,3,4,5,6,7,8,9; // insert values at the end of the container
}
Unfortunately, the above usage which was popular for prototyping would now look archaic once compilers start supporting Uniform Initialization
So that leaves us back to
There appear to be few practical uses of operator,().
Bjarne Stroustrup, The Design and Evolution of C++
In your case, the comma operator is useless since it could have been used to avoid curly braces, but it's not the case since the writer has already put them. Therefore it's useless and may be confusing.
It could be useful for the itinerary operator if you want to execute two or more instructions when the condition is true or false. but keep in mind that the return value will be the most right expression due to the comma operator left to right evalutaion rule (I mean inside the parentheses)
For instance:
a<b?(x=5,b=6,d=i):exit(1);
The boost::assign overloads the comma operator heavily to achieve this kind of syntax:
vector<int> v;
v += 1,2,3,4,5,6,7,8,9;

+(+k--) expression in C

I saw this question in a test in which we have to tell the output of the following code.
#include<stdio.h>
int main(){
int k = 0;
while(+(+k--)!=0)
k=k++;
printf("%d\n", k);
return 0;
}
The output is -1. I am unsure why this is the answer, though.
What does the expression +(+k--) mean in C?
This code is deeply, perhaps deliberately, confusing. It contains a narrowly-averted instance of the dread undefined behavior. It's hard to know whether the person who constructed this question was being very, very clever or very, very stupid. And the "lesson" this code might purport to teach or quiz you about -- namely, that the unary plus operator doesn't do much -- is not one that's important enough, I would think, to deserve this kind of subversive misdirection.
There are two confusing aspects of the code, the strange condition:
while(+(+k--)!=0)
and the demented statement it controls:
k=k++;
I'm going to cover the second part first.
If you have a variable like k that you want to increment by 1, C gives you not one, not two, not three, but four different ways to do it:
k = k + 1
k += 1
++k
k++
Despite this bounty (or perhaps because of it), some programmers get confused and cough out contortions like
k = k++;
If you can't figure out what this is supposed to do, don't worry: no one can. This expression contains two different attempts to alter k's value (the k = part, and the k++ part), and because there's no rule in C to say which of the attempted modifications "wins", an expression like this is formally undefined, meaning not only that it has no defined meaning, but that the whole program containing it is suspect.
Now, if you look very carefully, you'll see that in this particular program, the line k = k++ doesn't actually get executed, because (as we're about to see) the controlling condition is initially false, so the loop runs 0 times. So this particular program might not actually be undefined -- but it's still pathologically confusing.
See also these canonical SO answers to all questions concerning Undefined Behavior of this sort.
But you didn't ask about the k=k++ part. You asked about the first confusing part, the +(+k--)!=0 condition. This looks strange, because it is strange. No one would ever write such code in a real program. So there's not much reason to learn how to understand it. (Yes, it's true, exploring the boundaries of a system can help you learn about its fine points, but there's a line in my book between imaginative, thought-provoking explorations versus dunderheaded, abusive explorations, and this expression is pretty clearly on the wrong side of that line.)
Anyway, let's examine +(+k--)!=0. (And after doing so, let's forget all about it.) Any expression like this has to be understood from the inside out. I presume you know what
k--
does. It takes k's current value and "returns" it to the rest of the expression, and it more or less simultaneously decrements k, that is, it stores the quantity k-1 back into k.
But then what does the + do? This is unary plus, not binary plus. It's just like unary minus. You know that binary minus does subtraction: the expression
a - b
subtracts b from a. And you know that unary minus negates things: the expression
-a
gives you the negative of a. What unary + does is... basically nothing. +a gives you a's value, after changing positive values to positive and negative values to negative. So the expression
+k--
gives you whatever k-- gave you, that is, k's old value.
But we're not done, because we have
+(+k--)
This just takes whatever +k-- gave you, and applies unary + to it again. So it gives you whatever +k-- gave you, which was whatever k-- gave you, which was k's old value.
So in the end, the condition
while(+(+k--)!=0)
does exactly the same thing as the much more ordinary condition
while(k-- != 0)
would have done. (It also does the same thing as the even more complicated-looking condition while(+(+(+(+k--)))!=0) would have done. And those parentheses aren't really necessary; it also does the same thing as while(+ +k--!=0) would have done.)
Even figuring out what the "normal" condition
while(k-- != 0)
does is kind of tricky. There are sort of two things going on in this loop: As the loop runs potentially multiple times, we're going to:
keep doing k--, to make k smaller and smaller, but also
keep doing the body of the loop, whatever that does.
But we do the k-- part right away, before (or in the process of) deciding whether to take another trip through the loop. And remember that k-- "returns" the old value of k, before decrementing it. In this program, the initial value of k is 0. So k-- is going to "return" the old value 0, then update k to -1. But then the rest of the condition is != 0 -- and it's not true that 0 != 0. That is, 0 is equal to 0, so we won't make any trips through the loop, so we won't try to execute the problematic statement k=k++ at all.
In other words, in this particular loop, although I said that "there are sort of two things going on", it turns out that thing 1 happens one time, but thing 2 happens zero times.
At any rate, I hope it's now adequately clear why this poor excuse for a program ends up printing -1 as the final value of k. Normally, I don't like to answer quiz questions like this -- it feels like cheating -- but in this case, since I fundamentally disagree with the whole point of the exercise, I don't mind.
At first glance it looks like this code invokes undefined behavior however that is not the case.
First let's format the code correctly:
#include<stdio.h>
int main(){
int k = 0;
while(+(+k--)!=0)
k=k++;
printf("%d\n", k);
return 0;
}
So now we can see that the statement k=k++; is inside of the loop.
Now let's trace the program:
When the loop condition is first evaluated, k has the value 0. The expression k-- has the current value of k, which is 0, and k is decremented as a side effect. So after this statement the value of k is -1.
The leading + on this expression has no effect on the value, so +k-- evaluated to 0 and similarly +(+k--) evaluates to 0.
Then the != operator is evaluated. Since 0!=0 is false, the body of the loop is not entered. Had the body been entered, you would invoke undefined behavior because k=k++ both reads and writes k without a sequence point. But the loop is not entered, so no UB.
Finally the value of k is printed which is -1.
Here's a version of this that shows operator precedence:
+(+(k--))
The two unary + operators don't do anything, so this expression is exactly equivalent to k--. The person that wrote this most likely was trying to mess with your mind.

Where can I find weird, specific C syntax rules?

I will take an exam and my teacher asks weird C syntax rules. Like:
int q=5;
for(q=-2;q=-5;q+=3) { //assignment in condition part??
printf("%d",q); //prints -5
break;
}
Or
int d[][3][2]={4,5,6,7,8,9,10,11,12,13,14,15,16};
int i=-1;
int j;
j=d[i++][++i][++i];
printf("%d",j); //prints 4?? why j=d[0][0][0] ?
Or
extern int a;
int main() {
do {
do {
printf("%o",a); //prints 12
} while(!1);
} while(0);
return 0;
}
int a=10;
I could not find it rules any site or book. Really absurd and uncommon. Where can I find?
To me it seems that your teacher is asking questions which invole undefined behavior.
If you tell him that this is incorrect, you're directly confronting him.
However, you could do the following:
Compile the code on different platforms
Compile the code with different compilers
Compile the code with different versions of the same compiler
Build a matrix with the results. You'll find out that they differ
Show the results to your teacher ans ask him to explain why that happens
That way you do not say that he's wrong, you're just showing some facts and you're showing that you're willing to learn and work.
Do that a long before the exam so that the teacher can look into it and think about his questions so that he can change the exam in time.
I could not find it rules any site or book. Where can I find?
See Where do I find the current C or C++ standard documents?. If you have a good library at university, they should own a copy.
Concerning for(q=-2;q=-5;q+=3) {, all you need to do is to break this down into its components. q=-2 is ran first, then q=-5 is tested, and if that is not 0 (which it isn't since it's an expression with value -5), then the loop body runs once. Then break forces a premature exit from an otherwise infinite loop. The expression then q+=3 is never reached.
The behaviour of d[i++][++i][++i] is undefined. Tell your teacher that, tactfully.
The "%o" format denotes octal output. a is set to 10 in decimal which is 12 in octal. Your code would be clearer if you had written:
int a=012; // octal constant.
The online version of the C language standard has what you need (and is what I will be referring to in this answer); just bear in mind is is a language definition and not a tutorial, and as such may not be easy to read for someone who doesn't have a lot of experience yet.
Having said that, your teacher is throwing you a few foul balls. For example:
j=d[i++][++i][++i];
This statement results in undefined behavior for several reasons. The first several paragraphs of section 6.5 of the document linked above explain the problem, but in a nutshell:
Except in a few situations, C does not guarantee left-to-right evaluation of expressions; neither does it guarantee that side effects are applied immediately after evaluation;
Attempting to modify the value of an object more than once between sequence points1, or modifying and then trying to use the value of an object without an intervening sequence point, results in undefined behavior.
Basically, don't write anything of the form:
x = x++;
x++ * x++;
a[i] = i++;
a[i++] = i;
C does not guarantee that each ++i and i++ is evaluated from left to right, and it does not guarantee that the side effect of each evaluation is applied immediately. So the result of j[i++][++i][++i] is not well-defined, and the result will not be consistent over different programs, or even different builds of the same program2.
AND, on top of that, i++ evaluates to the current value of i; so clearly, your teacher's intent was for j[i++][++i][++i] to evaluate to j[-1][1][2], which would also result in undefined behavior since you're attempting to index outside of the array bounds.
This is why I hate, hate, hate it when teachers throw this kind of code at their students - not only is it needlessly confusing, not only does it encourage bad practice, but more often than not it's just plain wrong.
As for the other questions:
for(q=-2;q=-5;q+=3) { //assignment in condition part??
See sections 6.5.16 and 6.8.5.3. In short, an assignment expression has a value (the value of the left operand after any type conversions), and it can appear as part of a controlling expression in a for loop. As long as the result of the assignment is non-zero (as in the case above), the loop will execute.
printf("%o",a); //prints 12
See section 7.21.6.1. The o conversion specifier tells printf to format the integer value as octal: 1010 == 128
A sequence point is a point in a programs execution where an expression has been fully evaluated and any side effects have been applied. Sequence points occur at the ends of statements, between the evaluation of a function's parameters and the function call, after evaluating the left operand of the &&, ||, and ?: operators, and a few other places. See Annex C for the complete list.
Or even different runs of the same build, although in practice you won't see values change from run to run unless you're doing something really hinky.

How to get and evaluate the Expressions from a string in C

How to get and evaluate the Expressions from a string in C
char *str = "2*8-5+6";
This should give the result as 17 after evaluation.
Try by yourself. you can Use stack data structure to evaluate this string here is reference to implement (its in c++)
stack data structre for string calcualtion
You have to do it yourself, C does not provide any way to do this. C is a very low level language. Simplest way to do it would be to find a library that does it, or if that does not exist use lex + yacc to create your own interpreter.
A quick google suggests the following:
http://www.gnu.org/software/libmatheval/
http://expreval.sourceforge.net/
You should try TinyExpr. It's a single C source code file (with no dependencies) that you can add to your project.
Using it to solve your problem is just:
#include <stdio.h>
#include "tinyexpr.h"
int main()
{
double result = te_interp("2*8-5+6", 0);
printf("Result: %f\n", result);
return 0;
}
That will print out: Result: 17
C does not have a standard eval() function.
There are lots of libraries and other tools out there that can do this.
But if you'd like to learn how to write an expression evaluator yourself, it can be surprisingly easy. It is not trivial: it is actually a pretty deeply theoretical problem, because you're basically writing a miniature parser, perhaps built on a miniature lexical analyzer, just like a real compiler.
One straightforward way of writing a parser involves a technique called recursive descent. Writing a recursive descent parser has a lot in common with another great technique for solving big or hard problems, namely by breaking the big, hard problem up into smaller and hopefully easier subproblems.
So let's see what we can come up with. We're going to write a function int eval(const char * expr) that takes a string containing an expression, and returns the int result of evaluating it. But first let's write a tiny main program to test it with. We'll read a line of text typed by the user using fgets, pass it to our expr() function, and print the result.
#include <stdio.h>
int eval(const char *expr);
int main()
{
char line[100];
while(1) {
printf("Expression? ");
if(fgets(line, sizeof line, stdin) == NULL) break;
printf(" -> %d\n", eval(line));
}
}
So now we start writing eval(). The first question is, how will we keep track of how how far we've read through the string as we parse it? A simple (although mildly cryptic) way of doing this will be to pass around a pointer to a pointer to the next character. That way any function can move forwards (or occasionally backwards) through the string. So our eval() function is going to do almost nothing, except take the address of the pointer to the string to be parsed, resulting in the char ** we just decided we need, and calling a function evalexpr() to do the work. (But don't worry, I'm not procrastinating; in just a second we'll start doing something interesting.)
int evalexpr(const char **);
int eval(const char *expr)
{
return evalexpr(&expr);
}
So now it's time to write evalexpr(), which is going to start doing some actual work. Its job is to do the first, top-level parse of the expression. It's going to look for a series of "terms" being added or subtracted. So it wants to get one or more subexpressions, with + or - operators between them. That is, it's going to handle expressions like
1 + 2
or
1 + 2 - 3
or
1 + 2 - 3 + 4
Or it can read a single expression like
1
Or any of the terms being added or subtracted can be a more-complicated subexpression, so it will also be able to (indirectly) handle things like
2*3 + 4*5 - 9/3
But the bottom line is that it wants to take an expression, then maybe a + or - followed by another subexpression, then maybe a + or - followed by another subexpression, and so on, as long as it keeps seeing a + or -. Here is the code. Since it's adding up the additive "terms" of the expression, it gets subexpressions by calling a function evalterm(). It also needs to look for the + and - operators, and it does this by calling a function gettok(). Sometimes it will see an operator other than + or -, but those are not its job to handle, so if it sees one of those it "ungets" it, and returns, because it's done. All of these functions pass the pointer-to-pointer p around, because as I said earlier, that's how all of these functions keep track of how they're moving through the string as they parse it.
int evalterm(const char **);
int gettok(const char **, int *);
void ungettok(int, const char **);
int evalexpr(const char **p)
{
int r = evalterm(p);
while(1) {
int op = gettok(p, NULL);
switch(op) {
case '+': r += evalterm(p); break;
case '-': r -= evalterm(p); break;
default: ungettok(op, p); return r;
}
}
}
This is some pretty dense code, Stare at it carefully and convince yourself that it's doing what I described. It calls evalterm() once, to get the first subexpression, and assigns the result to the local variable r. Then it enters a potentially infinite loop, because it can handle an arbitrary number of added or subtracted terms. Inside the loop, it gets the next operator in the expression, and uses it to decide what to do. (Don't worry about the second, NULL argument to gettok; we'll get to that in a minute.)
If it sees a +, it gets another subexpression (another term) and adds it to the running sum. Similarly, if it sees a -, it gets another term and subtracts it from the running sum. If it gets anything else, this means it's done, so it "ungets" the operator it doesn't want to deal with, and returns the running sum -- which is literally the value it has evaluated. (The return statement also breaks the "infinite" loop.)
At this point you're probably feeling a mixture of "Okay, this is starting to make sense" but also "Wait a minute, you're playing pretty fast and loose here, this is never going to work, is it?" But it is going to work, as we'll see.
The next function we need is the one that collects the "terms" or subexpressions to be added (and subtracted) together by evalexpr(). That function is evalterm(), and it ends up being very similar -- very similar -- to evalexpr. Its job is to collect a series of one or more subexpressions joined by * and/or /, and multiply and divide them. At this point, we're going to call those subexpressions "primaries". Here is the code:
int evalpri(const char **);
int evalterm(const char **p)
{
int r = evalpri(p);
while(1) {
int op = gettok(p, NULL);
switch(op) {
case '*': r *= evalpri(p); break;
case '/': r /= evalpri(p); break;
default: ungettok(op, p); return r;
}
}
}
There's actually nothing more to say here, because the structure of evalterm ends up being exactly like evalexpr, except that it does things with * and /, and it calls evalpri to get/evaluate its subexpressions.
So now let's look at evalpri. Its job is to evaluate the three lowest-level, but highest-precedence elements of an expression: actual numbers, and parenthesized subexpressions, and the unary - operator.
int evalpri(const char **p)
{
int v;
int op = gettok(p, &v);
switch(op) {
case '1': return v;
case '-': return -evalpri(p);
case '(':
v = evalexpr(p);
op = gettok(p, NULL);
if(op != ')') {
fprintf(stderr, "missing ')'\n");
ungettok(op, p);
}
return v;
}
}
The first thing to do is call the same gettok function we used in evalexpr and evalterm. But now it's time to say a little more about it. It is actually the lexical analyzer used by our little parser. A lexical analyzer returns primitive "tokens". Tokens are the basic syntactic elements of a programming language. Tokens can be single characters, like + or -, or they can also be multi-character entities. An integer constant like 123 is considered a single token. In C, other tokens are keywords like while, and identifiers like printf, and multi-character operators like <= and ++. (Our little expression evaluator doesn't have any of those, though.)
So gettok has to return two things. First it has to return a code for what kind of token it found. For single-character tokens like + and - we're going to say that the code is just the character. For numeric constants (that is, any numeric constant), we're going to say that gettok is going to return the character 1. But we're going to need some way of knowing what the value of the numeric constant was, and that, as you may have guessed, is what the second, pointer argument to the gettok function is for. When gettok returns 1 to indicate a numeric constant, and if the caller passes a pointer to an int value, gettok will fill in the integer value there. (We'll see the definition of the gettok function in a moment.)
At any rate, with that explanation of gettok out of the way, we can understand evalpri. It gets one token, passing the address of a local variable v in which the value of the token can be returned, if necessary. If the token is a 1 indicating an integer constant, we simply return the value of that integer constant. If the token is a -, this is a unary minus sign, so we get another subexpression, negate it, and return it. Finally, if the token is a (, we get another whole expression, and return its value, checking to make sure that there's another ) token after it. And, as you may notice, inside the parentheses we make a recursive call back to the top-level evalexpr function to get the subexpression, because obviously we want to allow any subexpression, even one containing lower-precedence operators like + and -, inside the parentheses.
And we're almost done. Next we can look at gettok. As I mentioned, gettok is the lexical analyzer, inspecting individual characters and constructing full tokens from them. We're now, finally, starting to see how the passed-around pointer-to-pointer p is used.
#include <stdlib.h>
#include <ctype.h>
void skipwhite(const char **);
int gettok(const char **p, int *vp)
{
skipwhite(p);
char c = **p;
if(isdigit(c)) {
char *p2;
int v = strtoul(*p, &p2, 0);
*p = p2;
if(vp) *vp = v;
return '1';
}
(*p)++;
return c;
}
Expressions can contain arbitrary whitespace, which is ignored, so we skip over that with an auxiliary function skipwhite. And now we look at the next character. p is a pointer to pointer to that character, so the character itself is **p. If it's a digit, we call strtoul to convert it. strtoul helpfully returns a pointer to the character following the number it scans, so we use that to update p. We fill in the passed pointer vp with the actual value strtoul computed for us, and we return the code 1 indicating an integer constant.
Otherwise -- if the next character isn't a digit -- it's an ordinary character, presumably an operator like + or - or punctuation like ( ), so we simply return the character, after incrementing *p to record the fact that we've consumed it. Properly "incrementing" p is mildly tricky: it's a pointer to a pointer, and we want to increment the pointed-to pointer. If we wrote p++ or *p++ it would increment the pointer p, so we need (*p)++ to say that it's the pointed-to pointer that we want to increment. (See also C FAQ 4.3.)
Two more utility functions, and then we're done. Here's skipwhite:
void skipwhite(const char **p)
{
while(isspace(**p))
(*p)++;
}
This simply skips over zero or more whitespace characters, as determined by the isspace function from <ctype.h>. (Again, we're taking care to remember that p is a pointer-to-pointer.)
Finally, we come to ungettok. It's a hallmark of a recursive descent parser (or, indeed, almost any parser) that it has to "look ahead" in the input, making a decision based on the next token. Sometimes, however, it decides that it's not ready to deal with the next token after all, so it wants to leave it there on the input for some other part of the parser to deal with later.
Stuffing input "back on the input stream", so to speak, can be tricky. This implementation of ungettok is simple, but it's decidedly imperfect:
void ungettok(int op, const char **p)
{
(*p)--;
}
It doesn't even look at the token it's been asked to put back; it just backs the pointer up by 1. This will work if (but only if) the token it's being asked to unget is in fact the most recent token that was gotten, and if it's not an integer constant token. In fact, for the program as written, and as long as the expression it's parsing is well-formed, this will always be the case. It would be possible to write a more complicated version of gettok that explicitly checked for these assumptions, and that would be able to back up over multi-character tokens (such as integer constants) if necessary, but this post has gotten much longer than I had intended, so I'm not going to worry about that for now.
But if you're still with me, we're done! And if you haven't already, I encourage you to copy all the code I've presented into your friendly neighborhood C compiler, and try it out. You'll find, for example, that 1 + 2 * 3 gives 7 (not 9), because the parser "knows" that * and / have higher precedence than + and -. Just like in a real compiler, you can override the default precedence using parentheses: (1 + 2) * 3 gives 9. Left-to-right associativity works, too: 1 - 2 - 3 is -4, not +2. It handles plenty of complicated, and perhaps surprising (but legal) cases, too: (((((5))))) evaluates to just 5, and ----4 evaluates to just 4 (it's parsed as "negative negative negative negative four", since our simplified parser doesn't have C's -- operator).
This parser does have a pretty big limitation, however: its error handling is terrible. It will handle legal expressions, but for illegal expressions, it will either do something bizarre, or just ignore the problem. For example, it simply ignores any trailing garbage it doesn't recognize or wasn't expecting -- the expressions 4 + 5 x, 4 + 5 %, and 4 + 5 ) all evaluate to 9.
Despite being somewhat of a "toy", this is also a very real parser, and as we've seen it can parse a lot of real expressions. You can learn a lot about how expressions are parsed (and about how compilers can be written) by studying this code. (One footnote: recursive descent is not the only way to write a parser, and in fact real compilers will usually use considerably more sophisticated techniques.)
You might even want to try extending this code, to handle other operators or other "primaries" (such as settable variables). Once upon a time, in fact, I started with something like this and extended it all the way into an actual C interpreter.

proper usage of the pre-increment operator in combination with the pointer dereference operator

I just wrote the following line of code:
if (++(data_ptr->count) > threshold) { /*...*/ } // example 1
My intent is to increment the count variable within the data structure that data_ptr points to before making the comparison to threshold, and this line of code works.
If I had instead wanted to increment data_ptr before making the comparison, I would have written this:
if ((++data_ptr)->count > threshold) { /*...*/ } // example 2
Out of curiosity, I also tried this line of code:
if (++data_ptr->count > threshold) { /*...*/ } // example 3
And found that it behaves exactly the same as the first one.
First question: Why does example #3 work the same as example #1? Is it a matter of operator precendence? Something in the standard? I had to write a quick test program becuase the answer was not obvious to me.
Second question: Should I write this if statement differently? I could perform the increment first, on its own line, and then test the condition in order to avoid any possible confusion. Is this necessary, or are the first two examples obvious enough on their own?
1) Precedence
2) Preference
Yes, this is a question of operator precedence. The arrow operator has higher precedence than the increment operator.
To answer your second question, I would generally separate this type of construct out into two separate statements:
++(data_ptr->count);
if(data_ptr->count > threshold) { /*...*/ }
This emphasizes what is going on and eliminates possible confusion. However, this is probably a question of preference.
Operator precedence dictates the behaviour you observed.
It wouldn't hurt much if you separated the increment from the comparison in this example, but sometimes you want to have a condition with increment in the middle of a sequence of conditions, and then trying to separate the increment from the test can hurt the readability of the code by introducing nesting that would otherwise be unnecessary.
For example:
if (...1...)
...2...
else if (++data_ptr->count > threshold)
...3...
else
...4...
Versus:
if (...1...)
...2...
else
{
++data_ptr->count;
if (data_ptr->count > threshold)
...3...
else
...4...
}
The -> operator binds tighter than ++.
They're both well-defined and I think they're easy to read. Things would only get dodgy if you had additional references to data_ptr in the same expression.

Resources