I have used flex and bison in order to make a lexical analyzer and a parser for an EBNF grammar. This work is done! I mean, when i put a file with a program I write, I can see if the program has mistakes. If it doesn't, I can see the whole program in my screen based on the grammar i have used. I have no problem in this.
Now, I want to use loop handling and loop unrolling. Which part should I change? The lexical analyzer? The parser? Or the main after the parser? And how?
Introduction
As we don't have sight of a piece of your code to see how you are handling a loop in the parser and outputting code, and an example of a specific loop that you might want unrolled it is difficult to give any more detailed advice than that already given. There are unlikely to be any more experienced compiler writers or teachers anywhere on the globe than those already reading your question! So we will need to explore other ways to explain how to solve a problem like this.
It often happens that people can't post examples of their code because they started with a significant code base provided as part of a class exercise or from an open source repository, and they do not fully understand how it works to be able to find appropriate code fragments to post. Let's imagine that you had the complete source of a working compiler for a real language and wanted to add some loop optimisations to that existing, working compiler, you might then say, as you did, "what source, how can I show some source?" (because in actuality it is many tens of thousands of lines of code).
An Example Compiler
In the absence of some code to reference the alternative is to create one, as an exemplar, to explain the problem and solution. This is often how it is done in compiler text books or compiler classes. I will use a similar simple example to demonstrate how such optimisations can be achieved using the tools flex and bison.
First, we need to define the language of the example. To keep within the reasonable size constraints of a SO answer the language must be very simple. I will use simple assignments of expressions as the only statement form in my language. The variables in this language will be single letters and the constants will be positive integers. The only expression operator is plus (+). An example program in my language might be:
i = j + k; j = 1 + 2
The output code generated by the compiler will be simple assembler for a single accumulator machine with four instructions, LDA, STO, ADD and STP. The code generated for the above statements would be:
LDA j
ADD k
STO i
LDA #1
ADD #2
STO j
STP
Where LDA loads a value or variable into the accumulator, ADD adds a variable or value to the accumulator, STO stores the accumulator back to a variable. STP is "stop" for the end-of-program.
The flex program
The language shown above will need the tokens for ID and NUMBER and should also skip whitespace. The following will suffice:
%{
#define yyterminate() return (END);
%}
digit [0-9]
id [a-z]
ws [\t\n\r ]
%%
{ws}+ /* Skip whitespace */
{digit}+ {yylval = (int)(0l - atol(yytext)); return(NUMBER); }
{id} {yylval = yytext[0]; return(ID); }
"+" {return('+'); }
"=" {return('='); }
Gory details
Just some notes on how this works. I've used atol to convert the integer to allow for deal with potential integer overflow that can occur in reading MAXINT. I'm negating the constants so they can be easily distinguished from the identifiers which will be positive in one byte. I'm storing single character identifiers to avoid having the burden of illustrating symbol table code and thus permit a very small lexer, parser and code generator.
The bison program
To parse the language and generate some code from the bison actions we can achieve this by the following bison program:
%{
#include <stdio.h>
%}
%token NUMBER ID END
%%
program : statements END { printf("STP\n"); return(0) ; }
;
statements : statement
| statements ';' statement
;
statement : ID '=' expression { printf("STO %c\n",$1); }
|
;
expression : operand {
/* Load operand into accumulator */
if ($1 <= 0)
printf("LDA #%d\n",(int)0l-$1);
else printf("LDA %c\n",$1);
}
| expression '+' operand {
/* Add operand to accumulator */
if ($3 <= 0)
printf("ADD #%d\n",(int)0l-$3);
else printf("ADD %c\n",$3);
}
;
operand : NUMBER
| ID
;
%%
#include "lex.yy.c"
Explanation of methodology
This paragraph is intended for those who know how to do this and might query the approach used in my examples. I've deliberately avoided building a tree and doing a tree walk, although this would be the orthodox technique for code generation and optimisation. I wanted to avoid adding all the necessary code overhead in the example to manage the tree and walk it. This way my example compiler can be really tiny. However, being restricted to only using bison action to perform the code generation limits me to the ordering of the bison rule matching. This meant that only pseudo-machine code could really be generated. A source-to-source example would be less tractable with this methodology. I've chosen an idealised machine that is a cross between MU0 and a register-less PDP/11, again with the bare minimum of features to demonstrate some optimisations of code.
Optimisation
Now we have a working compiler for a language in a few lines of code we can start to demonstrate how the process of adding code optimisation might work.
As has already been said by the esteemed #Chris Dodd:
If you want to do program transformations after parsing, you should do them after parsing. You can do them incrementally (calling transform routines from your bison code after parsing part of your input), or after parsing is complete, but either way, they happen after parsing the part of the program you are transforming.
This compiler works by emitting code incrementally after parsing part of the input. As each statement is recognised the bison action (within the {...} clause) is invoked to generate code. If this is to be transformed into more optimal code it is this code that has to be changed to generate the desired optimisation. To be able to achieve effective optimisation we need a clear understanding of what language features are to be optimised and what the optimal transformation should be.
Constant Folding
A common optimisation (or code transformation) that can be done in a compiler is constant folding. In constant folding the compiler replaces expressions made entirely of numbers by the result. For example consider the following:
i = 1 + 2
An optimisation would be to treat this as:
i = 3
Thus the addition of 1 + 2 was made by the compiler and not put into the generated code to occur at run time. We would expect the following output to result:
LDA #3
STO i
Improved Code Generator
We can implement the improved code by looking for the explicit case where we have a NUMBER on both sides of expression '+' operand. To do this we have to delay taking any action on expression : operand to permit the value to be propagated onwards. As the value for an expression might not have been evaluated we have to potentially do that on assignment and addition, which makes for a slight explosion of if statements. We only need to change the actions for the rules statement and expression however, which are as shown below:
statement : ID '=' expression {
/* Check for constant expression */
if ($3 <= 0) printf("LDA #%d\n",(int)0l-$3);
else
/* Check if expression in accumulator */
if ($3 != 'A') printf("LDA %c\n",$3);
/* Now store accumulator */
printf("STO %c\n",$1);
}
| /* empty statement */
;
expression : operand { $$ = $1 ; }
| expression '+' operand {
/* First check for constant expression */
if ( ($1 <= 0) && ($3 <= 0)) $$ = $1 + $3 ;
else { /* No constant folding */
/* See if $1 already in accumulator */
if ($1 != 'A')
/* Load operand $1 into accumulator */
if ($1 <= 0)
printf("LDA #%d\n",(int)0l-$1);
else printf("LDA %c\n",$1);
/* Add operand $3 to accumulator */
if ($3 <= 0)
printf("ADD #%d\n",(int)0l-$3);
else printf("ADD %c\n",$3);
$$ = 'A'; /* Note accumulator result */
}
}
;
If you build the resultant compiler, you will see that it does indeed generate better code and perform the constant folding transformation.
Loop Unrolling
The transformation that you specifically asked about in your question was that of loop unrolling. In loop unrolling the compiler will look for some specific integer expression values in the loop start and end conditions to determine if the unrolled code transformation should be performed. The compiler can will then generate two possible code alternative sequences for loops, the unrolled and standard looping code. We can demonstrate this concept in this example mini-compiler by using integer increments.
If we imagine that the machine code has an INC instruction which increments the accumulator by one and is faster that performing an ADD #1 instruction, we can further improve the compiler by looking for that specific case. This involves evaluating integer constant expressions and comparing to a specific value to decide if an alternative code sequence should be used - just as in loop unrolling. For example:
i = j + 1
should result in:
LDA j
INC
STO i
Final Code Generator
To change the code generated for n + 1 we only need to recode part of the expression semantics and just test that when not folding constants wether the constant to be used would be 1 (which is negated in this example). The resultant code becomes:
expression : operand { $$ = $1 ; }
| expression '+' operand {
/* First check for constant expression */
if ( ($1 <= 0) && ($3 <= 0)) $$ = $1 + $3 ;
else { /* No constant folding */
/* Check for special case of constant 1 on LHS */
if ($1 == -1) {
/* Swap LHS/RHS to permit INC usage */
$1 = $3;
$3 = -1;
}
/* See if $1 already in accumulator */
if ($1 != 'A')
/* Load operand $1 into accumulator */
if ($1 <= 0)
printf("LDA #%d\n",(int)0l-$1);
else printf("LDA %c\n",$1);
/* Add operand $3 to accumulator */
if ($3 <= 0)
/* test if ADD or INC */
if ($3 == -1) printf("INC\n");
else printf("ADD #%d\n",(int)0l-$3);
else printf("ADD %c\n",$3);
$$ = 'A'; /* Note accumulator result */
}
}
;
Summary
In this mini-tutorial we have defined a whole language, a complete machine code, written a lexer, a compiler, a code generator and an optimiser. It has briefly demonstrated the process of code generation and indicated (albeit generally) how code transformation and optimisation could be performed. It should enable similar improvements to be made in other (as yet unseen) compilers, and has addressed the issue of identifying loop unrolling conditions and generating specific improvements for that case.
It should also have made it clear, how difficult it is to answer questions without specific examples of some program code to refer to.
Related
I saw this code:
if (cond) {
perror("an error occurred"), exit(1);
}
Why would you do that? Why not just:
if (cond) {
perror("an error occurred");
exit(1);
}
In your example it serves no reason at all. It is on occasion useful when written as
if(cond)
perror("an error occured"), exit(1) ;
-- then you don't need curly braces. But it's an invitation to disaster.
The comma operator is to put two or more expressions in a position where the reference only allows one. In your case, there is no need to use it; in other cases, such as in a while loop, it may be useful:
while (a = b, c < d)
...
where the actual "evaluation" of the while loop is governed solely on the last expression.
Legitimate cases of the comma operator are rare, but they do exist. One example is when you want to have something happen inside of a conditional evaluation. For instance:
std::wstring example;
auto it = example.begin();
while (it = std::find(it, example.end(), L'\\'), it != example.end())
{
// Do something to each backslash in `example`
}
It can also be used in places where you can only place a single expression, but want two things to happen. For instance, the following loop increments x and decrements y in the for loop's third component:
int x = 0;
int y = some_number;
for(; x < y; ++x, --y)
{
// Do something which uses a converging x and y
}
Don't go looking for uses of it, but if it is appropriate, don't be afraid to use it, and don't be thrown for a loop if you see someone else using it. If you have two things which have no reason not to be separate statements, make them separate statements instead of using the comma operator.
The main use of the comma operator is obfuscation; it permits doing two
things where the reader only expects one. One of the most frequent
uses—adding side effects to a condition, falls under this
category. There are a few cases which might be considered valid,
however:
The one which was used to present it in K&R: incrementing two
variables in a for loop. In modern code, this might occur in a
function like std::transform, or std::copy, where an output iterator
is incremented symultaneously with the input iterator. (More often, of
course, these functions will contain a while loop, with the
incrementations in separate statements at the end of the loop. In such
cases, there's no point in using a comma rather than two statements.)
Another case which comes to mind is data validation of input parameters
in an initializer list:
MyClass::MyClass( T const& param )
: member( (validate( param ), param) )
{
}
(This assumes that validate( param ) will throw an exception if
something is wrong.) This use isn't particularly attractive, especially
as it needs the extra parentheses, but there aren't many alternatives.
Finally, I've sometimes seen the convention:
ScopedLock( myMutex ), protectedFunction();
, which avoids having to invent a name for the ScopedLock. To tell
the truth, I don't like it, but I have seen it used, and the alternative
of adding extra braces to ensure that the ScopedLock is immediately
destructed isn't very pretty either.
This can be better understood by taking some examples:
First:
Consider an expression:
x = ++j;
But for time being, if we need to assign a temporarily debug value, then we can write.
x = DEBUG_VALUE, ++j;
Second:
Comma , operators are frequently used in for() -loop e.g.:
for(i = 0, j = 10; i < N; j--, i++)
// ^ ^ here we can't use ;
Third:
One more example(actually one may find doing this interesting):
if (x = 16 / 4), if remainder is zero then print x = x - 1;
if (x = 16 / 5), if remainder is zero then print x = x + 1;
It can also be done in a single step;
if(x = n / d, n % d) // == x = n / d; if(n % d)
printf("Remainder not zero, x + 1 = %d", (x + 1));
else
printf("Remainder is zero, x - 1 = %d", (x - 1));
PS: It may also be interesting to know that sometimes it is disastrous to use , operator. For example in the question Strtok usage, code not working, by mistake, OP forgot to write name of the function and instead of writing tokens = strtok(NULL, ",'");, he wrote tokens = (NULL, ",'"); and he was not getting compilation error --but its a valid expression that tokens = ",'"; caused an infinite loop in his program.
The comma operator allows grouping expression where one is expected.
For example it can be useful in some case :
// In a loop
while ( a--, a < d ) ...
But in you case there is no reason to use it. It will be confusing... that's it...
In your case, it is just to avoid curly braces :
if(cond)
perror("an error occurred"), exit(1);
// =>
if (cond)
{
perror("an error occurred");
exit(1);
}
A link to a comma operator documentation.
There appear to be few practical uses of operator,().
Bjarne Stroustrup, The Design and Evolution of C++
Most of the oft usage of comma can be found out in the wikipedia article Comma_operator#Uses.
One interesting usage I have found out when using the boost::assign, where it had judiciously overloaded the operator to make it behave as a comma separated list of values which can be pushed to the end of a vector object
#include <boost/assign/std/vector.hpp> // for 'operator+=()'
using namespace std;
using namespace boost::assign; // bring 'operator+=()' into scope
{
vector<int> values;
values += 1,2,3,4,5,6,7,8,9; // insert values at the end of the container
}
Unfortunately, the above usage which was popular for prototyping would now look archaic once compilers start supporting Uniform Initialization
So that leaves us back to
There appear to be few practical uses of operator,().
Bjarne Stroustrup, The Design and Evolution of C++
In your case, the comma operator is useless since it could have been used to avoid curly braces, but it's not the case since the writer has already put them. Therefore it's useless and may be confusing.
It could be useful for the itinerary operator if you want to execute two or more instructions when the condition is true or false. but keep in mind that the return value will be the most right expression due to the comma operator left to right evalutaion rule (I mean inside the parentheses)
For instance:
a<b?(x=5,b=6,d=i):exit(1);
The boost::assign overloads the comma operator heavily to achieve this kind of syntax:
vector<int> v;
v += 1,2,3,4,5,6,7,8,9;
I have written a simple grammar:
operations :
/* empty */
| operations operation ';'
| operations operation_id ';'
;
operation :
NUM operator NUM
{
printf("%d\n%d\n",$1, $3);
}
;
operation_id :
WORD operator WORD
{
printf("%s\n%s\n%s\n",$1, $3, $<string>2);
}
;
operator :
'+' | '-' | '*' | '/'
{
$<string>$ = strdup(yytext);
}
;
As you can see, I have defined an operator that recognizes one of 4 symbols. Now, I want to print this symbol in operation_id. Problem is, that logic in operator works only for last symbol in alternative.
So if I write a/b; it prints ab/ and that's cool. But for other operations, eg. a+b; it prints aba. What am I doing wrong?
*I ommited new lines symbols in example output.
This non-terminal from your grammar is just plain wrong.
operator :
'+' | '-' | '*' | '/' { $<string>$ = strdup(yytext); }
;
First, in yacc/bison, each production has an action. That rule has four productions, of which only the last has an associated action. It would be clearer to write it like this:
operator : '+'
| '-'
| '*'
| '/' { $<string>$ = strdup(yytext); }
;
which makes it a bit more obvious that the action only applies to the reduction from the token '/'.
The action itself is incorrect as well. yytext should never be used outside of a lexer action, because its value isn't reliable; it will be the value at the time the most recent lexer action was taken, but since the parser usually (but not always) reads one token ahead, it will usually (but not always) be the string associated with the next token. That's why the usual advice is to make a copy of yytext, but the idea is to copy it in the lexer rule, assigning the copy to the appropriate member of yylval so that the parser can use the semantic value of the token.
You should avoid the use of $<type>$ =. A non-terminal can only have one type, and it should be declared in the prologue to the bison file:
%type <string> operator
Finally, you will find that it is very rarely useful to have a non-terminal which recognizes different operators, because the different operators are syntactically different. In a more complete expression grammar, you'd need to distinguish between a + b * c, which is the sum of a and the product of b and c, and a * b + c, which is the sum of c and the product of a and b. That can be done by using different non-terminals for the sum and product syntaxes, or by using different productions for an expression non-terminal and disambiguating with precedence rules, but in both cases you will not be able to use an operator non-terminal which produces + and * indiscriminately.
For what its worth, here is the explanation of why a+b results in the output of aba:
The production operator : '+' has no explicit action, so it ends up using the default action, which is $$ = $1.
However, the lexer rule which returns '+' (presumably -- I'm guessing here) never sets yylval. So yylval still has the value it was last assigned.
Presumably (another guess), the lexer rule which produces WORD correctly sets yylval.string = strdup(yytext);. So the semantic value of the '+' token is the semantic value of the previous WORD token, which is to say a pointer to the string "a".
So when the rule
operation_id :
WORD operator WORD
{
printf("%s\n%s\n%s\n",$1, $3, $<string>2);
}
;
executes, $1 and $2 both have the value "a" (two pointers to the same string), and $3 has the value "b".
Clearly, it is semantically incorrect for $2 to have the value "a", but there is another error waiting to occur. As written, your parser leaks memory because you never free() any of the strings created by strdup. That's not very satisfactory, and at some point you will want to fix the actions so that semantic values are freed when they are no longer required. At that point, you will discover that having two semantic values pointing at the same block of allocated memory makes it highly likely that free() will be called twice on the same memory block, which is Undefined Behaviour (and likely to produce very difficult-to-diagnose bugs).
I would like to evaluate formulas which a user can input for many data points, so efficiency is a concern. This is for a Fortran project, but my solutions so far have been centered on using a yacc/bison grammar, so I will probably use Fortran's iso_c_binding feature to interface to yyparse().
The preferred (so far) solution would be an small extension of the classic mfcalc calculator example from the Bison manual, with the bison grammar made to recognize a (single) variable name as well (which is not hard).
The question is what to do in the executable statements. I see two options there.
First, I could simply evaluate the expression as it is parsed, as in the mfcalc example.
Second, I could invoke the bison parser once for parsing and for creating a stack-based (reverse polish) representation of the formula being parsed, so
2 + 3*x would be translated into 2 3 * + (of course, as the relevant data structure).
The relevant part of the grammar would look like this:
%union {
double val;
char *c;
int fcn;
}
%type <val> NUMBER
%type <c> VAR
%type <fcn> Function
/* Tokens and %left PLUS MINUS etc. left out for brevity */
%%
...
Function:
SIN { $$=SIN; }
| COS { $$=COS; }
| TAN { $$=TAN; }
| SQRT { $$=SQRT; }
Expression:
NUMBER { push_number($1); }
| VAR { push_var($1); }
| Expression PLUS Expression { push_operand(PLUS); }
| Expression MINUS Expression { push_operand(MINUS); }
| Expression DIVIDE Expression { push_operand(DIVIDE); }
| MINUS Expression %prec NEG { push_operand(NEG); }
| LEFT_PARENTHESIS Expression RIGHT_PARENTHESIS;
| Function LEFT_PARENTHESIS Expression RIGHT_PARENTHESIS { push_function($1); }
| Expression POWER Expression { push_operand(POWER); }
The functions push_... would put the formula into an array of structs, which which contain a struct holding the token and the yacc union.
The RPN would then be interpreted using a very simple (and hopefully fast) interpreter.
So, the questions.
Is the second approach valid? I think it is from what I understand about bison (or yacc's) way of handling shift and reduce (basically, this will shift a number and reduce an expression, so the order should be guaranteed to be correct for RPN), but I am not quite sure.
Also, is it worth the additional effort over simply evaluating the function using the $$ construct (the first approach)?
Finally, are there other, better solutions? I had considered using syntax trees, but I don't think the additional effort is actually worth it. Also, I tend to think that using trees is overkill where an array would do just nicely :-)
It's only slightly more difficult to generate three-address virtual ops than RPN. In effect, the RPN is a virtual stack machine. The three-address ops -- which can also easily go into an array -- are probably faster to interpret, and will probably be more flexible in the long term.
The main advantage of parsing the expression into some internal form is that it is likely to be faster to evaluate the internal form than to reparse the original string. That may not be the case, but it usually is because converting floating-point literals into floating-point numbers is (relatively speaking) quite slow.
There is also the intermediate case of tokenizing the expression (into an array), and then directly evaluating while parsing the token stream. (In effect, that makes bison your virtual machine.)
Which of these strategies is the best depends a lot on details of your use case, but none of them are difficult so you could try all three and compare.
What is the necessity of both prefix and postfix increment operators? Is not one enough?
To the point, there exists like a similar while/do-while necessity problem, yet, there is not so much confusion (in understanding and usage) in having them both. But with having both prefix and postfix (like priority of these operators, their association, usage, working).
And do anyone been through a situation where you said "Hey, I am going to use postfix increment. Its useful here."
POSTFIX and PREFIX are not the same. POSTFIX increments/decrements only after the current statement/instruction is over. Whereas PREFIX increments/decrements and then executes the current step. Example, To run a loop n times,
while(n--)
{ }
works perfectly. But,
while(--n)
{
}
will run only n-1 times
Or for example:
x = n--; different then x = --n; (in second form value of x and n will be same). Off-course we can do same thing with binary operator - in multiple steps.
Point is suppose if there is only post -- then we have to write x = --n in two steps.
There can be other better reasons, But this is one I suppose a benefit to keep both prefix and postfix operator.
[edit to answer OP's first part]
Clearly i++ and ++i both affect i the same but return different values. The operations are different. Thus much code takes advantage of these differences.
The most obvious need to have both operators is the 40 year code base for C. Once a feature in a language is used extensively, very difficult to remove.
Certainly a new language could be defined with only one or none. But will it play in Peoria? We could get rid of the - operator too, just use a + -b, but I think it is a tough sell.
Need both?
The prefix operator is easy to mimic with alternate code for ++i is pretty much the same as i += 1. Other than operator precedence, which parens solves, I see no difference.
The postfix operator is cumbersome to mimic - as in this failed attempt if(i++) vs. if(i += 1).
If C of the future moved to depreciate one of these, I suspect it would be to depreciate the prefix operator for its functionality, as discussed above, is easier to replace.
Forward looking thought: the >> and << operators were appropriated in C++ to do something quite different from integer bit shifting. Maybe the ++pre and post++ will generate expanded meaning in another language.
[Original follows]
Answer to the trailing OP question "do anyone been through a situation where you saidd "Hey, I am going to use postfix increment. Its useful here"?
Various array processing, like with char[], benefit. Array indexing, starting at 0, lends itself to a postfix increment. For after fetching/setting the array element, the only thing to do with the index before the next array access is to increment the index. Might as well do so immediately.
With prefix increment, one may need to have one type of fetch for the 0th element and another type of fetch for the rest.
size_t j = 0;
for (size_t i = 0, (ch = inbuffer[i]) != '\0'; i++) {
if (condition(ch)) {
outbuffer[j++] = ch; // prefer this over below
}
}
outbuffer[j] = '\0';
vs.
for (size_t i = 0, (ch = inbuffer[i]) != '\0'; ++i) {
if (condition(ch)) {
outbuffer[j] = ch;
++j;
}
}
outbuffer[j] = '\0';
I think the only fair answer to which one to keep would be to do away with them both.
If, for example, you were to do away with postfix operators, then where code was once compactly expressed using n++, you would now have to refer to (++n - 1), or you would have to rearrange other terms.
If you broke the increment or decrement out onto its own line before or after the expression which referred to n, above, then it's not really relevant which you use, but in that case you could just as easily use neither, and replace that line with n = n + 1;
So perhaps the real issue, here, is expressions with side effects. If you like compact code then you'll see that both pre and post are necessary for different situations. Otherwise there doesn't seem to be much point in keeping either of them.
Example usage of each:
char array[10];
char * const end = array + sizeof(array) / sizeof(*array);
char *p = end;
int i = 0;
/* set array to { 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 } */
while (p > array)
*--p = i++;
p = array;
i = 0;
/* set array to { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 } */
while (p < end)
*p++ = i++;
They are necessary because they are already used in lots of code, so if they were removed then lots of code would fail to compile.
As to why they ever existed in the first place, older compilers could generate more efficient code for ++i and i++ than they could for i+=1 and (i+=1)-1. For newer compilers this is generally not an issue.
The postfix version is something of an anomaly, as nowhere else in C is there an operator that modifies its operand but evaluates to the prior value of its operand.
One could certainly get by using only one or other of prefix or postfix increment operators. It would be a little more difficult to get by using only one or other of while or do while, as the difference between them is greater than the difference between prefix and postfix increment in my view.
And one could of course get by without using either prefix or postfix increment, or while or do while. But where do you draw the line between what's needless cruft and what's useful abstraction?
Here's a quickie example that uses both; an array-based stack, where the stack grows towards 0:
#define STACKSIZE ...
typedef ... T;
T stack[STACKSIZE];
size_t stackptr = STACKSIZE;
// push operation
if ( stackptr )
stack[ --stackptr ] = value;
// pop operation
if ( stackptr < STACKSIZE )
value = stack[ stackptr++ ];
Now we could accomplish the exact same thing without the ++ and -- operators, but it wouldn't scan as cleanly.
As for any other obscure mechanism in the C language, there are various historical reasons for it. In ancient times when dinosaurs walked the earth, compilers would make more efficient code out of i++ than i+=1. In some cases, compilers would generate less efficient code for i++ than for ++i, because i++ needed to save away the value to increment later. Unless you have a dinosaur compiler, none of this matters the slightest in terms of efficiency.
As for any other obscure mechanism in the C language, if it exists, people will start to use it. I'll use the common expression *p++ as an example (it means: p is a pointer, take the contents of p, use that as the result of the expression, then increment the pointer). It must use postfix and never prefix, or it would mean something completely different.
Some dinosaur once started writing needlessly complex expressions such as the *p++ and because they did, it has became common and today we regard such code as something trivial. Not because it is, but because we are so used at reading it.
But in modern programming, there is absolutely no reason to ever write *p++. For example, if we look at the implementation of the memcpy function, which has these prerequisites:
void* memcpy (void* restrict s1, const void* restrict s2, size_t n)
{
uint8_t* p1 = (uint8_t*)s1;
const uint8_t* p2 = (const uint8_t*)s2;
Then one popular way to implement the actual copying is:
while(n--)
{
*p1++ = *p2++;
}
Now some people will cheer, because we used so few lines of code. But few lines of code is not necessarily a measure of good code. Often it is the opposite: consider replacing it with a single line while(n--)*p1++=*p2++; and you see why this is true.
I don't think either case is very readable, you have to be a somewhat experienced C programmer to grasp it without scratching your head for five minutes. And you could write the same code like this:
while(n != 0)
{
*p1 = *p2;
p1++;
p2++;
n--;
}
Far clearer, and most importantly it yields exactly the same machine code as the first example.
And now see what happened: because we decided not to write obscure code with lots of operands in one expression, we might as well have used ++p1 and ++p2. It would give the same machine code. Prefix or postfix does not matter. But in the first example with obscure code, *++p1 = *++p2 would have completely changed the meaning.
To sum it up:
There exist prefix and postfix increment operators for historical reasons.
In modern programming, having two different such operators is completely superfluous, unless you write obscure code with several operators in the same expression.
If you write obscure code, will find ways to motivate the use of both prefix and postfix. However, all such code can always be rewritten.
You can use this as a quality measure of your code: if you ever find yourself writing code where it matters whether you are using prefix or postfix, you are writing bad code. Stop it, rewrite the code.
Prefix operator first increments value then its uses in the expression. Postfix operator,first uses the value in the expression and increments the value
The basic use of prefix/postfix operators are assembler replaces it with single increment/decrement instruction. If we use arithmetic operators instead of increment or decrement operators, assembler replaces it with two or three instructions. that's why we use increment/decrement operators.
You don't need both.
It is useful for implementing a stack, so it exists in some machine languages. From there it has been inherited indirectly to C (In which this redundancy is still somewhat useful, and some C programmers seems to like the idea of combining two unrelated operations in a single expression), and from C to any other C-like lagnuages.
What is the meaning of == and how does it differ from =?
How do I know which one to use?
== is a test for equality. = is an assignment.
Any good C book should cover this (fairly early on in the book I would imagine).
For example:
int i = 3; // sets i to 3.
if (i == 3) printf("i is 3\n"); // prints it.
Just watch out for the heinous:
if (i = 4) { }
which is valid C and frequently catches people out. This actually assigns 4 to the variable i and uses that as the truth value in the if statement. This leads a lot of people to use the uglier but safer:
if (4 == i) {}
which, if you accidentally use = instead of ==, is a compile-time error rather than something that will bite you on the backside while your program is running :-)
The logical-or operator is two vertical bar characters, one after the other, not a single character. Here it is lined up with a logical-and, and a variable called b4:
||
&&
b4
No magic there.
a == b is a test if a and b are equal.
a = b is called an assignment, which means to set the variable a to having the same value as b.
(You type | with Shift-\ in the US keyboard layout.)
== tests equality
= assigns a value
neither are related to ||
I might add that in Finnish and Swedish keyboards. Pipe symbol; |; of OR is AltGr (the right alt) and < key. IF you are using Mac on the other hand it is Alt-7 key.
Gave me a lot of sweat when I first started typing on these keyboards.
Now that you know the difference between '==' and '=", let me put you some words of caution. Although '==' is used as a standard test of equality between comparable variables and '=' used as an internally type-casted assignment, the following programming error is quiet common.
In the below example and similar codes, '=' is know as "Always true" conditional operator.
#include<stdio.h>
int main()
{
int i = 10, j = 20;
if ( i = j )
printf("Equal\n");
else
printf("NOT Equal\n");
return 0;
}
So, the word of caution is "Never use '=' in if statements, unless you have something evil in your mind."