I have to parse an infix expression into a binary tree.
The expression is:
(((x1 + 5.12) ∗ (x2 − 7.68))/x3)
I don't really have a clue how to interpret the expression. Does anyone have an idea how to process this?
Your task is not so hard. First acquaint yourself with notation types, and then with expression parsing.
In general, to parse and evaluate an (infix) expression, you need to:
read and tokenize it, i.e. classify each symbol as an operand, an operator, etc.;
convert from infix to a binary expression tree, which is usually done with an algorithm such as the shunting-yard algorithm;
create a grammar that defines operator precedence and allows a strict[1] order of expression evaluation.
Expressions written in infix notation are slightly more difficult to parse, which is why they are usually converted to a more "machine friendly" form such as (reverse) Polish notation. One of its advantages is that it eliminates the need for parentheses.
So, as you can see, this is roughly the big picture, and your task is one part of it. Here is a visualisation of the binary expression tree for: 2 * 3 / ( 2 - 1 ) + 5 * ( 4 - 1 )
Here is more on the topic and an example implementation in C++.
[1] In your case, obeying the rules of algebra.
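As a rough illustration of the tokenize-then-build-a-tree steps above, here is a minimal Python sketch (Python is used only as a neutral sketch language; the answer itself points to C++). It uses a two-stack variant of the shunting-yard idea. The names `Node`, `tokenize`, `parse` and `to_prefix` are made up for this example, only the four binary operators and parentheses are supported, and there is no error handling for unbalanced parentheses.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
TOKEN = re.compile(r"\s*(\d+\.\d+|\d+|[A-Za-z_]\w*|[-+*/()])")


def tokenize(text):
    # Normalise the typographic operators used in the question to ASCII.
    text = text.replace("∗", "*").replace("−", "-").strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(text):
    operands, operators = [], []

    def reduce():
        # Pop one operator and two subtrees, push the combined subtree.
        op = operators.pop()
        right = operands.pop()
        left = operands.pop()
        operands.append(Node(op, left, right))

    for tok in tokenize(text):
        if tok == "(":
            operators.append(tok)
        elif tok == ")":
            while operators[-1] != "(":
                reduce()
            operators.pop()  # discard the matching "("
        elif tok in PRECEDENCE:
            # All four operators are left-associative, hence ">=".
            while (operators and operators[-1] != "("
                   and PRECEDENCE[operators[-1]] >= PRECEDENCE[tok]):
                reduce()
            operators.append(tok)
        else:
            operands.append(Node(tok))
    while operators:
        reduce()
    return operands.pop()


def to_prefix(node):
    # Render the tree in prefix form so its shape is visible as text.
    if node.left is None:
        return node.value
    return f"({node.value} {to_prefix(node.left)} {to_prefix(node.right)})"


print(to_prefix(parse("(((x1 + 5.12) ∗ (x2 − 7.68))/x3)")))
```

For the expression in the question the root of the tree is the division, with the product as its left child and x3 as its right child.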
From main(), I want the user to input a mathematical function (e.g. 2xy) through the command line. I initially thought I would iterate through the string and parse out the different arithmetic operators, x, y, etc. However, this could become fairly complicated for more intricate functions (e.g. (2x^2)/5 + sqrt(x^4)). Is there a more general method for parsing a mathematical function string like this one?
One of the most helpful ways to deal with parsing issues like that is to switch the input format from equations like that to RPN-based input, where the arguments come first and the operators come last.
Rewriting your complex equation would end up looking like:
2 x 2 ^ * 5 / x 4 ^ sqrt +
This is generally easier to implement, as you can do it with a simple stack: operands are pushed on, while operators pull the required pieces off the stack and put the result back on. This greatly simplifies the parsing, but you still need to implement the functions.
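The stack discipline described above can be sketched in a few lines of Python. This is a minimal sketch, not a complete calculator: `eval_rpn` and its `variables` argument are names invented for the example, tokens are assumed to be separated by spaces, and only the operators needed for (2x^2)/5 + sqrt(x^4), written in RPN as `2 x 2 ^ * 5 / x 4 ^ sqrt +`, are supported.

```python
import math
import operator

BINARY = {"+": operator.add, "-": operator.sub, "*": operator.mul,
          "/": operator.truediv, "^": operator.pow}
UNARY = {"sqrt": math.sqrt}


def eval_rpn(expression, variables):
    stack = []
    for token in expression.split():
        if token in BINARY:
            # The operand pushed last is the right-hand operand.
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY[token](left, right))
        elif token in UNARY:
            stack.append(UNARY[token](stack.pop()))
        elif token in variables:
            stack.append(variables[token])
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]


# For x = 3 this is 2*9/5 + sqrt(81), i.e. roughly 12.6.
print(eval_rpn("2 x 2 ^ * 5 / x 4 ^ sqrt +", {"x": 3.0}))
```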
What you need is an expression evaluator.
A while ago, I wrote a complete C expression evaluator (i.e. evaluated expressions written using C syntax) for a command line processor and scripting language on an embedded system. I used this description of the algorithm as a starting point. You could use the accompanying code directly, but I did not like the implementation, and wrote my own from the algorithm description.
It needed some work to support all C operators, function calls, and variables, but is a clear explanation and therefore a good starting point, especially if you don't need that level of completeness.
The basic principle is that expression evaluation is easier for a computer using a stack and 'Reverse Polish Notation' (RPN). The algorithm therefore converts an infix expression, with its associated order of precedence and parentheses, to RPN, and then evaluates it by popping operands, performing operations, and pushing results, until there are no operations left and exactly one value remains on the stack.
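The conversion half of that principle (infix to RPN) might look like the following Python sketch of the shunting-yard algorithm. It is not the evaluator described in this answer: `to_rpn` is a hypothetical name, the regular expression silently skips characters it does not recognise, and function calls and unary minus are left out to keep it short.

```python
import re

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2, "^": 3}
RIGHT_ASSOCIATIVE = {"^"}


def to_rpn(expression):
    tokens = re.findall(r"\d+\.?\d*|[A-Za-z_]\w*|[-+*/^()]", expression)
    output, stack = [], []
    for token in tokens:
        if token in PRECEDENCE:
            # Pop operators that bind at least as tightly as the new one
            # (strictly tighter when the new operator is right-associative).
            while stack and stack[-1] in PRECEDENCE:
                top = stack[-1]
                if (PRECEDENCE[top] > PRECEDENCE[token]
                        or (PRECEDENCE[top] == PRECEDENCE[token]
                            and token not in RIGHT_ASSOCIATIVE)):
                    output.append(stack.pop())
                else:
                    break
            stack.append(token)
        elif token == "(":
            stack.append(token)
        elif token == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()  # discard the matching "("
        else:
            output.append(token)  # number or variable
    while stack:
        output.append(stack.pop())
    return " ".join(output)


print(to_rpn("2 * x ^ 2 / 5"))
```

The resulting string can then be evaluated with a single operand stack, as the answer describes.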
It might get a bit more complicated if you choose to deal with implicit multiply operators (2xy rather than 2 * x * y, for example), not least because you'd need to unambiguously distinguish the variables x and y from a single variable xy. That is probably only feasible if you allow only single-character variable names. I suggest you either do that and insert explicit multiply operators on the operator stack as part of the parse, or disallow implicit multiplication.
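One way to handle the single-character-variable case is sketched below in Python. Note that it swaps in a slightly different technique from the one suggested above: instead of pushing the multiply operator during the parse, it rewrites the token list in a separate pass before parsing. The function name is hypothetical, and because every letter is treated as its own variable, function names such as sqrt are not supported by this sketch.

```python
import re


def insert_implicit_multiplication(expression):
    # Numbers, single-letter variables, operators and parentheses.
    tokens = re.findall(r"\d+\.?\d*|[A-Za-z]|[-+*/^()]", expression)
    result = []
    for token in tokens:
        if result:
            prev = result[-1]
            prev_ends_operand = prev == ")" or prev[0].isalnum()
            token_starts_operand = token == "(" or token[0].isalnum()
            # Two operands side by side mean an implied "*".
            if prev_ends_operand and token_starts_operand:
                result.append("*")
        result.append(token)
    return result


print(insert_implicit_multiplication("2xy"))
```

The output token list can be fed to any ordinary infix parser, which then never has to know that implicit multiplication existed.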
In the "Introduction" section of K&R C (2E) there is this paragraph:
C, like any other language, has its blemishes. Some of the operators have the wrong precedence; ...
Which operators are these? How are their precedence wrong?
Is this one of these cases?
Yes, the situation discussed in the message you link to is the primary gripe with the precedence of operators in C.
Historically, C developed without &&. To perform a logical AND operation, people would use the bitwise AND, so a==b AND c==d would be expressed with a==b & c==d. To facilitate this, == had higher precedence than &. Although && was added to the language later, & was stuck with its precedence below ==.
In general, people might like to write expressions such as (x&y) == 1 much more often than x & (y==1). So it would be nicer if & had higher precedence than ==. Hence people are dissatisfied with this aspect of C operator precedence.
This applies generally to &, ^, and | having lower precedence than ==, !=, <, >, <=, and >=.
There is a clear rule of precedence that is incontrovertible.
The rule is so clear that in a strongly typed system (think Pascal) the wrong precedence would give clear, unambiguous errors at compile time. The problem with C is that, since its type system is laissez-faire, the errors turn out to be logic errors that result in bugs rather than errors that can be caught at compile time.
The Rule
Let ○ and □ be two operators with types
○ : α × α → β
□ : β × β → γ
where α and γ are distinct types.
Then x ○ y □ z can only mean (x ○ y) □ z, with type assignment
x : α, y : α, z : β
whereas x ○ (y □ z) would be a type error, because ○ can only take an α while the right sub-expression can only produce a γ, which is not α.
Now let's apply this to C.
For the most part, C gets it right:
(==) : number × number → boolean
(&&) : boolean × boolean → boolean
so && should be below ==, and it is.
Likewise,
(+) : number × number → number
(==) : number × number → boolean
and so (+) must be above (==), which is once again correct.
However, in the case of the bitwise operators, the & or | of two bit patterns (that is, numbers) produces a number, i.e.
(&), (|) : number × number → number
(==) : number × number → boolean
And so a typical mask query, e.g. x & 0x777 == 0x777, can only make sense if (&) is treated as an arithmetic operator, i.e. above (==).
C puts it below, which in light of the above type rules is wrong.
Of course, I've expressed the above in terms of math and type inference.
In more pragmatic C terms, x & 0x777 == 0x777 groups as x & (0x777 == 0x777) in the absence of explicit parentheses.
When can such a grouping have a legitimate use?
I (personally) don't believe there is any.
In other words, Dennis Ritchie's informal statement that these precedences are wrong can be given a more formal justification.
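To make the difference between the two groupings concrete, here is a small worked example. It is written in Python purely as arithmetic: Python's own precedence for & differs from C's, so both groupings are spelled out with explicit parentheses, and the function names are invented for the illustration.

```python
MASK = 0x777


def as_c_groups_it(x):
    # C reads x & 0x777 == 0x777 as x & (0x777 == 0x777).
    # The comparison is always 1, so this is just x & 1.
    return x & int(MASK == MASK)


def as_intended(x):
    # The mask query the programmer most likely meant.
    return int((x & MASK) == MASK)


for x in (0x001, 0x776, 0x777):
    print(hex(x), as_c_groups_it(x), as_intended(x))
```

For x = 0x001 the C grouping yields 1 (true) although not all mask bits are set, which is exactly the kind of silent logic error described above.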
"Wrong" may sound a bit too harsh. Most people only care about the basic operators like + - * / ^, and if those did not work the way they are written in math, that could fairly be called wrong. Fortunately those are "in order" in C (except for the power operator, which doesn't exist).
However, there are some other operators that might not work as many people expect. For example, the bitwise operators have lower precedence than the comparison operators, as already mentioned by Eric Postpischil. That's less convenient but still not quite "wrong", because there was no established standard for them beforehand: they were only invented in the last century, with the advent of computers.
Another example is the shift operators << and >>, which have lower precedence than + and -. Shifting is thought of as multiplication and division, so people may expect it to sit at a higher level than + and -. Writing x << a + b may make many people think that it means x*2^a + b until they look at the precedence table. Besides, (x << 2) + (x << 4) + (y << 6) is also less convenient than simple additions without parentheses. Golang is one of the languages that fixed this by putting << and >> at a higher precedence than + and -.
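Python happens to share C's ordering here (shifts bind less tightly than + and -), so the surprise can be reproduced directly. The values of x, a and b below are arbitrary and chosen only for illustration.

```python
x, a, b = 3, 2, 1

without_parens = x << a + b    # groups as x << (a + b), i.e. 3 << 3
shift_first = (x << a) + b     # the "x * 2**a + b" reading, i.e. 12 + 1

print(without_parens, shift_first)
```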
In other languages there are many real examples of "wrong" precedence
One example is T-SQL where -100/-100*10 = 0
PHP with the wrong associativity of ternary operators
Excel with wrong precedence (lower than unary minus) and associativity (left-to-right instead of right-to-left) of ^:
According to Excel, 4^3^2 = (4^3)^2. Is this really the standard mathematical convention for the order of exponentiation?
Why does =-x^2+x for x=3 in Excel result in 12 instead of -6?
Why is it that Microsoft Excel says that 8^(-1^(-8^7))) = 8 instead of 1/8?
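The two Excel results quoted in the first two titles can be checked by hand. The sketch below uses Python only as a calculator, with explicit parentheses standing in for each convention; Python's own ** follows the mathematical convention (right-to-left, binding tighter than unary minus).

```python
right_to_left = 4 ** (3 ** 2)    # usual mathematical reading of 4^3^2
left_to_right = (4 ** 3) ** 2    # Excel's reading of 4^3^2

x = 3
math_convention = -(x ** 2) + x    # ^ binds tighter than unary minus
excel_convention = (-x) ** 2 + x   # unary minus binds tighter than ^

print(right_to_left, left_to_right, math_convention, excel_convention)
```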
It depends which precedence convention is considered "correct". There's no law of physics (or of the land) requiring precedence to be a certain way; it's evolved through practice over time.
In mathematics, operator precedence is usually taken as "BODMAS" (Brackets, Order, Division, Multiplication, Addition, Subtraction). Brackets come first and Subtraction comes last. (See: Ordering Mathematical Operations | BODMAS Order of operations.)
Operator precedence in programming requires more rules as there are more operators, but you can distil out how it compares to BODMAS.
The ANSI C precedence scheme is pictured here:
As you can see, unary plus and minus are at level 2, ABOVE multiplication and division at level 3. This can be confusing to a mathematician on a superficial reading, as can the precedence of the prefix and postfix increment and decrement operators.
For that reason, it is ALWAYS worth considering adding brackets in your mathematical code, even where syntactically unnecessary, to make your intention clear to a HUMAN reader. You lose nothing by doing it (although you might get flamed a bit by an uptight code reviewer, in which case you can flame back about coding risk management). You might lose some readability, but intention is always more important when debugging.
And yes, the link you provide is a good example. Countless expensive production errors have resulted from this.
I want my program to read a mathematical expression from standard input and print a bitmap with the expression formatted similarly to how Latex does this. Input is limited to simple expressions, that is, consisting of arithmetic operators, subscripts, superscripts and fraction bars.
For now, the program can interpret an expression and store it as a tree. The only problem I do not know how to solve is how to divide a plane into sections/boxes in order to print the expression on a bitmap properly. The biggest problem is with subscripts, superscripts (downscaling and placing symbols higher or lower), fraction bars and the fact that it has to work well recursively for example abcd
I would be grateful for an answer. I have tried to find a similar problem on the web, but nobody seems to have asked a question like this before.
In compiler design, suppose I have a grammar defined as
E-->E+E/E-E/id
T-->id
Since this grammar is left-recursive, and both the + and - operators are left-associative, when the parse tree is constructed for an input like id+id-id, id+id would be executed first and then id would be subtracted from the result of the addition.
And for an input string like id+id+id, the execution order would be (id+id)+id.
I do not understand this concept, because I have read that the associativity of operators does not define the order of evaluation. If that is true, what about parse tree generation? If we are asked to compare two parse trees and find which one works properly for an input string like id+id-id, we would choose the parse tree in which the subtree rooted at the + node is executed first and the subtree rooted at - is executed afterwards. So please clarify the actual parameters which decide the order of evaluation in the C program.
The associativity defines whether a - b - c is equivalent to (a - b) - c or a - (b - c), that is, whether c is subtracted from the result of subtracting b from a, or whether the result of b - c is subtracted from a. Associativity thus also tells you what the AST of the expression looks like.
What associativity does not tell you is which one of a, b and c is evaluated first. That is if you write f() - g() - h(), you know that it's equivalent to (f() - g()) - h() because subtraction is left-associative. However you do not know whether f is executed before g and/or h and so on. That's what people mean when they say that associativity does not define evaluation order.
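The grouping half of this can be inspected mechanically. The sketch below uses Python's standard `ast` module (with `ast.unparse`, available from Python 3.9) to print how a chain of subtractions is grouped; `grouping` is a name made up for the example and it handles only + and -. Note that Python, unlike C, does fix the evaluation order of operands, so this illustrates only the tree shape, not the unspecified ordering discussed above.

```python
import ast


def grouping(expression):
    """Return a fully parenthesised rendering of the parse tree."""
    def render(node):
        if isinstance(node, ast.BinOp):
            symbol = {ast.Add: "+", ast.Sub: "-"}[type(node.op)]
            return f"({render(node.left)} {symbol} {render(node.right)})"
        return ast.unparse(node)  # leaves: names, calls, numbers

    return render(ast.parse(expression, mode="eval").body)


print(grouping("f() - g() - h()"))

# The two possible groupings give different values:
print((10 - 4) - 3, 10 - (4 - 3))
```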
clarify me the actual parameters which decide the order of evaluation in the c program.
The order of evaluation of operands in an arithmetic expression in a C program is unspecified. That is, it is completely up to the compiler.
I'm making a program tasked with converting a math expression such as (2+4)*(4/3) into a binary tree, and then manipulating it.
First, when parsing, I've turned the string into two stacks, operands and operators.
How can I determine what the root should be, given that in my example above the tree should look like this:
     *
   /   \
  +     /
 / \   / \
2   4 4   3
Notice that the root is *, which is the outermost operator. But on my operator stack it looks like this:
/
*
+
And there could be cases like (2+4+3)*4 or 2*((4+1)/3).
How can I determine which operator should be the root of my binary tree?
Convert your infix expression to either prefix or postfix notation. You can't really have a proper operator stack without doing this.
In postfix notation, the expression (2+4)*(4/3) would look like:
2 4 + 4 3 / *
So, you have the multiplication appearing at the end which could be inserted into the tree as its root. Evaluating a postfix expression is much easier for a computer as grouping is not needed.
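To show why the last operator of the postfix form ends up as the root, here is a minimal Python sketch that builds a tree from a space-separated postfix string. Trees are represented as plain tuples `(operator, left, right)` and `tree_from_postfix` is a hypothetical helper name; both are assumptions made for brevity.

```python
OPERATORS = {"+", "-", "*", "/"}


def tree_from_postfix(postfix):
    stack = []
    for token in postfix.split():
        if token in OPERATORS:
            # The two most recent subtrees become this operator's children.
            right = stack.pop()
            left = stack.pop()
            stack.append((token, left, right))
        else:
            stack.append(token)
    # Whatever was built last is the root of the whole expression.
    return stack.pop()


print(tree_from_postfix("2 4 + 4 3 / *"))
```

For (2+4)*(4/3) the returned tuple has * at the top, with the + and / subtrees as its children.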
You can't just put the operators on your stack in the order that they appear in your expression. Once you've done that, you lose the ability to disambiguate, as you've identified.
See e.g. http://en.wikipedia.org/wiki/Shunting_yard_algorithm for an algorithm to parse infix notation.
You can use a stack to implement an infix to binary-expression-tree parser. This link has a C++ implementation:
An infix to binary-expression-tree parser that uses two stacks, one for operators and another for operands, which all derive from a base node class.