How to represent a mathematical expression in memory - symbolic-math

My friends and I are trying to implement a computer algebra system.
We have already implemented an algorithm that converts an expression like 1+2*4 to a binary tree:
  +
 / \
1   *
   / \
  2   4
And we implemented an algorithm that evaluates this kind of binary tree.
We now want to implement an algorithm that simplifies expressions with variables.
For example, x+2x should become 3x.
I was thinking it would be easier to merge similar operators in the binary tree. For example:
  +
 / \
a   +
   / \
  b   c
will become
  +
/ | \
a b c
This way it would be easier to find terms that can be simplified.
For example, if I had x+2x+3x in my expression then they would be under the same + operator and thus they can find each other much easier without traversing most of the tree.
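The merging idea can be sketched as follows. This is only an illustration under assumptions that are not in the question: trees are nested tuples such as ('+', left, right), variables are strings, and a term is a number, a variable, or ('*', number, variable). The names flatten, split_term and combine_like_terms are made up for this sketch.

```python
from collections import defaultdict

def flatten(node):
    """Return the list of terms under a chain of nested '+' nodes."""
    if isinstance(node, tuple) and node[0] == '+':
        terms = []
        for child in node[1:]:
            terms.extend(flatten(child))
        return terms
    return [node]

def split_term(term):
    """Split a term into (coefficient, variable); variable is None for constants."""
    if isinstance(term, (int, float)):
        return term, None
    if isinstance(term, str):
        return 1, term
    if (isinstance(term, tuple) and len(term) == 3 and term[0] == '*'
            and isinstance(term[1], (int, float)) and isinstance(term[2], str)):
        return term[1], term[2]
    raise ValueError(f"unsupported term: {term!r}")

def combine_like_terms(node):
    """Sum the coefficients of every variable found under one merged '+'."""
    totals = defaultdict(int)
    for term in flatten(node):
        coeff, var = split_term(term)
        totals[var] += coeff
    return dict(totals)

# x + (2x + 3x): all three terms end up side by side after flattening.
expr = ('+', 'x', ('+', ('*', 2, 'x'), ('*', 3, 'x')))
print(combine_like_terms(expr))
```

Once the terms sit in one flat list, finding like terms is a single pass over that list rather than a search through the tree.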
My friend thinks that we should implement the tree differently. He suggested that we implement each node as a polynomial, so that we can perform operations such as polynomial addition between nodes. With this approach, operations between polynomials become much easier, without going back and forth through the tree.
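To make the polynomial idea concrete, here is a minimal sketch that assumes a single variable and stores a polynomial as a dict mapping exponent to coefficient. That representation and the names poly_add and poly_mul are assumptions for illustration, not something from the question.

```python
def poly_add(p, q):
    """Add two polynomials stored as {exponent: coefficient} dicts."""
    result = dict(p)
    for exp, coeff in q.items():
        result[exp] = result.get(exp, 0) + coeff
    # Drop terms that cancelled out.
    return {e: c for e, c in result.items() if c != 0}

def poly_mul(p, q):
    """Multiply two polynomials stored as {exponent: coefficient} dicts."""
    result = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            result[e1 + e2] = result.get(e1 + e2, 0) + c1 * c2
    return {e: c for e, c in result.items() if c != 0}

x = {1: 1}        # x
two_x = {1: 2}    # 2x
print(poly_add(x, two_x))
```

Here x + 2x collapses to 3x as a side effect of the addition itself. The trade-off is that expressions which are not polynomials (sqrt, division by a variable, and so on) would still need some other node type.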
Which approach should we choose? Is there a better approach other than these two?

Related

What's the best way to read a mathematical function f(x,y) command line argument?

From main(), I want the user to input a mathematical function (e.g. 2xy) through the command line. I initially thought I would iterate through the string and parse out the arithmetic operators, x, y, etc. However, this could become fairly complicated for more intricate functions (e.g. (2x^2)/5 + sqrt(x^4)). Is there a more general method for parsing a mathematical function string like this one?
One of the most helpful ways to deal with parsing issues like that is to switch the input from infix equations to RPN-based input, where the arguments come first and the operators come last.
Rewriting your complex equation would end up looking like:
2 x 2 ^ * 5 / x 4 ^ sqrt +
This is generally easier to implement, as you can do it with a simple stack: new arguments are pushed on, while the operators pull the required pieces off the stack and put the result back on. It greatly simplifies the parsing, but you still need to implement the functions.
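A minimal sketch of such a stack evaluator, assuming whitespace-separated tokens and a caller-supplied dict of variable values (the name eval_rpn and the operator tables are made up for this sketch). It evaluates (2x^2)/5 + sqrt(x^4) written in postfix as 2 x 2 ^ * 5 / x 4 ^ sqrt +.

```python
import math
import operator

BINARY = {
    '+': operator.add, '-': operator.sub,
    '*': operator.mul, '/': operator.truediv,
    '^': operator.pow,
}
UNARY = {'sqrt': math.sqrt}

def eval_rpn(text, variables):
    """Evaluate a whitespace-separated RPN string using a value stack."""
    stack = []
    for token in text.split():
        if token in BINARY:
            right = stack.pop()   # the right operand was pushed last
            left = stack.pop()
            stack.append(BINARY[token](left, right))
        elif token in UNARY:
            stack.append(UNARY[token](stack.pop()))
        elif token in variables:
            stack.append(variables[token])
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]

print(eval_rpn("2 x 2 ^ * 5 / x 4 ^ sqrt +", {"x": 3.0}))
```

Note that the order of the two pops matters for non-commutative operators such as -, / and ^.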
What you need is an expression evaluator.
A while ago, I wrote a complete C expression evaluator (i.e. evaluated expressions written using C syntax) for a command line processor and scripting language on an embedded system. I used this description of the algorithm as a starting point. You could use the accompanying code directly, but I did not like the implementation, and wrote my own from the algorithm description.
It needed some work to support all C operators, function calls, and variables, but is a clear explanation and therefore a good starting point, especially if you don't need that level of completeness.
The basic principle is that expression evaluation is easier for a computer using a stack and 'Reverse Polish Notation', so the algorithm converts an in-fix notation expression with associated order of precedence and parentheses to RPN, and then evaluates it by popping operands, performing operations, and pushing results, until there are no operations left and one value left on the stack.
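The infix-to-RPN conversion described above is usually done with the shunting-yard algorithm. The following is a minimal sketch of it, not the code from the linked description: it assumes only the binary operators + - * / ^ and parentheses, and it does not handle function calls, unary operators, or malformed input. The names tokenize and to_rpn are made up for this sketch.

```python
import re

PRECEDENCE = {'+': 1, '-': 1, '*': 2, '/': 2, '^': 3}
RIGHT_ASSOC = {'^'}

def tokenize(text):
    """Split text into numbers, identifiers, operators and parentheses."""
    return re.findall(r'\d+\.?\d*|[A-Za-z_]\w*|[-+*/^()]', text)

def to_rpn(text):
    """Convert an infix expression to a list of RPN tokens."""
    output, ops = [], []
    for tok in tokenize(text):
        if tok in PRECEDENCE:
            # Pop operators that bind tighter, or equally tight if tok is
            # left-associative, before pushing the new operator.
            while ops and ops[-1] in PRECEDENCE and (
                    PRECEDENCE[ops[-1]] > PRECEDENCE[tok]
                    or (PRECEDENCE[ops[-1]] == PRECEDENCE[tok]
                        and tok not in RIGHT_ASSOC)):
                output.append(ops.pop())
            ops.append(tok)
        elif tok == '(':
            ops.append(tok)
        elif tok == ')':
            while ops[-1] != '(':
                output.append(ops.pop())
            ops.pop()  # discard the matching '('
        else:
            output.append(tok)  # operand goes straight to the output
    while ops:
        output.append(ops.pop())
    return output

print(' '.join(to_rpn("(2+4)*(4/3)")))
```

The resulting token list can then be evaluated with a value stack, as the paragraph above describes.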
It might get a bit more complicated if you choose to deal with implicit multiply operators (2xy rather than 2 * x * y, for example), not least because you'd need to unambiguously distinguish the variables x and y from a single variable xy. That is probably only feasible if you only allow single-character variable names. I suggest you either do that and insert explicit multiply operators on the operator stack as part of the parse, or disallow implicit multiplication.
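One way to handle the single-character case is to insert the explicit multiply tokens in a pass over the token list before parsing, rather than on the operator stack itself. This is a sketch of that variation under the stated restriction; the function name is made up, and multi-letter names such as sqrt would be split into separate variables by this tokenizer.

```python
import re

def insert_implicit_multiplication(text):
    """Tokenize text and insert '*' between adjacent operands.

    Assumes every variable is a single letter, so 'xy' means x * y.
    """
    tokens = re.findall(r'\d+\.?\d*|[A-Za-z]|[-+*/^()]', text)
    result = []
    for tok in tokens:
        if result:
            prev = result[-1]
            prev_ends_operand = prev == ')' or prev[0].isalnum()
            tok_starts_operand = tok == '(' or tok[0].isalnum()
            if prev_ends_operand and tok_starts_operand:
                result.append('*')
        result.append(tok)
    return result

print(insert_implicit_multiplication("2xy"))
```

After this pass the token stream contains only explicit operators, so the rest of the parser does not need to know about implicit multiplication at all.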

Algebraic Simplification on Abstract Syntax Tree

I've designed a parser in C that is able to generate an AST, but when I began to implement simplifications it really got messy. I've successfully implemented rules for summation such as the ones below:
x + 0 -> x
x + x -> 2 * x
etc.
But it took a huge amount of effort and code to do it. What I did was search the entire tree for a pattern I could use (lots of recursion). If there was a cascade of PLUS nodes, I added them to a list, worked on that list (summing numbers, combining variables, etc.), then created another tree from that list and merged it into the existing one. It was this paper I used to implement it. In short, given the expression 2*x+1+1+x+0 I got 3*x+2. And it was just summation that got me into so much trouble; I can't even imagine the advanced stuff. So I realized I was doing something wrong.
I've read this thread but I'm really confused about term rewriting systems (what they really are, how to implement one in C).
Is there a more general and effective way to do simplification on an AST? Or how do I write a term rewriting system in C?
Term rewriting is, in simple terms, what your two examples do (how do you convert x + 0 to x in an AST?). It is about pattern matching on ASTs and, once there is a match, converting the matched subtree into an equivalent expression. Each such pattern-and-conversion pair is called a term rewriting rule.
Note that having a term rewriting rule is not the absolute or general solution to algebraic simplification. The general solution involves having many rewriting rules (you showed two of them) and applying them to a given AST repeatedly until none of them succeeds.
The general solution also involves coordinating the application of the rewriting rules, for example to avoid re-applying a rule that has previously failed.
There is no unique way to do it. There are several systems. For proprietary systems the approach is not known because it is kept secret, but there are open source systems too; for example, Mathomatica is written in C.
I recommend you check the open system Fōrmulæ. In it, the coordination of rewriting rules (which is called "the reduction engine") is relatively simple. It is written in Java. The advantage of this system is that rewriting rules are not hardwired/hardcoded in the system or the reduction engine (they are hot pluggable). Coding a rewriting rule involves the pattern matching and the conversion, but not when or how it will be called (it follows the Hollywood principle).
In the specific case of Fōrmulæ:
The reduction engine is based (in general terms) on the post-order tree traversal algorithm, so when a node is "visited", its sub-nodes have already been visited and (possibly) transformed. It is possible to alter that flow, though (e.g. to avoid the unwanted dereferencing of a variable in an assignment x <- 5). Note that it is not just a tree traversal; the AST is actually being changed in the process.
In order to efficiently manage the (possibly hundreds or thousands of) rewriting rules, every rule has a type of expression to which it is applicable, and when a single node is "visited", only the associated rules are checked for a match. For example, your 2 rules can only be applied to "addition" nodes of an AST.
Rewriting rules are not limited to algebraic simplification; they can be used in many other fields such as programming (Fōrmulæ is also a programming language, see examples of Fōrmulæ programs) or automatic or assisted theorem proving.
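The two ideas above (post-order traversal, and rules indexed by node type) can be sketched in a few lines. This is not Fōrmulæ's engine or API, only a toy illustration in Python for brevity: the AST is assumed to be nested tuples like ('+', left, right), and all names (add_zero, add_same, mul_one, RULES, simplify) are made up.

```python
def add_zero(node):
    """x + 0 -> x and 0 + x -> x; returns None when the rule does not match."""
    _, left, right = node
    if right == 0:
        return left
    if left == 0:
        return right
    return None

def add_same(node):
    """x + x -> 2 * x"""
    _, left, right = node
    if left == right:
        return ('*', 2, left)
    return None

def mul_one(node):
    """1 * x -> x and x * 1 -> x"""
    _, left, right = node
    if left == 1:
        return right
    if right == 1:
        return left
    return None

# Each rule is registered under the node type it can apply to.
RULES = {'+': [add_zero, add_same], '*': [mul_one]}

def simplify(node):
    """Post-order reduction: simplify the children first, then the node."""
    if not isinstance(node, tuple):
        return node
    node = (node[0],) + tuple(simplify(child) for child in node[1:])
    for rule in RULES.get(node[0], []):
        result = rule(node)
        if result is not None:
            return simplify(result)  # the rewritten node may match again
    return node

print(simplify(('+', ('+', 'x', 0), 'x')))
```

Adding a new simplification means writing one small function and registering it under the right node type; the traversal itself stays unchanged.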

Infix expression to Binary Tree in C

I have to parse an infix expression into a binary tree.
The expression is:
(((x1 + 5.12) ∗ (x2 − 7.68))/x3)
I don't really have a clue how to interpret the expression. Does anyone have a clue how to process this?
Your task is not so hard. First, you should acquaint yourself with notation types and then with expression parsing.
In general, to parse and evaluate an (infix) expression, you need to:
read and tokenize it, i.e. classify each symbol as: operand, operation, etc.
convert from infix to a binary expression tree: this is usually done with algorithms such as the Shunting yard algorithm.
create a grammar that defines operation precedence and allows strict [1] order of expression evaluation.
Expressions written in infix notation are slightly more difficult to parse, which is why they are usually converted to more "machine friendly" versions, like (reverse) Polish notation, which provides some advantages, among them the elimination of the need for parentheses.
So, as you can see, this is roughly the big picture and your task is a part of it. Here is a visualisation of the binary expression tree for: 2 * 3 / ( 2 – 1 ) + 5 * ( 4 – 1 )
Here is more on the topic and an example implementation in C++.
[1] In your case, obeying the rules of algebra.
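As an illustration of the grammar-based route, here is a small recursive-descent sketch, written in Python rather than C for brevity. It assumes the grammar expr := term (('+'|'-') term)*, term := factor (('*'|'/') factor)*, factor := NUMBER | NAME | '(' expr ')', and it assumes the typographic ∗ and − in the question have been replaced with ASCII * and -. Trees are nested tuples; all names are made up for this sketch.

```python
import re

def tokenize(text):
    """Split text into numbers, identifiers, operators and parentheses."""
    return re.findall(r'\d+\.?\d*|[A-Za-z_]\w*|[-+*/()]', text)

def parse(text):
    """Parse an infix expression into a nested-tuple binary tree."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():
        node = term()
        while peek() in ('+', '-'):
            op = advance()
            node = (op, node, term())   # left-associative
        return node

    def term():
        node = factor()
        while peek() in ('*', '/'):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():
        tok = advance()
        if tok == '(':
            node = expr()
            if advance() != ')':
                raise SyntaxError("expected ')'")
            return node
        return tok   # a number or a variable name

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return tree

print(parse("(((x1 + 5.12) * (x2 - 7.68))/x3)"))
```

Operator precedence falls out of the grammar itself: term binds tighter than expr, so no precedence table is needed.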

Evaluating expressions with operators

First, I know, I know. This kind of question has been asked a few times before, but most of the answers drifted to other topics and only partly answer my question.
I'm writing something that can parse C-like expressions.
That includes expressions such as:
1) struct1.struct2.structarray[283].shd->_var
2) *((*array_dptr)[2][1] + 5)
3) struct1.struct2.struct3.var + b * c / 3 % 5
Problem is... I need to be fast on this. The fastest possible, even if it makes the code ugly - well, obviously, the speed improvement must be tangible. The reason is that it is interpreted. It needs to be fast...
I have many questions, and I will probably ask some more depending on your answers. But anyways...
First, I'm aware of "operator priorities". For example algorithms implemented in C compilers will assign to operators a priority number and evaluate the expression based on that.
I've consulted this table : http://en.wikipedia.org/wiki/Operators_in_C_and_C++#Operator_precedence
Now, this is cool but... I wonder a few things.
My principal question is... how would you implement this to be the fastest possible?
I have thought about for example... (please note the program I'm speaking about actually parses a file containing these expressions, and not all C operators will be supported)
1) Storing the expression string in an array, storing each operator position in another array, and then starting to parse all this crap, starting from the highest priority operator. For example, if I had str = "2*1+3", then after checking all the operators present, I would go to the position str[1], check the operands to its right and left, do the operation (here, multiply), then substitute the result into the expression and evaluate again.
The problem I see there is... say two operators in the expression have the same priority,
for example: var1 * var2 / var3 / var4
Since * and / both have the same precedence, how do I know at which position to start parsing? Of course this example is rather intuitive, but I can see the problem growing with enormous expressions.
2) Is it even possible to do this non-recursively? Usually recursive means slower due to multiple function calls setting up their own stack frames, re-initializing stuff, etc.
3) How to distinguish unary operators from non unaries?
For example : 2 + *a + b * c
There is the dereferencing op and the multiplication one. Intuitively I have an idea of how to do it, but I'm not sure. I'd rather have your advice on this (I think: check whether the token to the right or left is an operator; if so, then it's unary?)
4) I don't get expressions being evaluated right-to-left. It seems so unnatural to me. More than that, I don't understand what it means. Would you show an example? Why do it that way?!?
5) Do you have better algorithms in mind? Better ideas for achieving this?
For now, that sums pretty much what I'm thinking about.
This ain't an homework by the way. It's practical stuff.
Thanks!
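For point 3, one common heuristic is close to the idea in the question: an operator is treated as unary when it appears at the start of the expression, right after another operator, or right after an opening parenthesis. The sketch below only illustrates that classification, in Python for brevity; it assumes a small operator set and does not cover postfix operators such as ++ or multi-character operators such as ->. The names are made up.

```python
import re

OPERATORS = set('+-*/%&')

def classify_operators(text):
    """Label each operator token as 'unary' or 'binary' from its left context."""
    tokens = re.findall(r'\d+|[A-Za-z_]\w*|[-+*/%&()]', text)
    result = []
    prev = None
    for tok in tokens:
        if tok in OPERATORS:
            # Nothing, another operator, or '(' to the left means there is
            # no left operand, so the operator must be a prefix (unary) one.
            is_unary = prev is None or prev in OPERATORS or prev == '('
            result.append((tok, 'unary' if is_unary else 'binary'))
        prev = tok
    return result

print(classify_operators("2 + *a + b * c"))
```

Only the left neighbour needs to be inspected, so the check can be done during tokenization without any lookahead.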

Parse stack into a binary tree?

I'm making a program tasked with converting a math expression such as (2+4)*(4/3) into
a binary tree, and then manipulating it.
First, when parsing, I've turned the string into two stacks, operands and operators.
How can I determine what the root should be, given that in my example above the tree should look like this:
*
/ \
+ /
/\ /\
2 4 4 3
Notice that the root is *, which is the outermost operator. But on my operator stack it looks like this:
/
*
+
And there could be cases like (2+4+3)*4 or 2*((4+1)/3).
How can I determine which operator should be the root of my binary tree?
Convert your infix expression to either prefix or postfix notation. You can't really have a proper operator stack without doing this.
In postfix notation, the expression (2+4)*(4/3) would look like:
2 4 + 4 3 / *
So, you have the multiplication appearing at the end which could be inserted into the tree as its root. Evaluating a postfix expression is much easier for a computer as grouping is not needed.
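Building the tree from postfix can be sketched with a single stack of subtrees. This assumes the postfix tokens are already available as a list and that trees are nested tuples (operator, left, right); the names are made up for this sketch.

```python
OPERATORS = {'+', '-', '*', '/'}

def tree_from_postfix(tokens):
    """Build a binary expression tree from a list of postfix tokens."""
    stack = []
    for tok in tokens:
        if tok in OPERATORS:
            right = stack.pop()   # the right subtree was pushed last
            left = stack.pop()
            stack.append((tok, left, right))
        else:
            stack.append(tok)     # operands are leaves
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]

print(tree_from_postfix("2 4 + 4 3 / *".split()))
```

The last operator processed ends up as the root, which is why the trailing * in the postfix form becomes the top of the tree.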
You can't just put the operators on your stack in the order that they appear in your expression. Once you've done that, you lose the ability to disambiguate, as you've identified.
See e.g. http://en.wikipedia.org/wiki/Shunting_yard_algorithm for an algorithm to parse infix notation.
You can use a stack to implement an infix to binary expression tree. This link has a C++ implementation:
An infix to binary-expression-tree parser that uses two stacks,
one for operators and another for operands, which all derive from a base node class.
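The two-stack idea can be sketched as follows. This is not the linked C++ implementation: it is a Python illustration that uses nested tuples instead of a node class hierarchy, supports only the left-associative operators + - * / and parentheses, and does not validate its input. The names are made up.

```python
import re

PRECEDENCE = {'+': 1, '-': 1, '*': 2, '/': 2}

def infix_to_tree(text):
    """Build a binary expression tree using an operand and an operator stack."""
    tokens = re.findall(r'\d+\.?\d*|[A-Za-z_]\w*|[-+*/()]', text)
    operands, operators = [], []

    def reduce_top():
        # Combine the top operator with the top two subtrees.
        op = operators.pop()
        right = operands.pop()
        left = operands.pop()
        operands.append((op, left, right))

    for tok in tokens:
        if tok == '(':
            operators.append(tok)
        elif tok == ')':
            while operators[-1] != '(':
                reduce_top()
            operators.pop()   # discard the matching '('
        elif tok in PRECEDENCE:
            # Operators of higher or equal precedence must be applied first.
            while (operators and operators[-1] != '('
                   and PRECEDENCE[operators[-1]] >= PRECEDENCE[tok]):
                reduce_top()
            operators.append(tok)
        else:
            operands.append(tok)
    while operators:
        reduce_top()
    return operands[0]

print(infix_to_tree("(2+4)*(4/3)"))
```

The operator that is reduced last becomes the root, so the root never has to be chosen up front.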
