I just completed a beginners C programming course and I thought handling arithmetic would be a great first project! I had no problem figuring out how to do basic single operation arithmetic.
I ahieved what I initially desired but now I'm curious if it's possible with(out) some library, to handle multiple operators in precedence, e.g. 5+2*3. Of course 2*3 should be evaluated and then 5 added to the product. If parenthesis wrap an equation it should be handled first
So to clarify I want to read stdin and parse a literal string, then handle any arithmetic by PEMDAS precedence. I have tried many different queries to find something like what I'm looking for - I'm sure someone has done it, I just can't find it!
After some time, I managed to write my own infix to postfix equation. The evaluation part.. Not so much!
I did end up with a functional solution from #Jerry Jeremiah's answer. Using the TinyExpr library, I was able to include their header, compile with their .c program and upon calling te_interp, handle arithmetic better than I could hope to!
Related
From main(), I want the user to input a mathematical function (I,e: 2xy) through the command line. From there, I initially thought to iterate through the string and parse out different arithmetic operators, x, y, etc. However, this could become fairly complicated for more intricate functions, (e.g: (2x^2)/5 +sqrt(x^4) ). Is there a more general method to be able to parse a mathematical function string like this one?
One of the most helpful ways to deal with parsing issues like that is to switch the input methods from equations like that to an RPN based input where the arguments come first and the operators come last.
Rewriting your complex equation would end up looking like:
2 2 x ^ * 5 / x 4 ^ sqrt +
This is generally easier to implement, as you can do it with a simple stack -- pushing new arguments on, while the operators pull the require pieces off the stack and put the result back on. Greatly simplifies the parsing, but you still need to implement the functions.
What you need is an expression evaluator.
A while ago, I wrote a complete C expression evaluator (i.e. evaluated expressions written using C syntax) for a command line processor and scripting language on an embedded system. I used this description of the algorithm as a starting point. You could use the accompanying code directly, but I did not like the implementation, and wrote my own from the algorithm description.
It needed some work to support all C operators, function calls, and variables, but is a clear explanation and therefore a good starting point, especially if you don't need that level of completeness.
The basic principle is that expression evaluation is easier for a computer using a stack and 'Reverse Polish Notation', so the algorithm converts an in-fix notation expression with associated order of precedence and parentheses to RPN, and then evaluates it by popping operands, performing operations, and pushing results, until there are no operations left and one value left on the stack.
It might get a bit more complicated is you choose to deal with implicit multiply operators (2xy rather then 2 * x * y for example. Not least because you'd need to unambiguously distinguish the variables x and y from a single variable xy. That is probably only feasible if you only allow single character variable names. I suggest you either do that and insert explicit multiply operators on the operator stack as part of the parse, or you disallow implicit multiply.
First, I know I know. This question has kind of been asked some times before, but most of the answers got on other topics only partly answer my question.
I'm doing something which can parse C like expressions.
That includes expressions for example like (some examples)
1) struct1.struct2.structarray[283].shd->_var
2) *((*array_dptr)[2][1] + 5)
3) struct1.struct2.struct3.var + b * c / 3 % 5
Problem is... I need to be fast on this. The fastest possible, even if it makes the code ugly - well, obviously, the speed improvement must be tangible. The reason is that it is interpreted. It needs to be fast...
I have many questions, and I will probably ask some more depending on your answers. But anyways...
First, I'm aware of "operator priorities". For example algorithms implemented in C compilers will assign to operators a priority number and evaluate the expression based on that.
I've consulted this table : http://en.wikipedia.org/wiki/Operators_in_C_and_C++#Operator_precedence
Now, this is cool but... I wonder a few things.
My principal question is... how would you implement this to be the fastest possible?
I have thought about for example... (please note the program I'm speaking about actually parses a file containing these expressions, and not all C operators will be supported)
1) Stocking the expression string into an array, storing each operator position inside an array, and then starting to parse all this crap, starting from the highest priority operator. For example if I had str = "2*1+3", then after checking all the operators present, I would check for the position at str[1], and the check at right and left, do the operation (here multiply) and then substitude the expression with the result and evaluate again.
The problem I see there is... say two operators in the expr are the same priority
for example : var1 * var2 / var3 / var4
since * and / have both the same precedence, how to know on which position to start the parsing? Of course this example is rather intuitive, but I can the problem growing on enormous expressions.
2) Is this even possible to do it non recursive? Usually recursive means slower due to multiple function call setting their own stack frames, re-initializing stuff etc etc.
3) How to distinguish unary operators from non unaries?
For example : 2 + *a + b * c
There is the dereferencing op and the multiplication one. Intuitively I have an idea on how to do it, but I ain't sure. I'd rather have your advices on this (i think : check if one of the right or left members are operators, if so, then it's unary?)
4) I don't get expressions being evaluated right-to-left. Seems so unnatural to be. More that I don't unterstand what does it means. Would you show an example? Why do it that way?!?
5) Do you have better algorithms in head? Better ideas of achieving it?
For now, that sums pretty much what I'm thinking about.
This ain't an homework by the way. It's practical stuff.
Thanks!
In K&R Section 5.10, in their sample implementation of a grep-like function, there are these lines:
while (--argc > 0 && (*++argv)[0] == '-')
while (c = *++argv[0])
Understanding the syntax there was one of the most challenging things for me, and even now a couple weeks after viewing it for the first time, I still have to think very slowly through the syntax to make sense of it. I compiled the program with this alternate syntax, but I'm not sure that the second line is allowable. I've just never seen *'s and ++'s interleaved like this, but it makes sense to me, it compiles, and it runs. It also requires no parentheses or brackets, which is maybe part of why it seems more clear to me. I just read the operators in one direction only (right to left) rather than bouncing back and forth to either side of the variable name.
while (--argc > 0 && **++argv == '-')
while (c = *++*argv)
Well for one, that's one way to make anyone reading your code to go huh?!?!?!
So, from a readability standpoint, no, you probably shouldn't write code like that.
Nevertheless, it's valid code and breaks down as this:
*(++(*p))
First, p is dereferenced. Then it is incremented. Then it is dereferenced again.
To make thing worse, this line:
while (c = *++*argv)
has an assignment in the loop-condition. So now you have two side-effects to make your reader's head spin. YAY!!!
Seems valid to me. Of course, you should not read it left to right, that's not how C compiler parses the source, and that's not how C language grammatics work. As a rule of thumb, you should first locate the object that's subject to operating upon (in this case - argv), and then analyze the operators, often, like in this case, from inside (the object) to outside. The actual parsing (and reading) rules are of course more complicated.
P. S. And personally, I think this line of code is really not hard to understand (and I'm not a C programming guru), so I don't think you should surround it with parentheses as Mysticial suggests. That would only make the code look big, if you know what I mean...
There's no ambiguity, even without knowledge of the precedence rules.
Both ++ and * are prefix unary operators; they can only apply to an operand that follows them. The second * can only apply to argv, the ++ to *argv, and the first * to ++*argv. So it's equivalent to *(++(*argv)). There's no possible relationship between the precedences of ++ and * that could make it mean anything else.
This is unlike something like *argv++, which could conceivably be either (*argv)++ or *(argv++), and you have to apply precedence rules to determine which (it's *(argv++)` because postfix operators bind more tightly than prefix unary operators).
There's a constraint that ++ can only be applied to an lvalue; since *argv is an lvalue, that's not a problem.
Is this code valid? Yes, but that's not what you asked.
Is this code acceptable? That depends (acceptable to who?).
I wouldn't consider it acceptable - I'd consider it "harder to read than necessary" for a few different reasons.
First; lots of programmers have to work with several different languages, potentially with different operator precedence rules. If your code looks like it relies on a specific language's operator precedence rules (even if it doesn't) then people have to stop and try to remember which rules apply to which language.
Second; different programmers have different skill levels. If you're ever working in a large team of developers you'll find that the best programmers write code that everyone can understand, and the worst programmers write code that contains subtle bugs that half of the team can't spot. Most C programmers should understand "*++*argv", but a good programmer knows that a small number of "not-so-good" programmers either won't understand it or will take a while to figure it out.
Third; out of all the different ways of writing something, you should choose the variation that expresses your intent the best. For this code you're working with an array, and therefore it should look like you intend to be working with an array (and not a pointer). Note: For the same reason, "uint32_t foo = 0x00000002;" is better than "uint32_t foo = 0x02;".
At first glance, the shunting yard algorithm seems applicable to POSIX regular expression parsing, but since I don't have much experience (or theoretical background) in writing parsers, I'd like to ask SO before jumping in and writing something only to get stuck halfway.
Perhaps a more sophisticated version of the question is: What is a good formal statement of the class of problems the shunting yard algorithm can be applied to?
Clarification: This question is about whether you can parse POSIX re syntax into an abstract syntax tree using the basic principles of the shunting algorithm, not whether you can use regular expressions to implement the shunting algorithm. Sorry I wasn't clear enough stating that to begin with!
I'm fairly sure it can. If you look at Henry Spencer's regular expression package:
regexp.shar.Z
which was the basis for Perl's regular expressions, you will notice that he describes the program as being in "railroad normal form".
I reckon you'd have some problems because different characters have different meanings in different contexts e.g.
^[^a-z][asd-]
The ^ has two different meanings and so does the -. I think I'd choose a recursive descent parser.
I don't see why it wouldn't be suitable. Looking at some old code, it does seem I used a completely different parsing strategy for my last regexp parser, however (essentially, a walk-through from the start, building the resulting automaton representation as you go, with some look-ahead and recursive calls to implement grouping of regular expressions).
I will say that the answer to your question is "no, you cannot implement the shunting yard algorithm using a regular expression." This is for the same reason you cannot parse arbitrary HTML using regular expressions. Which boils down to this:
Regular expressions do not have a stack. Because the shunting yard algorithm relies on a stack (to push and pop operands as you convert from infix to RPN), then regular expressions do not have the computational "power" to perform this task.
This glosses over many details, but a "regular expression" is one way to define a regular language. When you "use" a regular expression, you are asking the computer to say: "Look at a body of text and tell me whether or not any of those strings are in my language. The language that I defined using a regular expression." I'll point to this most excellent answer which you and everyone reading this should upvote for more on regular languages.
So now you need some mathematical concept to augment "regular languages" in order to create more powerful languages. If you were to characterize the shunting yard algorithm as an realization of a model of computational power, then you might say that the algorithm would be described as a context-free grammar (hey what do you know, that link uses an expression parse tree as an example.) A push-down automata. Something with a stack.
If you are less-than-familiar with automata theory and complexity classes, then those wikipedia articles are probably not that helpful without explaining them from the ground up.
The point being, you may be able to use regex to help writing shunting yard. But regex are not very good at doing operations that have an arbitrary depth, which this problem has. So I would not spend too much time going down the regex avenue for this problem.
My task is to write an app(unfortunatly on C) which reads expression in infix notation(with variables, unary and binary operators) and store it in memory, then evaluate it. Also, checks for correctness should be performed.
for example:
3*(A+B)-(-2-78)*2+(0*A)
After I got all values, program should calculate it.
The question is:
What is the best way to do this?(with optimization and validation)
What notation to choice as the base of tree?
Should I represent expression as tree? If so I can easily optimize it(just drop nodes which returns 0 or smth else).
Cheers,
The link suggested in the comment by Greg Hewgill above contains all the info you'll need:
If you insist on writing your own,
a recursive descent parser is probably the simplest way to do it by hand.
Otherwise you could use a tool like Bison (since you're working in C). This tutorial is the best I've seen for working with Flex and Bison (or Lex/Yacc)
You can also search for "expression evaluator" on Codeproject - they have a lot of articles on the topic.
I came across the M4 program's expression evaluator some time ago. You can study its code to see how it works. I think this link on Google Codesearch is the version I saw.
Your question hints at requirements being put on your solution:
unfortunatly on C
so some suggestions here might not be permissible. Nevertheless, I would suggest that this is quite a complicated problem to solve, and that you would be much better off trying to find a suitable existing library which you could link into your C code to do this for you. This would likely reduce the time and effort required to get the code working, and reduce the ongoing maintenance effort. Of course, you'd have to think about licensing, but I'd be surprised if there wasn't a good parsing/evaluation library "out there" which could do a good job of this.