At first glance, the shunting yard algorithm seems applicable to POSIX regular expression parsing, but since I don't have much experience (or theoretical background) in writing parsers, I'd like to ask SO before jumping in and writing something only to get stuck halfway.
Perhaps a more sophisticated version of the question is: What is a good formal statement of the class of problems the shunting yard algorithm can be applied to?
Clarification: This question is about whether you can parse POSIX regex syntax into an abstract syntax tree using the basic principles of the shunting yard algorithm, not whether you can use regular expressions to implement the shunting yard algorithm. Sorry I wasn't clear enough about that to begin with!
I'm fairly sure it can. If you look at Henry Spencer's regular expression package:
regexp.shar.Z
which was the basis for Perl's regular expressions, you will notice that he describes the program as being in "railroad normal form".
I reckon you'd have some problems, because different characters have different meanings in different contexts. For example, in
^[^a-z][asd-]
the ^ has two different meanings, and so does the -. I think I'd choose a recursive descent parser instead.
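If it helps to see that concretely, here is a minimal sketch (in Python, with token names that are purely my own invention) of how a hand-written tokenizer can resolve both meanings by tracking whether it is inside a bracket expression:

def tokenize(pattern):
    # Disambiguate ^ and - by context while scanning a POSIX-style pattern.
    # Edge cases such as [-a] and [] are omitted for brevity.
    tokens = []
    i = 0
    in_class = False          # are we inside [...] ?
    class_start = -1          # index where the current [...] began
    while i < len(pattern):
        c = pattern[i]
        if not in_class:
            if c == '[':
                in_class, class_start = True, i
                tokens.append(('CLASS_OPEN', c))
            elif c == '^':
                tokens.append(('ANCHOR_START', c))   # meaning 1 of ^
            else:
                tokens.append(('CHAR', c))
        else:
            if c == '^' and i == class_start + 1:
                tokens.append(('CLASS_NEGATE', c))   # meaning 2 of ^
            elif c == '-' and pattern[i + 1:i + 2] not in ('', ']'):
                tokens.append(('RANGE', c))          # a-z style range
            elif c == ']':
                in_class = False
                tokens.append(('CLASS_CLOSE', c))
            else:
                tokens.append(('CHAR', c))           # literal -, etc.
        i += 1
    return tokens

print(tokenize('^[^a-z][asd-]'))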
I don't see why it wouldn't be suitable. Looking at some old code, though, it seems I used a completely different strategy for my last regexp parser: essentially a single pass from the start, building the resulting automaton representation as I went, with some look-ahead and recursive calls to implement grouping of regular expressions.
I will say that the answer to your question is "no, you cannot implement the shunting yard algorithm using a regular expression." This is for the same reason you cannot parse arbitrary HTML using regular expressions. Which boils down to this:
Regular expressions do not have a stack. Because the shunting yard algorithm relies on a stack (to push and pop operators as you convert from infix to RPN), regular expressions do not have the computational "power" to perform this task.
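To make the stack dependence concrete, here is a minimal shunting yard sketch in Python (the operator table and tokenization are simplified assumptions of mine):

PREC = {'+': 1, '-': 1, '*': 2, '/': 2}

def to_rpn(tokens):
    output, stack = [], []          # the stack is the whole point
    for tok in tokens:
        if tok in PREC:
            while stack and stack[-1] in PREC and PREC[stack[-1]] >= PREC[tok]:
                output.append(stack.pop())
            stack.append(tok)
        elif tok == '(':
            stack.append(tok)
        elif tok == ')':
            while stack[-1] != '(':
                output.append(stack.pop())
            stack.pop()             # discard the '('
        else:
            output.append(tok)      # operand goes straight to output
    while stack:
        output.append(stack.pop())
    return output

print(to_rpn(['3', '*', '(', '4', '+', '5', ')']))   # ['3', '4', '5', '+', '*']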
This glosses over many details, but a "regular expression" is one way to define a regular language. When you "use" a regular expression, you are asking the computer: "Look at a body of text and tell me whether any of those strings are in my language, the language I defined using a regular expression." For more on regular languages, I'll point to this most excellent answer, which you and everyone reading this should upvote.
So now you need some mathematical concept to augment "regular languages" in order to create more powerful languages. If you were to characterize the shunting yard algorithm as a realization of a model of computational power, then you might say it would be described by a context-free grammar (hey, what do you know, that link uses an expression parse tree as an example), i.e. a push-down automaton. Something with a stack.
If you are less than familiar with automata theory and complexity classes, then those Wikipedia articles are probably not that helpful without a ground-up explanation.
The point being, you may be able to use regexes to help write a shunting yard implementation. But regexes are not good at operations with arbitrary nesting depth, which this problem has. So I would not spend too much time going down the regex avenue for this problem.
Related
For example, I can type 6+6 or 2+237 into Google or WolframAlpha, which could be handled by asking a user for a and b, then evaluating return a+b. However, I might also type 5*5^(e) or any other combination, yet the program is hard-coded to only evaluate a+b expressions.
It's easy to represent the more complex problems in code, in any common language.
return 5 * pow(5, math.e)  # pseudocode
But if I can't expect a user's input to be of a given form, then it isn't as simple as
x = float(input("enter coefficient"))
b = float(input("enter base"))
p = float(input("enter power"))
print(x * pow(b, p))
With this code, my program is locked in to evaluating only problems of the form x*b^p.
How do people write the code to dynamically handle math expressions of any form?
This might not be a question that's appropriate for this venue, but I think it's reasonable to ask. At the risk of having my answer voted out of existence along with the question, I'll offer a brief answer.
Legitimate mathematical expressions, from simple to complicated, obey grammatical rules. Although a legal mathematical expression might seem unintelligible, grammatically speaking it is far less complicated than the grammar needed to understand even small bodies of human utterance.
Still, there are levels of "understanding" built into the products available on the net. Google and WolframAlpha are definitely high-end: they attempt to get as close as possible to grammars capable of representing human utterance. Nearer the lower end are products such as SymPy, which accept much more strictly defined input.
Once the software decides what part of the input is a noun, and what is a verb, so to speak, it proceeds to perform the actions requested.
To understand more you might have to undertake studies of formal language, artificial intelligence, programming and areas I can't imagine.
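To make the lower end of that spectrum concrete, here is a short example of the kind of strictly defined input SymPy accepts (assuming the sympy package is installed; pip install sympy): it parses a string into an expression tree according to its grammar, then evaluates it.

from sympy import sympify

expr = sympify("5*5**E")     # parsed into a tree; E is Euler's number
print(expr)                  # 5*5**E
print(expr.evalf())          # numeric value, about 397.16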
I'm making a calculator for Android using Kivy, and it's almost done (I cannot use Java because Python is the only language I know). The way it works is: the user inputs an expression, and eval is used to evaluate that expression. At the moment the expression can contain numbers, mathematical operators (+, -, /, *) and most of the functions from the math module (in short, it's a scientific calculator), and it works as intended. In the future I'm planning to integrate matplotlib to add graphing capabilities to the app. So, in this context, is eval a safe option? Given my limited experience in programming, I didn't think of eval as being unsafe; it was only a few days ago that I stumbled upon a thread discussing the safety issues associated with using eval.
So is it better to change eval to something else within my app, or is it safe in this situation? If the former, what's the best alternative that doesn't change my code too much? It would also be better if it were in the Python standard library, so that I don't increase the app size.
Edit: Btw, the eval expression is calculated in real time (not sure if this matters).
This article ought to be of good use to you -- it's almost precisely what you're trying to do.
This one, on the other hand, is a good warning as to what could happen if you're not careful. Presumably there are good ways around this (maybe just filter out any input containing double underscores, as a really really simple start), but it's worth remembering that Python has lots of magic, and that most of said magic is accessible through eval().
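For what it's worth, one common, more restrictive pattern (a sketch under my own assumptions, not a security guarantee) is to parse the input with the standard-library ast module and evaluate only a whitelist of node types:

import ast
import math
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv,
       ast.Pow: operator.pow, ast.USub: operator.neg}

NAMES = {'pi': math.pi, 'e': math.e}   # extend with math functions as needed

def safe_eval(text):
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name) and node.id in NAMES:
            return NAMES[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError("disallowed expression: %r" % node)
    return walk(ast.parse(text, mode='eval'))

print(safe_eval("5*5**e"))            # about 397.16
# safe_eval("__import__('os')") raises ValueError instead of running code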
Codd's Algorithm converts an expression in tuple relational calculus to Relational Algebra. I'd like to know:
whether there's a standard implementation of the algorithm;
whether this algorithm is used anywhere (industry needs only SQL and its variants; I'm not sure about database theorists in academia);
what the complexity of the reduction is.
Implementing Codd's algorithm should be easy enough (*), but:
converting an expression in one language to an expression in another language requires one to know what that "other language" is. Can you tell?
Using the output of such a conversion is only sensible if it can be included in source code, or in some other way passed on to a compiler that understands that particular language. Do you know of a "standard" algebra-based relational data language?
IMO, these are the two most obvious reasons why there is hardly a point in an industrial implementation.
(*) if your input is a parse tree that consists of nodes such as <universalquantification>, <existentialquantification>, <restriction>, <cartesian>, ...
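For illustration only (all names here are my own assumptions), such parse-tree nodes, together with the classic first rewrite step of the reduction (eliminating universal quantification via forall x: P == not exists x: not P), might look like:

from dataclasses import dataclass

@dataclass
class Node: pass

@dataclass
class ForAll(Node):
    var: str
    body: Node

@dataclass
class Exists(Node):
    var: str
    body: Node

@dataclass
class Not(Node):
    body: Node

def eliminate_universal(n):
    # Rewrite every ForAll into Not(Exists(Not(...))), recursively.
    if isinstance(n, ForAll):
        return Not(Exists(n.var, Not(eliminate_universal(n.body))))
    if isinstance(n, Exists):
        return Exists(n.var, eliminate_universal(n.body))
    if isinstance(n, Not):
        return Not(eliminate_universal(n.body))
    return n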
My task is to write an app (unfortunately in C) which reads an expression in infix notation (with variables, unary and binary operators), stores it in memory, then evaluates it. Checks for correctness should also be performed.
for example:
3*(A+B)-(-2-78)*2+(0*A)
After all values are read, the program should calculate the result.
The question is:
What is the best way to do this (with optimization and validation)?
What notation should I choose as the basis of the tree?
Should I represent the expression as a tree? If so, I can easily optimize it (just drop nodes that evaluate to 0, and so on).
Cheers,
The link suggested in the comment by Greg Hewgill above contains all the info you'll need:
If you insist on writing your own,
a recursive descent parser is probably the simplest way to do it by hand.
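To give a feel for it, here is a minimal recursive descent sketch building a tree; it's in Python for brevity, but the structure (one function per grammar rule) ports directly to C:

# Grammar:  expr   -> term (('+'|'-') term)*
#           term   -> factor (('*'|'/') factor)*
#           factor -> NUMBER | NAME | '-' factor | '(' expr ')'

def parse(tokens):
    pos = [0]
    def peek():
        return tokens[pos[0]] if pos[0] < len(tokens) else None
    def take():
        tok = peek(); pos[0] += 1; return tok
    def expr():
        node = term()
        while peek() in ('+', '-'):
            node = (take(), node, term())
        return node
    def term():
        node = factor()
        while peek() in ('*', '/'):
            node = (take(), node, factor())
        return node
    def factor():
        tok = take()
        if tok == '-':
            return ('neg', factor())
        if tok == '(':
            node = expr()
            assert take() == ')', "missing closing parenthesis"
            return node
        return tok                     # number or variable name
    tree = expr()
    assert peek() is None, "trailing input"
    return tree

print(parse(['3', '*', '(', 'A', '+', 'B', ')']))
# ('*', '3', ('+', 'A', 'B'))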
Otherwise you could use a tool like Bison (since you're working in C). This tutorial is the best I've seen for working with Flex and Bison (or Lex/Yacc).
You can also search for "expression evaluator" on Codeproject - they have a lot of articles on the topic.
I came across the M4 program's expression evaluator some time ago. You can study its code to see how it works. I think this link on Google Codesearch is the version I saw.
Your question hints at requirements being put on your solution:
unfortunately in C
so some suggestions here might not be permissible. Nevertheless, I would suggest that this is quite a complicated problem to solve, and that you would be much better off trying to find a suitable existing library which you could link into your C code to do this for you. This would likely reduce the time and effort required to get the code working, and reduce the ongoing maintenance effort. Of course, you'd have to think about licensing, but I'd be surprised if there wasn't a good parsing/evaluation library "out there" which could do a good job of this.
What is the use of finite automata, and of all the concepts we study in the theory of computation? I've never seen them used anywhere yet.
They are the theoretical underpinnings of concepts widely used in computer science and programming, and understanding them helps you better understand how to use them (and what their limits are). The three basic ones you should encounter are, in increasing order of power:
Finite automata, which are equivalent to regular expressions. Regular expressions are widely used in programming for matching strings and extracting text. They are a simple method of describing a set of valid strings using basic characters, grouping, and repetition. They can do a lot, but they can't match balanced sets of parentheses (see the sketch after this list).
Push-down automata, equivalent to context-free grammars. Text/input parsers and compilers use these when regular expressions aren't powerful enough (and one of the things you learn in studying finite automata is what regular expressions can't do, which is crucial to knowing when to write a regular expression and when to use something more complicated). Context-free grammars can describe nested structure, such as balanced parentheses; "context-free" means a grammar rule can be applied regardless of the surrounding context.
Turing machines, equivalent to general computation (anything you can do with a computer). Some of the things you learn when you cover these enable you to understand the limits of computing itself. A good theory course will teach you about the Halting Problem, which enables you to identify problems for which it is impossible to write a program. Once you've identified such a problem, you know to stop trying (or to refine it into something that is possible).
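As a concrete illustration of the first level (a toy example of my own, not from any of the links above): a two-state DFA accepting strings over {a, b} containing an even number of a's. Because the language is regular, a fixed set of states with no stack is enough.

TRANSITIONS = {('even', 'a'): 'odd',  ('even', 'b'): 'even',
               ('odd',  'a'): 'even', ('odd',  'b'): 'odd'}

def accepts(s, state='even'):
    # Follow one transition per input character; no memory beyond the state.
    for ch in s:
        state = TRANSITIONS[(state, ch)]
    return state == 'even'            # 'even' is the accepting state

print(accepts('abba'))   # True: two a's
print(accepts('ab'))     # False: one a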
Understanding the theory and limitations of these various computing mechanisms enables you to better understand problems and programs and to think more deeply about programming.
There was a request-for-work published about a year ago on one of the freelance coding exchange sites asking, essentially, for a program which solved the Halting Problem. Several people responded with offers, saying they "understood the requirements" and could "start immediately". It was impossible to write a program which met the requirements. Understanding computing theory enables you to not be that bidder who demonstrates, in public, that he really doesn't understand computing (and doesn't bother to thoroughly investigate a problem before declaring understanding and making an offer).
Finite automata are very useful for communication protocols and for matching strings against regular expressions.
Automata are used in hardware and software applications. Please read the implementation section here: http://en.wikipedia.org/wiki/Finite-state_machine#Implementation
There is also a notion of Automata-based programming. Please check this http://en.wikipedia.org/wiki/Automata-based_programming
cheers
Every GUI and every workflow can be treated as a finite automaton. Think of each page as a state, and of transitions as occurring due to certain events. Perhaps you can't proceed to a certain page or the next stage of the workflow until a series of conditions are met.
Finite automata are used, for example, to parse formal languages. This means that finite automata are very useful in the creation of compiler and interpreter techniques.
Historically, the finite state machine showed that a lot of problems can be solved by a very simple automaton.
Try taking a compilers course. You will very likely write a compiler or interpreter, typically with a finite state automaton for lexing and a recursive descent parser on top of it.
For example, to manage the states of objects with a defined life cycle.
As an example of this: orders in a book shop.
An order can have the following states:
-ordered
-paid
-shipping
-done
and the finite automaton defines how one state can be changed into another.
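A minimal sketch of that life cycle as a finite state machine in Python (the event names here are my own assumptions):

TRANSITIONS = {
    'ordered':  {'pay': 'paid'},
    'paid':     {'ship': 'shipping'},
    'shipping': {'deliver': 'done'},
    'done':     {},                   # terminal state: no events allowed
}

class Order:
    def __init__(self):
        self.state = 'ordered'

    def apply(self, event):
        allowed = TRANSITIONS[self.state]
        if event not in allowed:
            raise ValueError("cannot '%s' an order that is '%s'" % (event, self.state))
        self.state = allowed[event]

order = Order()
order.apply('pay')
order.apply('ship')
print(order.state)       # shipping
# order.apply('pay') would raise, because the automaton forbids it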
A finite automaton is a type of state machine (SM). In general, SMs are used for parsing formal languages.
Many kinds of entities can serve as the symbols of a formal language, not only characters.
And a regular language is a type of formal language.
There is theory showing which type of SM is best suited to parsing a regular language:
http://en.wikipedia.org/wiki/Regular_language