Pros and cons of designing a calculator with eval

I'm making a calculator for Android using Kivy and it's almost done (I cannot use Java because Python is the only language I know). The way it works is that the user inputs an expression and eval is used to evaluate that expression. At the moment, in my app, the eval expression can contain numbers, mathematical operators (+, -, /, *) and most of the functions from the math module (in short, it's a scientific calculator), and it works as intended. In the future I'm planning on integrating matplotlib to add graphing capabilities to the app. So, within this context, is eval a safe option? Given my limited programming experience, I didn't think of eval as being unsafe in a lot of situations; it was only a few days ago that I stumbled upon a thread which discussed the safety issues associated with using eval.
So is it better to replace eval with something else in my app, or is it safe in the given situation? If the former, what's the best alternative that doesn't require changing my code too much? (Also, it'd be better if it is in the Python standard library so that I don't increase the app size.)
Edit: Btw, the eval expression is calculated in real time (not sure if this matters).

This article ought to be of good use to you -- it's almost precisely what you're trying to do.
This one, on the other hand, is a good warning as to what could happen if you're not careful. Presumably there are good ways around this (maybe just filter out any input containing double underscores, as a really really simple start), but it's worth remembering that Python has lots of magic, and that most of said magic is accessible through eval().
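One way to stay inside the standard library is to parse the input with ast and evaluate only the node types you explicitly allow, instead of filtering strings before eval(). The following is a minimal sketch, not a vetted sandbox: the names safe_eval, _FUNCTIONS and _CONSTANTS are made up for illustration, the list of math functions is an arbitrary sample, and it assumes Python 3.8+ (for ast.Constant). Exponentiation is left out on purpose, since something like 9**9**9 can hang the app.

```python
import ast
import math
import operator

# Whitelisted operators; anything not listed here is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}

# Whitelisted names (hypothetical selection; extend as the calculator needs).
_FUNCTIONS = {"sin": math.sin, "cos": math.cos, "sqrt": math.sqrt, "log": math.log}
_CONSTANTS = {"pi": math.pi, "e": math.e}


def safe_eval(expression):
    """Evaluate an arithmetic expression without calling eval()."""
    tree = ast.parse(expression, mode="eval")
    return _evaluate(tree.body)


def _evaluate(node):
    # Plain numbers only (bool and str constants are rejected).
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        return _BINARY[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
        return _UNARY[type(node.op)](_evaluate(node.operand))
    if isinstance(node, ast.Name) and node.id in _CONSTANTS:
        return _CONSTANTS[node.id]
    # Only direct calls to whitelisted names, with positional arguments.
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in _FUNCTIONS and not node.keywords):
        return _FUNCTIONS[node.func.id](*[_evaluate(arg) for arg in node.args])
    raise ValueError("unsupported expression element: " + type(node).__name__)
```

Attribute access, subscripts, lambdas and unknown names all fall through to the final ValueError, so the double-underscore tricks never get evaluated. Malformed input raises SyntaxError from ast.parse, and division by zero still raises ZeroDivisionError, so the caller needs to catch those.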

Related

syntax check using a subset of C

I want to build a web based service that lets the user input some C code that the server will then compile and run and return results. I know, I know, security nightmare. So maybe I could go with chroot or lxc or something like that. There are good posts on stackoverflow about those. Another option is to use programming contest software.
What I am doing isn't for general programming purposes though. Users will be able to add code to a few stub functions and that is it. They don't need to be able to use pointers or arrays or strings. They shouldn't be able to open/close/read/write files or sockets or shared memory. They can't even create their own functions. They should only be able to do the following:
// style comments
/* */ style comments
declare variables of type int, double, float, int64_t, int32_t, uint64_t, uint32_t
for, while, do
+, -, *, /, % arithmetic operators ( * as dereference is NOT allowed )
( )
+, - unary operators
++, -- operators
math functions like sin, cos, abs, fabs, etc
a bunch of API functions that will exist
switch, case, break
{ }
if, else, ==, !=
=, +=, -=, *=, /=, etc
Is there a tool I can use to check a given chunk of C code to make sure it contains only those elements?
If I can't find an existing solution I can use Antlr or something similar to come up with it myself.
For a real-world example of a web service that runs user code, check out the Travis CI continuous integration service. Open-source projects use it to run their unit tests in a centralized manner. The Travis process goes a bit like this:
Fire up a brand-new VM from a known-good configuration.
Load and compile the user code.
Run the tests and display results.
Discard the VM.
There is a time limit (10 minutes IIRC) to prevent people from running botnets on the system, but other than that, the VMs are fully functional and connected to the Internet. No need for restricted syntax or other artificial limitations.
The idea to keep in mind is that you'll never be able to keep a server secure from the horrors of user code, no matter how much you restrict the user. The alternative is to assume the server is completely ruined the moment it's touched by user code and then simply trash it, which is what Travis does. VM software usually has snapshot functionality to help with this kind of thing.

Convert Bash Script to C. Is that possible?

I found the following Bash -> C converter.
Is it possible to convert from Bash to C this way?
Reason: Is C faster than Bash? I want to run something as a daemon instead of a cron job.
It is possible; the question is what the objectives of doing so are. They could be a subset of:
Speed: interpreted scripts can be slower
Maintainability: perhaps you have a team that has more experience with C
Flexibility: the script is showing limitations on what can be achieved with reasonable effort
Integration: perhaps you already have a code base that you want to tightly integrate with the scripts
Portability
There are also other reasons, like scalability, efficiency, and probably a lot more.
Based on the objectives of the "conversion", there are quite a few ways to achieve a C equivalent, varying the amount of code that will be "native". As an example we can consider two extremes.
On one extreme, we have compiled C code that executes mostly as Bash would, so every line of the original script would produce code equivalent to a sequence of fork/exec/wait system calls. The changes would mostly consist of performing the equivalents of wildcard expansion, retrieving values from environment variables, handling synchronization of the forked processes, and handling piping with the appropriate system calls.
Notice that this "simple" conversion is already tons of work, probably more than just writing another shell interpreter. Also, it doesn't meet many of the objectives above: portability-wise, it is still probably dependent on the operating system's syscalls, and performance-wise, the only gain is from parsing the script ahead of time.
On the other extreme, we have a complete rewrite in a more C-like fashion. This would replace all conditionals with C conditionals, replace the ls, cd and rm commands with their respective system calls, and possibly replace string processing with appropriate libraries.
This might be better at achieving some of the objectives, but the cost would probably be even greater than the other way, and it also loses a lot of code reuse, since you'd have to implement function equivalents of simple commands.
As for a tool for automating this, I don't know of any, and if there are any they probably don't have widespread use, because converting Bash to C or C to Bash probably isn't a good idea. If such a need arises, it is probably a symptom of a design problem, and therefore a redesign is probably a better solution. Programming languages and scripting languages are different tools for different jobs, even though there are areas of intersection between what can be done with them. In general,
Don't script in C, and don't code in Bash
It is better to know how and when to use the tools you have than to look for a generic universal tool (i.e. there are no such things as silver bullets).
I hope this helps a little =)
I'm sure someone has made a tool, just because they could, but I haven't seen one. If you need to run a bash script from C code, it's possible to just execute it directly via (for example) a call to system():
system("if [ -f /var/log/mail ]; then echo \"you've got mail! (file)\"; fi");
Other than that, I'm not aware of an easy way to "automatically" do it. As humans we can look at the above and equate that to:
if( access( "/var/log/mail", F_OK ) != -1 )
    printf("you've got mail! (file)");
That is just one of a dozen ways it could be achieved. So it's pretty easy to do by hand, but it's obviously going to take a lot more effort to build what amounts to a bash->C compiler to do it automatically.
So is it possible? Sure!
Example? Sorry, no.
There's a program I use to obfuscate code when I need that.
For some of the programs I've used it on, it does improve the speed; on others it slows the script down, but that's not why I use it. The main utility for me is that the binary cannot be changed or read by casual users.
Article: http://www.linux-magazine.com/Online/Features/SHC-Shell-Compiler
Developer's site: http://www.datsi.fi.upm.es/~frosal/sources/
As mentioned on one of those pages, it falls someplace between a gadget and an actual tool.

Manually translating code from one language to another

I often write code in MATLAB/Python to test whether my algorithm is feasible (and actually works). I then need to convert the entire code into C and sometimes into Fortran 90.
What would be a good way to manually convert a medium sized code from one language to another?
I have tried:
Converting the entire code from one language into another and then testing it.
(Sometimes there are errors and bugs which just won't go away, and finding the source of the error becomes a problem.)
Go line by line and check for consistency of outputs every few lines.
(Too time consuming)
Use converters like f2c.
(In my experience, the results are horrible. I link to a lot of libraries which have different function calls for C and Fortran.)
Also:
I am fairly conversant with the programming languages I deal with so I don't need manuals or reference guides for my work (i.e. I know the syntax).
I am not asking this question specifically about MATLAB and C but rather as a translation paradigm.
Regarding the size, the codes are less than 100 lines long.
I don't want to call code written in one language from another. Please don't suggest that.
Different languages call for different paradigms. You definitely don't write and design code the same way in eg. Matlab, Python, C# or C++. Even object hierarchies will change a lot depending on the language.
That said, if your code consists of a few interconnected procedures, then you may get away with a direct line-by-line translation (every language allows you to write two or three interconnected functions while remaining idiomatic). But this is the case only for the simplest programs.
Prototyping in a high-level language and then implementing the same idea in a robust and clean way in a "production" language is a very good practice, but it involves two very different steps:
1. Prototype in whatever language you want. Test, experiment, and convince yourself that the idea works. Pay attention to the big picture; don't focus on performance but on the high-level ideas. Also pay attention to the difficulties that you encounter when implementing, as you'll face them again in step 2.
2. Implement the idea from scratch in the production environment in language X. It will be quicker than if you had skipped the prototyping stage, since most of the difficulties have been met in step 1. Use idiomatic X, and focus on correctness. Pay attention to corner cases, general robustness, and, once it works correctly, performance. You'll notice that roughly half of your code is made of new things which did not appear in step 1 (e.g. error checking, corner case handling, input/output, unit testing, etc.).
You can see that line by line translation is obviously not a good idea, since you don't translate into the same program.
Also, when not prototyping, I find myself throwing away the first version and making another one that I like better, i.e. I find myself prototyping! Implementing the same thing twice is not a waste of time; it is normal development flow.
You may want to consider using a higher level domain specific language with multiple backends (e.g., Matlab, C, Fortran), producing clean and idiomatic code for each target language, probably with some optimisations. If your problem domain is narrow and every piece of code is more or less typical, it should be fairly trivial to design and implement such a DSL.
Break the source down into pseudo-code with input/process/output and then write your new code base to fit that spec.
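On the "bugs which just won't go away" problem from the question: one cheap trick is to have both the prototype and the port dump the same intermediate values at a few agreed checkpoints, then look for the first place where the two runs disagree. The sketch below assumes the two dumps have already been loaded into lists of floats; the function name first_divergence and the default tolerances are made up for illustration. Nothing here calls one language from another, it only compares recorded numbers.

```python
import math


def first_divergence(expected, actual, rel_tol=1e-9, abs_tol=1e-12):
    """Return the index of the first checkpoint where two runs disagree.

    `expected` holds values recorded from the prototype and `actual` holds
    the values recorded from the port, in the same order.
    Returns None when every checkpoint agrees within tolerance.
    """
    for index, (e, a) in enumerate(zip(expected, actual)):
        if not math.isclose(e, a, rel_tol=rel_tol, abs_tol=abs_tol):
            return index
    # One run produced more checkpoints than the other.
    if len(expected) != len(actual):
        return min(len(expected), len(actual))
    return None
```

Exact equality is usually too strict when moving between languages, since the order of floating-point operations tends to change, hence the tolerance parameters.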

Can the shunting yard algorithm parse POSIX regular expressions?

At first glance, the shunting yard algorithm seems applicable to POSIX regular expression parsing, but since I don't have much experience (or theoretical background) in writing parsers, I'd like to ask SO before jumping in and writing something only to get stuck halfway.
Perhaps a more sophisticated version of the question is: What is a good formal statement of the class of problems the shunting yard algorithm can be applied to?
Clarification: This question is about whether you can parse POSIX re syntax into an abstract syntax tree using the basic principles of the shunting algorithm, not whether you can use regular expressions to implement the shunting algorithm. Sorry I wasn't clear enough stating that to begin with!
I'm fairly sure it can. If you look at Henry Spencer's regular expression package:
regexp.shar.Z
which was the basis for Perl's regular expressions, you will notice that he describes the program as being in "railroad normal form".
I reckon you'd have some problems because different characters have different meanings in different contexts e.g.
^[^a-z][asd-]
The ^ has two different meanings and so does the -. I think I'd choose a recursive descent parser.
I don't see why it wouldn't be suitable. Looking at some old code, it does seem I used a completely different parsing strategy for my last regexp parser, however (essentially, a walk-through from the start, building the resulting automaton representation as you go, with some look-ahead and recursive calls to implement grouping of regular expressions).
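To make the idea concrete, here is a rough sketch of shunting yard applied to a small subset of regular expression syntax: single-character literals, alternation (|), the postfix operators *, + and ?, and parentheses. It is written in Python for brevity. Everything about it is an assumption for illustration: the function names, the choice of & as an explicit concatenation marker (so & cannot appear as a literal), and above all the subset itself. Bracket expressions, anchors, escapes and intervals are not handled; those would need a tokenizer that resolves the context-dependent characters before the algorithm ever sees them.

```python
CONCAT = "&"  # hypothetical explicit concatenation marker, not a literal
PRECEDENCE = {"|": 1, CONCAT: 2}
POSTFIX_OPS = "*+?"


def add_concat(pattern):
    """Insert explicit concatenation operators between adjacent atoms."""
    out = []
    for i, ch in enumerate(pattern):
        if i > 0:
            prev = pattern[i - 1]
            # Concatenate when prev can end an atom and ch can start one.
            if prev not in "(|" and ch not in ")|" + POSTFIX_OPS:
                out.append(CONCAT)
        out.append(ch)
    return "".join(out)


def to_postfix(pattern):
    """Convert the regex subset to postfix using an operator stack."""
    output, stack = [], []
    for ch in add_concat(pattern):
        if ch == "(":
            stack.append(ch)
        elif ch == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("unbalanced ')'")
            stack.pop()  # discard the "("
        elif ch in PRECEDENCE:
            # Both binary operators are treated as left-associative.
            while (stack and stack[-1] != "("
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[ch]):
                output.append(stack.pop())
            stack.append(ch)
        else:
            # Literals and postfix operators go straight to the output.
            output.append(ch)
    while stack:
        op = stack.pop()
        if op == "(":
            raise ValueError("unbalanced '('")
        output.append(op)
    return "".join(output)
```

For example, to_postfix("a(b|c)*d") gives "abc|*&d&". Building a tree from the postfix form is then a single pass with an operand stack.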
I will say that the answer to your question is "no, you cannot implement the shunting yard algorithm using a regular expression." This is for the same reason you cannot parse arbitrary HTML using regular expressions. Which boils down to this:
Regular expressions do not have a stack. Because the shunting yard algorithm relies on a stack (to push and pop operators as you convert from infix to RPN), regular expressions do not have the computational "power" to perform this task.
This glosses over many details, but a "regular expression" is one way to define a regular language. When you "use" a regular expression, you are asking the computer to say: "Look at a body of text and tell me whether or not any of those strings are in my language. The language that I defined using a regular expression." I'll point to this most excellent answer which you and everyone reading this should upvote for more on regular languages.
So now you need some mathematical concept to augment "regular languages" in order to create more powerful languages. If you were to characterize the shunting yard algorithm as a realization of a model of computational power, then you might say that the algorithm corresponds to a context-free grammar (hey what do you know, that link uses an expression parse tree as an example), that is, to a pushdown automaton: something with a stack.
If you are less-than-familiar with automata theory and complexity classes, then those wikipedia articles are probably not that helpful without explaining them from the ground up.
The point being, you may be able to use regexes to help write shunting yard. But regexes are not very good at operations that have an arbitrary depth, which this problem has. So I would not spend too much time going down the regex avenue for this problem.

Writing expressions: Infix, Postfix and Prefix

My task is to write an app (unfortunately in C) which reads an expression in infix notation (with variables, unary and binary operators), stores it in memory, and then evaluates it. Checks for correctness should also be performed.
For example:
3*(A+B)-(-2-78)*2+(0*A)
After I have all the values, the program should calculate the result.
The question is:
What is the best way to do this (with optimization and validation)?
Which notation should I choose as the basis of the tree?
Should I represent the expression as a tree? If so, I can easily optimize it (just drop nodes which return 0 or something similar).
Cheers,
The link suggested in the comment by Greg Hewgill above contains all the info you'll need:
If you insist on writing your own, a recursive descent parser is probably the simplest way to do it by hand.
Otherwise you could use a tool like Bison (since you're working in C). This tutorial is the best I've seen for working with Flex and Bison (or Lex/Yacc).
You can also search for "expression evaluator" on Codeproject - they have a lot of articles on the topic.
I came across the M4 program's expression evaluator some time ago. You can study its code to see how it works. I think this link on Google Codesearch is the version I saw.
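To give a feel for the recursive descent approach mentioned above, here is a small sketch that parses an infix expression into a tree and then evaluates it against a table of variable values. It is in Python only to keep it short; the structure (one function per grammar rule) maps directly onto C functions, with the tuples replaced by a node struct. The grammar, the tuple layout and all names are assumptions for illustration, not taken from the linked material.

```python
import operator
import re

# number | identifier | any other non-space character
TOKEN = re.compile(r"\s*(?:(\d+\.?\d*)|([A-Za-z_]\w*)|(\S))")
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def tokenize(text):
    tokens = []
    for number, name, symbol in TOKEN.findall(text):
        if number:
            tokens.append(("num", float(number)))
        elif name:
            tokens.append(("var", name))
        elif symbol in "+-*/()":
            tokens.append(("op", symbol))
        else:
            raise ValueError("unexpected character: " + symbol)
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("end", None)

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):  # expr := term (('+' | '-') term)*
        node = self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            op = self.take()[1]
            node = (op, node, self.term())
        return node

    def term(self):  # term := factor (('*' | '/') factor)*
        node = self.factor()
        while self.peek() in (("op", "*"), ("op", "/")):
            op = self.take()[1]
            node = (op, node, self.factor())
        return node

    def factor(self):  # factor := ('+' | '-') factor | num | var | '(' expr ')'
        kind, value = self.take()
        if kind in ("num", "var"):
            return (kind, value)
        if (kind, value) == ("op", "-"):
            return ("neg", self.factor())
        if (kind, value) == ("op", "+"):
            return self.factor()
        if (kind, value) == ("op", "("):
            node = self.expr()
            if self.take() != ("op", ")"):
                raise ValueError("expected ')'")
            return node
        raise ValueError("unexpected token: " + repr(value))


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek()[0] != "end":
        raise ValueError("trailing input")
    return tree


def evaluate(node, variables):
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "var":
        return variables[node[1]]
    if kind == "neg":
        return -evaluate(node[1], variables)
    return OPS[kind](evaluate(node[1], variables), evaluate(node[2], variables))
```

With A = 1 and B = 2, evaluate(parse("3*(A+B)-(-2-78)*2+(0*A)"), {"A": 1, "B": 2}) returns 169.0. Validation comes from the parser itself, since malformed input raises ValueError, and an optimization pass such as dropping 0*A subtrees would be a separate walk over the same tree.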
Your question hints at requirements being put on your solution:
unfortunately in C
so some suggestions here might not be permissible. Nevertheless, I would suggest that this is quite a complicated problem to solve, and that you would be much better off trying to find a suitable existing library which you could link into your C code to do this for you. This would likely reduce the time and effort required to get the code working, and reduce the ongoing maintenance effort. Of course, you'd have to think about licensing, but I'd be surprised if there wasn't a good parsing/evaluation library "out there" which could do a good job of this.

Resources