Calculating input - C

I tried to write code that calculates the value of a formula the user inputs. For example, if the user inputs "10+5", the program should print "The sum is 15", etc. At first I thought this would be easy, but I realized that just using scanf or something similar wouldn't do the trick. Then I messed around with arrays and loops to detect "-" or "+" signs in the input, saving the characters before and after the sign and then calculating the result, but I couldn't make this work either.
Could you please lead me in the right direction on how to get this done.
Thank you very much!

This can be quite complicated, especially when you get to operator precedence and you need to correctly calculate, for example, 2 + 5 * 6, which needs to be treated as 2 + (5 * 6). The correct way to approach this is to construct an expression tree (just like a compiler would). e.g.
    +
   / \
  2   *
     / \
    5   6
You do this by creating a binary tree. Each node holds either a number (a leaf) or an operation with (up to) two subnodes. Then you evaluate the expression by traversing the tree.
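A minimal sketch of such a tree in C, assuming only the four binary operators and numeric leaves; the struct layout and the name eval are illustrative, not taken from the answer. Evaluation is a post-order traversal: both subtrees first, then the node's own operator.

```c
/* A node is either a number (leaf) or a binary operator with two children. */
struct node {
    char op;             /* '+', '-', '*', '/', or 0 for a number leaf */
    double value;        /* meaningful only when op == 0 */
    struct node *left;   /* operands; unused for leaves */
    struct node *right;
};

/* Post-order evaluation: evaluate both subtrees, then apply this node's op. */
double eval(const struct node *n)
{
    double a, b;

    if (n->op == 0)
        return n->value;
    a = eval(n->left);
    b = eval(n->right);
    switch (n->op) {
    case '+': return a + b;
    case '-': return a - b;
    case '*': return a * b;
    default:  return a / b;   /* assumes op == '/' */
    }
}
```

For 2 + 5 * 6, the root is a '+' node whose left child is the leaf 2 and whose right child is a '*' node over the leaves 5 and 6; eval on the root gives 32. Building the tree from the input text is the parser's job, which this sketch leaves out.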

What you are trying to do is to parse arithmetic expressions, then evaluate them. There is a ton of stuff on this on the internet so, since this is your homework, I'll leave you to Google. Your first thought, that this would be easy to do, is probably a naive thought, though it's not a terribly difficult problem if you don't get too ambitious too quickly.

This might be a little over your head, but what you can do is use a parser generator and a lexical analyzer generator for C.
I believe they are called Bison and Lex (or its GNU counterpart Flex; the scanner function they generate is yylex).
From what I remember from school, that is how we made our Pascal compiler.
http://en.wikipedia.org/wiki/GNU_bison
After creating a tree, you can evaluate the subtrees; the value of the root node is then computed from the values of its subtrees.

These might be some steps that you might want to consider:
1. Get the input using getline() or fgets().
2. Start reading the string from the beginning.
3. Do two passes, using one queue for operators and another for operands (numbers).
4. During the first pass, when you reach a * or /, read the next number, perform the operation on it and the number you read before, and insert the result in the operand queue.
5. Also during the first pass, if you read a + or -, just push the operator and operands into their respective queues.
6. During the second pass, handle + and -. Using queues (processed first in, first out) will help you handle successive minuses correctly, e.g. 4-3-3.
These are not EXACT steps, but it's a heuristic worth looking into: try to work through them and change them according to what makes sense to you.
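One way those steps might look in C, as a sketch only: it assumes integer operands, the operators + - * /, and no parentheses, and it uses plain arrays read front to back as the two queues. The function names (eval_two_pass, next_number, next_op) are made up for this example.

```c
#include <ctype.h>
#include <stdlib.h>

#define MAX_TERMS 64   /* assumed limit on + / - separated terms */

/* Skip blanks, then parse one integer; returns 0 if none is found. */
static int next_number(const char **s, long *out)
{
    char *end;
    while (isspace((unsigned char)**s))
        (*s)++;
    *out = strtol(*s, &end, 10);
    if (end == *s)
        return 0;
    *s = end;
    return 1;
}

/* Skip blanks and peek at the next character (0 at end of string). */
static char next_op(const char **s)
{
    while (isspace((unsigned char)**s))
        (*s)++;
    return **s;
}

/* Two-pass evaluation of e.g. "2 + 5 * 6" or "4-3-3". Returns 1 on success. */
int eval_two_pass(const char *s, long *result)
{
    long vals[MAX_TERMS];   /* operand queue */
    char ops[MAX_TERMS];    /* operator queue (+ and - only) */
    int nvals = 0, i;
    long term;
    char op;

    /* Pass 1: fold * and / immediately; queue + and - with their operands. */
    for (;;) {
        if (!next_number(&s, &term))
            return 0;
        while ((op = next_op(&s)) == '*' || op == '/') {
            long rhs;
            s++;
            if (!next_number(&s, &rhs) || (op == '/' && rhs == 0))
                return 0;
            term = (op == '*') ? term * rhs : term / rhs;
        }
        if (nvals == MAX_TERMS)
            return 0;
        vals[nvals++] = term;
        if (op == '\0')
            break;
        if (op != '+' && op != '-')
            return 0;
        ops[nvals - 1] = op;   /* sits between vals[nvals-1] and the next value */
        s++;
    }

    /* Pass 2: apply the queued + and - strictly left to right. */
    *result = vals[0];
    for (i = 1; i < nvals; i++)
        *result = (ops[i - 1] == '+') ? *result + vals[i] : *result - vals[i];
    return 1;
}
```

Because the second pass runs left to right over the queued operators, 4-3-3 is computed as (4-3)-3 = -2 rather than 4-(3-3).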


Is there a known O(nm)-time/O(1)-space algorithm for POSIX filename matching (fnmatch)?

Edit: WHOOPS! Big admission, I screwed up the definition of the ? in fnmatch pattern syntax and seem to have proposed (and possibly solved) a much harder problem where it behaves like .? in regular expressions. Of course it actually is supposed to behave like . in regular expressions (matching exactly one character, not zero or one). Which in turn means my initial problem-reduction work was sufficient to solve the (now rather boring) original problem. Solving the harder problem is rather interesting still though; I might write it up sometime.
On the plus side, this means there's a much greater chance that something like 2way/SMOA needle factorization might be applicable to these patterns, which in turn could yield the better-than-originally-desired O(n) or even O(n/m) performance.
In the question title, let m be the length of the pattern/needle and n be the length of the string being matched against it.
This question is of interest to me because all the algorithms I've seen/used have either pathologically bad performance and possible stack overflow exploits due to backtracking, or required dynamic memory allocation (e.g. for a DFA approach or just avoiding doing backtracking on the call stack) and thus have failure cases that could also be dangerous if a program is using fnmatch to grant/deny access rights of some sort.
I'm willing to believe that no such algorithm exists for regular expression matching, but the filename pattern language is much simpler than regular expressions. I've already simplified the problem to the point where one can assume the pattern does not use the * character, and in this modified problem you're not matching the whole string but searching for an occurrence of the pattern in the string (like the substring match problem). If you further simplify the language and remove the ? character, the language is just composed of concatenations of fixed strings and bracket expressions, and this can easily be matched in O(mn) time and O(1) space, which perhaps can be improved to O(n) if the needle factorization techniques used in 2way and SMOA substring search can be extended to such bracket patterns. However, naively each ? requires trials with or without the ? consuming a character, bringing in a time factor of 2^q where q is the number of ? characters in the pattern.
Anyone know if this problem has already been solved, or have ideas for solving it?
Note: In defining O(1) space, I'm using the transdichotomous model.
Note 2: This site has details on the 2way and SMOA algorithms I referenced: http://www-igm.univ-mlv.fr/~lecroq/string/index.html
Have you looked into the re2 regular expression engine by Russ Cox (of Google)?
It's a regular expression matching engine based on deterministic finite automata, which is different than the usual implementations (Perl, PCRE) using backtracking to simulate a non-deterministic finite automaton. One of the specific design goals was to eliminate the catastrophic backtracking behaviour you mention.
It disallows some of the Perl extensions like backreferences in the search pattern, but you don't need that for glob matching.
I'm not sure if it guarantees O(mn) time and O(1) memory constraints specifically, but it was good enough to run the Google Code Search service while it existed.
At the very least it should be cool to look inside and see how it works. Russ Cox has written three articles about re2 - one, two, three - and the re2 code is open source.
Edit: WHOOPS! Big admission, I screwed up the definition of the ? in fnmatch pattern syntax and seem to have solved a much harder problem where it behaves like .? in regular expressions. Of course it actually is supposed to behave like . in regular expressions (matching exactly one character, not zero or one). Which in turn means my initial problem-reduction work was sufficient to solve the (now rather boring) original problem. Solving the harder problem is rather interesting still though; I might write it up sometime.
Possible solution to the harder problem follows below.
I have worked out what seems to be a solution in O(log q) space (where q is the number of question marks in the pattern, and thus q ≤ m) and uncertain but seemingly better-than-exponential time.
First of all, a quick explanation of the problem reduction. First break the pattern at each *; it decomposes as a (possibly zero-length) initial and final component, and a number of internal components flanked on both sides by a *. This means once we've determined whether the initial/final components match up, we can apply the following algorithm for internal matches: starting with the last component, search for the match in the string that starts at the latest offset. This leaves the most possible "haystack" characters free to match earlier components; if they're not all needed, it's no problem, because the fact that a * intervenes allows us to later throw away as many as needed, so it's not beneficial to try "using more ? marks" of the last component or finding an earlier occurrence of it. This procedure can then be repeated for every component. Note that here I'm strongly taking advantage of the fact that the only "repetition operator" in the fnmatch expression is the *, which matches zero or more occurrences of any character. The same reduction would not work with regular expressions.
With that out of the way, I began looking for how to match a single component efficiently. I'm allowing a time factor of n, so that means it's okay to start trying at every possible position in the string, and give up and move to the next position if we fail. This is the general procedure we'll take (no Boyer-Moore-like tricks yet; perhaps they can be brought in later).
For a given component (which contains no *, only literal characters, brackets that match exactly one character from a given set, and ?), it has a minimum and maximum length string it could match. The minimum is the length if you omit all ? characters and count bracket expressions as one character, and the maximum is the length if you include ? characters. At each position, we will try each possible length the pattern component could match. This means we perform q+1 trials. For the following explanation, assume the length remains fixed (it's the outermost loop, outside the recursion that's about to be introduced). This also fixes a length (in characters) from the string that we will be comparing to the pattern at this point.
Now here's the fun part. I don't want to iterate over all possible combinations of which ? characters do/don't get used. The iterator is too big to store. So I cheat. I break the pattern component into two "halves", L and R, where each contains half of the ? characters. Then I simply iterate over all the possibilities of how many ? characters are used in L (from 0 to the total number that will be used based on the length that was fixed above) and then the number of ? characters used in R is determined as well. This also partitions the string we're trying to match into part that will be matched against pattern L and pattern R.
Now we've reduced the problem of checking if a pattern component with q ? characters matches a particular fixed-length string to two instances of checking if a pattern component with q/2 ? characters matches a particular smaller fixed-length string. Apply recursion. And since each step halves the number of ? characters involved, the number of levels of recursion is bounded by log q.
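The halving step can be sketched as follows, under simplifying assumptions: the component contains only literal characters and ?, with ? taken in the harder sense of matching zero or one character, and bracket expressions are left out. match_fixed is a hypothetical name; it only answers the fixed-length question, so a caller would still loop over every start position and every length from the minimum to the minimum plus q. Components with zero or one ? are checked directly, which is what stops the recursion; otherwise each level recurses on halves holding about q/2 question marks, so the depth grows like log q.

```c
#include <stddef.h>
#include <string.h>

/* Number of '?' in p[0..plen). */
static size_t count_q(const char *p, size_t plen)
{
    size_t i, q = 0;
    for (i = 0; i < plen; i++)
        if (p[i] == '?')
            q++;
    return q;
}

/* Does component p[0..plen) match exactly s[0..slen)? Every '?' matches
   zero or one arbitrary character (the harder variant); all other
   characters are literals. */
static int match_fixed(const char *p, size_t plen, const char *s, size_t slen)
{
    size_t q = count_q(p, plen);

    if (q == 0)                              /* plain literal comparison */
        return plen == slen && memcmp(p, s, plen) == 0;

    if (q == 1) {                            /* p = A ? B: the ? uses 0 or 1 char */
        const char *qm = memchr(p, '?', plen);
        size_t a = (size_t)(qm - p), b = plen - a - 1;
        if (slen != a + b && slen != a + b + 1)
            return 0;
        return memcmp(p, s, a) == 0 && memcmp(qm + 1, s + slen - b, b) == 0;
    }

    {
        /* Split just after the (q/2)-th '?', so L and R each hold about half. */
        size_t qL = q / 2, qR = q - qL, seen = 0, split = 0;
        size_t minL, minR, used, k, lo, hi;
        while (seen < qL)
            if (p[split++] == '?')
                seen++;
        minL = split - qL;                   /* shortest string L can match */
        minR = (plen - split) - qR;          /* shortest string R can match */
        if (slen < minL + minR || slen - (minL + minR) > q)
            return 0;
        used = slen - (minL + minR);         /* '?'s that must consume a char */
        lo = used > qR ? used - qR : 0;      /* how many of those fall in L */
        hi = used < qL ? used : qL;
        for (k = lo; k <= hi; k++)
            if (match_fixed(p, split, s, minL + k) &&
                match_fixed(p + split, plen - split, s + minL + k, slen - minL - k))
                return 1;
        return 0;
    }
}
```

Only a constant number of offsets live in each stack frame, which is where the O(log q) space figure comes from; the time bound is as uncertain here as in the description above.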
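You can create a hash of both strings and then compare these. Computing the hash takes O(m), and the search takes O(m + n) on average if the hash of the current window is updated incrementally as it slides along the string (the Rabin-Karp approach).
You can use something like this for calculating the hash of a string of length n, where s[i] is its i-th character: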
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
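A sketch of that hash in C, computed with Horner's rule, plus a Rabin-Karp style search that updates the hash in O(1) as the window slides. The names are hypothetical, arithmetic is left to wrap around in unsigned long long instead of using an explicit modulus, and every hash hit is confirmed with memcmp because different strings can share a hash. As the answer notes, this only finds literal substrings, not wildcard patterns.

```c
#include <stddef.h>
#include <string.h>

/* h = s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], via Horner's rule.
   Unsigned overflow simply wraps, which is fine for hashing. */
unsigned long long poly_hash(const char *s, size_t n)
{
    unsigned long long h = 0;
    size_t i;
    for (i = 0; i < n; i++)
        h = h * 31 + (unsigned char)s[i];
    return h;
}

/* Index of the first occurrence of needle (length m) in hay (length n), or -1. */
long hash_find(const char *hay, size_t n, const char *needle, size_t m)
{
    unsigned long long target, h, top = 1;   /* top = 31^(m-1) */
    size_t i;

    if (m > n)
        return -1;
    for (i = 1; i < m; i++)
        top *= 31;
    target = poly_hash(needle, m);
    h = poly_hash(hay, m);
    for (i = 0;; i++) {
        if (h == target && memcmp(hay + i, needle, m) == 0)
            return (long)i;                  /* hash hit confirmed */
        if (i + m >= n)
            return -1;
        /* Slide the window: remove hay[i], append hay[i + m]. */
        h = (h - (unsigned char)hay[i] * top) * 31 + (unsigned char)hay[i + m];
    }
}
```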
As you said this is for file-name matching and you can't use this where you have wildcards in the strings. Good luck!
My feeling is that this is not possible.
Though I can't provide a bullet-proof argument, my intuition is that you will always be able to construct patterns containing q=Theta(m) ? characters where it will be necessary for the algorithm to, in some sense, account for all 2^q possibilities. This will then require O(q)=O(m) space to keep track of which of the possibilities you're currently looking at. For example, the NFA algorithm uses this space to keep track of the set of states it's currently in; the brute-force backtracking approach uses the space as stack (and to add insult to injury, it uses O(2^q) time in addition to the O(q) of space).
OK, here's how I solved the problem.
1. Attempt to match the initial part of the pattern up to the first * against the string. If this fails, bail out. If it succeeds, throw away this initial part of both the pattern and the string; we're done with them. (And if we hit the end of the pattern before hitting a *, we have a match iff we also reached the end of the string.)
2. Skip all the way to the end of the pattern (everything after the last *, which might be a zero-length pattern if the pattern ends with a *). Count the number of characters needed to match it, and examine that many characters from the end of the string. If they fail to match, we're done. If they match, throw away this component of the pattern and string.
3. Now, we're left with a (possibly empty) sequence of subpatterns, all of which are flanked on both sides by *'s. We try searching for them sequentially in what remains of the string, taking the first match for each and discarding the beginning of the string up through the match. If we find a match for each component in this manner, we have a match for the whole pattern. If any component search fails, the whole pattern fails to match.
This algorithm has no recursion and only stores a finite number of offsets in the string/pattern, so in the transdichotomous model it's O(1) space. Step 1 was O(m) in time, step 2 was O(n+m) in time (or O(m) if we assume the input string length is already known, but I'm assuming a C string), and step 3 is (using a naive search algorithm) O(nm). Thus the algorithm overall is O(nm) in time. It may be possible to improve step 3 to be O(n), but I haven't yet tried.
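Here is a minimal C sketch of those three steps, assuming a pattern language with only * and ? (no bracket expressions, escapes, or the special handling of / and leading dots that real fnmatch flags add). The names glob_match and piece_eq are illustrative. Apart from a few pointers and lengths it uses no extra memory, and the naive scan in step 3 gives the O(nm) bound.

```c
#include <stddef.h>
#include <string.h>

/* Compare a '*'-free piece of the pattern with len characters of s;
   '?' matches any single character. */
static int piece_eq(const char *p, const char *s, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
        if (p[i] != '?' && p[i] != s[i])
            return 0;
    return 1;
}

/* Match str against pat, where '*' matches any run (possibly empty) and
   '?' matches exactly one character. Returns 1 on a match. */
int glob_match(const char *pat, const char *str)
{
    size_t plen = strlen(pat), slen = strlen(str), head, tail, len;
    const char *first = strchr(pat, '*'), *last = strrchr(pat, '*');
    const char *p, *s, *s_end, *star;

    if (first == NULL)                       /* no '*': lengths must agree */
        return plen == slen && piece_eq(pat, str, plen);

    /* Step 1: the part before the first '*' must match the start. */
    head = (size_t)(first - pat);
    /* Step 2: the part after the last '*' must match the end. */
    tail = plen - (size_t)(last - pat) - 1;
    if (head + tail > slen || !piece_eq(pat, str, head) ||
        !piece_eq(last + 1, str + slen - tail, tail))
        return 0;

    /* Step 3: find each '*'-flanked piece left to right, first match wins. */
    s = str + head;
    s_end = str + slen - tail;
    for (p = first + 1; p < last; p = star + 1) {
        star = strchr(p, '*');               /* end of the current piece */
        len = (size_t)(star - p);
        for (;;) {                           /* naive O(nm) search */
            if ((size_t)(s_end - s) < len)
                return 0;                    /* not enough string left */
            if (piece_eq(p, s, len))
                break;
            s++;
        }
        s += len;                            /* discard up through the match */
    }
    return 1;
}
```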
Finally, note that the original harder problem is perhaps still useful to solve. That's because I didn't account for multi-character collating elements, which most people implementing regex and such tend to ignore because they're ugly to get right and there's no standard API to interface with the system locale and obtain the necessary info to get them. But with that said, here's an example: Suppose ch is a multi-character collating element. Then [c[.ch.]] could consume either 1 or 2 characters. And we're back to needing the more advanced algorithm I described in my original answer, which I think needs O(log m) space and perhaps somewhat more than O(nm) time (I'm guessing O(n²m) at best). At the moment I have no interest in implementing multi-character collating element support, but it does leave a nice open problem...

How to read a line and separate operators (+, -) from numbers in C

I'm trying to write a program that will read a line like:
* 3 2
and then call a function multiply that I have written to do the calculation.
I've got the formulas for all the operations; I just don't know how to separate the line and make the calls.
Polish (prefix) notation is when an operator is written before its operands, like you have here. If each of your lines corresponds to a "Polish sentence", and if you're able to conveniently store an entire line in memory (very likely, I'd guess), then the simplest way might be to read the entire line into a buffer (e.g., with fgets()), use strtok() to build an array of pointers to each token, and finally traverse that array from its end to its start (basically reading the sentence from right to left), processing each token in a manner similar to the pseudocode in the Wikipedia entry on Polish notation. (Left-to-right processing of a reverse Polish notation sentence works exactly the same way and is arguably a bit easier to understand, simply because you don't have to traverse the sentence backwards.)
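A sketch of that approach in C, assuming the operators are single characters from + - * / separated from numbers by spaces or tabs. eval_prefix and MAX_TOKENS are made-up names; the switch is where calls to your own multiply() and similar functions would go. Scanning right to left, each number is pushed on a stack, and each operator pops its two operands (the top one is the left operand), applies itself, and pushes the result.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_TOKENS 64   /* assumed limit on tokens per line */

/* Evaluate a prefix (Polish) expression such as "* 3 2" or "+ 1 * 2 3".
   strtok() splits the line in place, so it must be writable (e.g. a
   buffer filled by fgets()). Returns 1 and stores the value on success. */
int eval_prefix(char *line, double *result)
{
    char *tok[MAX_TOKENS];
    double stack[MAX_TOKENS];
    int ntok = 0, top = 0, i;
    char *t;

    /* Build an array of pointers to the tokens. */
    for (t = strtok(line, " \t\n"); t != NULL; t = strtok(NULL, " \t\n")) {
        if (ntok == MAX_TOKENS)
            return 0;
        tok[ntok++] = t;
    }

    /* Walk the tokens from right to left. */
    for (i = ntok - 1; i >= 0; i--) {
        t = tok[i];
        if (t[1] == '\0' && strchr("+-*/", t[0]) != NULL) {
            double a, b;
            if (top < 2)
                return 0;           /* operator without two operands */
            a = stack[--top];       /* left operand (pushed last) */
            b = stack[--top];       /* right operand */
            switch (t[0]) {         /* call multiply() etc. here instead */
            case '+': stack[top++] = a + b; break;
            case '-': stack[top++] = a - b; break;
            case '*': stack[top++] = a * b; break;
            default:  stack[top++] = a / b; break;
            }
        } else {
            char *end;
            stack[top++] = strtod(t, &end);
            if (*end != '\0')
                return 0;           /* not a number */
        }
    }
    if (top != 1)
        return 0;                   /* leftover operands */
    *result = stack[0];
    return 1;
}
```

Typical use is to read a line with fgets(buf, sizeof buf, stdin) and pass buf; since strtok() writes into the buffer, a string literal must not be passed directly.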

Parsing a stream of data for control strings

I feel like this is a pretty common problem but I wasn't really sure what to search for.
I have a large file (so I don't want to load it all into memory) that I need to parse control strings out of and then stream that data to another computer. I'm currently reading in the file in 1000 byte chunks.
So, for example, if I have a string that contains ASCII codes escaped as ('$' some number of digits ';') and the data looks like this: "quick $33;brown $126;fox $a $12a", the string going to the other computer would be "quick !brown ~fox $a $12a".
In my current approach I have the following problems:
What happens when a control string falls on a buffer boundary?
If the string is '$' followed by anything but digits and a ';' I want to ignore it. So I need to read ahead until the full control string is found.
I'm writing this in straight C so I don't have streams to help me.
Would an alternating double buffer approach work, and if so, how does one manage the current locations, etc.?
If I've followed what you are asking about, it is called lexical analysis or tokenization, and the patterns involved are regular expressions. For regular languages you can construct a finite state machine which will recognize your input. In practice you can use a tool that understands regular expressions to recognize the input and perform different actions for it.
Depending on your requirements you might go about this differently. For more complicated languages you might want to use a tool like lex to help you generate an input processor, but for this, as I understand it, you can use a much simpler approach, after we fix your buffer problem.
You should use a circular buffer for your input, so that indexing off the end wraps around to the front again. Whenever half of the data that the buffer can hold has been processed, you should do another read to refill that half. Your buffer size should be at least twice as large as the largest "word" you need to recognize. The indexing into this buffer will use the modulus (remainder) operator % to perform the wrapping (if you choose a buffer size that is a power of 2, such as 4096, then you can use bitwise & with size - 1 instead).
Now you just look at the characters until you read a $, output what you've looked at up until that point, and then, knowing that you are in a different state because you saw a $, you look at more characters until you see another character that ends the current state (the ;) and perform some other action on the data that you had read in. How to handle a $ that is not followed by a well-formed number and a ; wasn't entirely clear in your question: for instance, what should happen if there are a million digits before you see the ;?
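A sketch of the circular buffer described above, assuming a 4096-byte buffer filled from a FILE *. struct ring, ring_at and ring_fill are hypothetical names. The positions head and tail only ever increase; masking with RING_SIZE - 1 turns them into array indices, which is the power-of-two shortcut for %.

```c
#include <stdio.h>

#define RING_SIZE 4096u               /* a power of two, so & can replace % */
#define RING_MASK (RING_SIZE - 1u)

struct ring {
    char data[RING_SIZE];
    size_t head;   /* next character the scanner will look at */
    size_t tail;   /* one past the last character read from the file */
};

/* Character at absolute position i (head <= i < tail). */
static char ring_at(const struct ring *r, size_t i)
{
    return r->data[i & RING_MASK];    /* same as i % RING_SIZE */
}

/* Refill the space already consumed (everything before head) from f,
   using at most two fread() calls when the free region wraps around.
   Returns the number of bytes added; 0 means end of file or an error. */
static size_t ring_fill(struct ring *r, FILE *f)
{
    size_t added = 0;
    while (r->tail - r->head < RING_SIZE) {
        size_t start = r->tail & RING_MASK;
        size_t room = RING_SIZE - (r->tail - r->head);
        size_t contiguous = RING_SIZE - start;    /* bytes before the wrap */
        size_t want = room < contiguous ? room : contiguous;
        size_t got = fread(r->data + start, 1, want, f);
        r->tail += got;
        added += got;
        if (got < want)
            break;                                /* EOF or read error */
    }
    return added;
}
```

Following the answer's rule, you would call ring_fill whenever tail - head has dropped to half the buffer or less, so a control string that started near the end of one read is still fully in view when you scan it.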
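The state handling can be sketched as a small state machine whose state lives in a struct between calls, which by itself handles a control string that straddles two 1000-byte chunks: the pending "$12" is simply held until the next chunk arrives. This is an alternative to rescanning a circular buffer, not the answer's exact design. It assumes 1 to 3 digits and codes up to 255; anything else is passed through unchanged. The names decoder, decode_chunk and decode_finish are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Decoder state; it survives between calls, so "$12" at the end of one
   chunk and "6;" at the start of the next still decode to '~'. */
struct decoder {
    int state;          /* 0 = plain text, 1 = saw '$', 2 = saw '$' + digits */
    char pending[4];    /* held-back "$" plus up to 3 digits */
    size_t npending;
    int code;           /* numeric value of the digits read so far */
};

/* Emit held-back characters unchanged: they were not a control string. */
static size_t flush_pending(struct decoder *d, char *out)
{
    size_t n = d->npending;
    memcpy(out, d->pending, n);
    d->npending = 0;
    d->state = 0;
    return n;
}

/* Decode one chunk into out, which needs room for len + 4 bytes.
   Returns the number of bytes written. */
size_t decode_chunk(struct decoder *d, const char *in, size_t len, char *out)
{
    size_t i, n = 0;

    for (i = 0; i < len; i++) {
        char c = in[i];
        if (d->state != 0) {
            if (c >= '0' && c <= '9' && d->npending < sizeof d->pending) {
                d->pending[d->npending++] = c;
                d->code = d->code * 10 + (c - '0');
                d->state = 2;
                continue;
            }
            if (c == ';' && d->state == 2 && d->code <= 255) {
                out[n++] = (char)d->code;   /* complete "$NNN;" */
                d->npending = 0;
                d->state = 0;
                continue;
            }
            n += flush_pending(d, out + n); /* then handle c normally below */
        }
        if (c == '$') {
            d->pending[0] = '$';
            d->npending = 1;
            d->code = 0;
            d->state = 1;
        } else {
            out[n++] = c;
        }
    }
    return n;
}

/* Call once at end of input to emit any unfinished "$12" as-is. */
size_t decode_finish(struct decoder *d, char *out)
{
    return flush_pending(d, out);
}
```

For the question's example split mid-code as "quick $33;brown $12" and "6;fox $a $12a", the two decode_chunk calls followed by decode_finish produce "quick !brown ~fox $a $12a".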
The regular expressions would be:
[^$]
Any non-dollar-sign character. This could be augmented with a closure ([^$]* or [^$]+) to recognize a string of non-$ characters at a time, but that could get very long.
[$][0-9]{1,3};
This would recognize a dollar sign followed by 1 to 3 digits followed by a semicolon (the $ is bracketed so that it is taken literally).
[$]
This would recognize just a dollar sign. It is in brackets because $ is special in many regular expression notations, particularly at the end of a pattern (which is where it is in this case), where it means "match only at the end of a line".
Anyway, in this case it would recognize a dollar sign in the case where it is not recognized by the other, longer, pattern that recognizes dollar signs.
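In lex you might have the following, where write_string() and write_char() are output functions you would write yourself:
[^$]{1,1024}     { write_string(yytext); }
[$][0-9]{1,3};   { write_char(atoi(yytext + 1)); /* skip the '$' */ }
[$]              { write_char(*yytext); }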
and it would generate a .c file that will function as a filter similar to what you are asking for. You will need to read up a little more on how to use lex though.
The "f" family of functions in <stdio.h> can take care of the streaming for you. Specifically, you're looking for fopen(), fgets(), fread(), etc.
Nategoose's answer about using lex (and I'll add yacc, depending on the complexity of your input) is also worth considering. They generate lexers and parsers that work, and after you've used them you'll never write one by hand again.

To find all possible permutations of a given string

This is just to work out a problem which looks pretty interesting. I tried to think it over, but couldn't find a way to solve it in efficient time. Maybe my concepts are still building up... Anyway, the question is as follows.
I want to find all possible permutations of a given string. Also, please share any possible variations of this problem.
I found a solution on the net that uses recursion, but it doesn't satisfy me, as it looks a bit erroneous.
The program is as follows:
void permute(char s[], int d)
{
    int i;
    if(d == strlen(s))
        printf("%s",s);
    else
    {
        for(i=d;i<strlen(s);i++)
        {
            swap(s[d],s[i]);
            permute(s,d+1);
            swap(s[d],s[i]);
        }
    }
}
If this program looks good (it gives an error when I run it), then please provide a small example to help me understand it, as I am still developing my recursion concepts.
Any other efficient algorithm, if one exists, can also be discussed.
And please, this is not homework.
Thanks.
The code looks correct, though you only have the core of the algorithm, not a complete program. You'll have to provide the missing bits: headers, a main function, and a swap macro (you could make swap a function by calling it as swap(s, d, i)).
To understand the algorithm, it would be instructive to add some tracing output, say printf("permute(%s, %d)", s, d) at the beginning of the permute function, and run the program with a 3- or 4-character string.
The basic principle is that each recursive call to permute successively places each remaining element at position d; the element that was at position d is saved by putting it where the aforementioned remaining element was (i.e. the elements are swapped). For each placement, permute is called recursively to generate all desired substrings after the position d. So the top-level call (d=0) to permute successively tries all elements in position 0, second-level calls (d=1) try all elements in position 1 except for the one that's already in position 0, etc. The next-to-deepest calls (d=n-1) have a single element to try in the last position, and the deepest calls (d=n) print the resulting permutation.
The core algorithm requires Θ(n·n!) running time, which is the best possible since that's the size of the output. However, this implementation is less efficient than it could be because it recomputes strlen(s) at every iteration, for a Θ(n²·n!) running time; the simple fix of precomputing the length would yield Θ(n·n!). The implementation requires Θ(n) memory, which is the best possible since that's the size of the input.
For an explanation of the recursion, see Gilles's answer.
Your code has some problems. First it will be hard to implement the required swap as a function in C, since C lacks the concept of call by reference. You could try to do this with a macro, but then you'd either have to use the exclusive-or trick to swap values in place, or use a temporary variable.
Then your repeated use of strlen blows up the complexity of your program. As written, it is called at every iteration of every recursion level. Since your string even changes (because of the swaps), the compiler can't tell that the length is always the same, so it wouldn't be able to optimize the calls away. Searching for the terminating '\0' in your string would dominate all other instructions by far if you implement it like that.
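Putting the fixes from both answers together gives the core of a complete program along these lines: swap becomes a function taking the array and two indices, and the length is computed once and passed down. print_permutations and the returned count are additions for illustration, not part of the original code; a main() would just call print_permutations on a writable array such as char s[] = "abc".

```c
#include <stdio.h>
#include <string.h>

/* Exchange s[i] and s[j]; passing the array and indices avoids needing
   call by reference or a macro. */
static void swap(char s[], size_t i, size_t j)
{
    char tmp = s[i];
    s[i] = s[j];
    s[j] = tmp;
}

/* Print every arrangement of s[d..n-1] behind the fixed prefix s[0..d-1].
   n is passed in, so strlen is not recomputed on every iteration.
   Returns how many permutations were printed. */
static unsigned long permute(char s[], size_t d, size_t n)
{
    unsigned long count = 0;
    size_t i;

    if (d == n) {
        printf("%s\n", s);
        return 1;
    }
    for (i = d; i < n; i++) {
        swap(s, d, i);                  /* try s[i] at position d */
        count += permute(s, d + 1, n);  /* permute the remaining suffix */
        swap(s, d, i);                  /* undo, restoring the original order */
    }
    return count;
}

/* Convenience wrapper: computes the length once. */
unsigned long print_permutations(char s[])
{
    return permute(s, 0, strlen(s));
}
```

For "abc" this prints abc, acb, bac, bca, cba, cab on six lines and leaves the string as "abc", because every swap is undone on the way back up.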

parsing of mathematical expressions

(in c90) (linux)
input:
sqrt(2 - sin(3*A/B)^2.5) + 0.5*(C*~(D) + 3.11 +B)
a
b /*there are values for a,b,c,d */
c
d
input:
cos(2 - asin(3*A/B)^2.5) +cos(0.5*(C*~(D)) + 3.11 +B)
a
b /*there are values for a,b,c,d */
c
d
input:
sqrt(2 - sin(3*A/B)^2.5)/(0.5*(C*~(D)) + sin(3.11) +ln(B))
/*max length of formula is 250 characters*/
a
b /*there are values for a,b,c,d */
c /*each variable with set of floating numbers*/
d
As you can see, the infix formula in the input is chosen by the user.
My program will take a formula and n-tuples of values.
Then it calculates the result for each value of a, b, c and d.
In case you are wondering, the output of the program is a graph.
Sometimes I think I will take the input and store it in a string.
Then another idea arises: "I should store the formula in a struct",
but I don't know how to build
the code around such a structure.
Really, I don't know how to store the formula in the program so that
I can do my job.
Can you show me?
/* a,b,c,d are variables (letters)
cos,sin,sqrt,ln are functions */
You need to write a lexical analyzer to tokenize the input (break it into its component parts--operators, punctuators, identifiers, etc.). Inevitably, you'll end up with some sequence of tokens.
After that, there are a number of ways to evaluate the input. One of the easiest ways to do this is to convert the expression to postfix using the shunting yard algorithm (evaluation of a postfix expression is Easy with a capital E).
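A compact sketch of that route in C, restricted to numbers, + - * /, and parentheses; variables, functions such as sqrt, unary minus and ^ are left out (^ would be one more, right-associative precedence level evaluated with pow() from <math.h>). The names are illustrative. to_postfix is the shunting yard: numbers go straight to the output, while operators wait on a stack until an operator of lower precedence or a ) forces them out. eval_postfix is the easy part.

```c
#include <ctype.h>
#include <stdlib.h>

#define MAX_ITEMS 128   /* assumed limit on numbers plus operators */

/* One postfix item: op is '+', '-', '*' or '/', or 0 for a number. */
struct item {
    char op;
    double num;
};

/* Precedence of a binary operator; 0 for anything else (including '('). */
static int prec(char c)
{
    if (c == '*' || c == '/') return 2;
    if (c == '+' || c == '-') return 1;
    return 0;
}

/* Append one item to out[]; returns 0 if out[] is full. */
static int emit(struct item out[], int *n, char op, double num)
{
    if (*n == MAX_ITEMS) return 0;
    out[*n].op = op;
    out[*n].num = num;
    (*n)++;
    return 1;
}

/* Shunting yard: infix s -> postfix out[]. Returns the item count or -1. */
static int to_postfix(const char *s, struct item out[])
{
    char ops[MAX_ITEMS];        /* operator stack */
    int nops = 0, n = 0;

    while (*s != '\0') {
        if (isspace((unsigned char)*s)) {
            s++;
        } else if (isdigit((unsigned char)*s) || *s == '.') {
            char *end;
            double v = strtod(s, &end);
            if (end == s || !emit(out, &n, 0, v)) return -1;
            s = end;                             /* numbers go straight out */
        } else if (*s == '(') {
            if (nops == MAX_ITEMS) return -1;
            ops[nops++] = *s++;
        } else if (*s == ')') {
            while (nops > 0 && ops[nops - 1] != '(')
                if (!emit(out, &n, ops[--nops], 0)) return -1;
            if (nops == 0) return -1;            /* unmatched ')' */
            nops--;                              /* discard the '(' */
            s++;
        } else if (prec(*s) > 0) {
            /* Left-associative: pop operators of equal or higher precedence. */
            while (nops > 0 && prec(ops[nops - 1]) >= prec(*s))
                if (!emit(out, &n, ops[--nops], 0)) return -1;
            if (nops == MAX_ITEMS) return -1;
            ops[nops++] = *s++;
        } else {
            return -1;                           /* unexpected character */
        }
    }
    while (nops > 0) {
        if (ops[nops - 1] == '(') return -1;     /* unmatched '(' */
        if (!emit(out, &n, ops[--nops], 0)) return -1;
    }
    return n;
}

/* Evaluate postfix with a value stack. Returns 1 on success. */
static int eval_postfix(const struct item t[], int n, double *result)
{
    double st[MAX_ITEMS];
    int top = 0, i;

    for (i = 0; i < n; i++) {
        double a, b;
        if (t[i].op == 0) {
            st[top++] = t[i].num;
            continue;
        }
        if (top < 2) return 0;                   /* missing operand */
        b = st[--top];
        a = st[--top];
        st[top++] = t[i].op == '+' ? a + b : t[i].op == '-' ? a - b
                  : t[i].op == '*' ? a * b : a / b;
    }
    if (top != 1) return 0;
    *result = st[0];
    return 1;
}

/* Convenience wrapper: convert, then evaluate. */
int eval_infix(const char *s, double *result)
{
    struct item post[MAX_ITEMS];
    int n = to_postfix(s, post);
    return n > 0 && eval_postfix(post, n, result);
}
```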
You should look up "abstract syntax trees" and "expression trees" as well as "lexical analysis", "syntax", "parse", and "compiler theory". Reading text input and getting meaning from it is quite difficult for most things (though we often try to make sure we have simple input).
The first step in generating a parser is to write down the grammar for your input language. In this case your input language is mathematical expressions, so you would write something like:
expr => <function_identifier> ( stmt )
      | ( stmt )
      | <variable_identifier>
      | <numerical_constant>
stmt => expr <operator> stmt
      | expr
(I haven't written a grammar like this {look up BNF and EBNF} in a few years so I've probably made some glaring errors that someone else will kindly point out)
This can get a lot more complicated depending on how you handle operator precedence (multiply and divide before add and subtract, that kind of thing), but the point of the grammar in this case is to help you write a parser.
There are tools that will help you do this (yacc, bison, antlr, and others), but you can do it by hand as well. There are many ways to go about it, but they all have one thing in common: a stack. Processing a language such as this requires something called a pushdown automaton, which is just a fancy way of saying something that can make decisions based on new input, a current state, and the top item of the stack. The decisions that it can make include pushing, popping, changing state, and combining (turning 2+3 into 5 is a form of combining). Combining is usually referred to as a reduction, because it reduces the right-hand side of a grammar rule (a production) to its left-hand side.
Of the various common types of parsers, you will almost certainly start out with a recursive descent parser. They are usually written directly in a general-purpose programming language, such as C. This type of parser is made up of several (often many) functions that call each other, and they end up using the system stack as the pushdown automaton's stack.
Another thing you will need to do is to write down the different types of words and operators that make up your language. These words and operators are called lexemes and represent the tokens of your language. I represented these tokens in the grammar <like_this>, except for the parentheses, which represent themselves.
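A minimal recursive descent evaluator in C, assuming single-letter variables A through D whose values come from an array, plus numbers, + - * /, unary minus and parentheses. Each grammar rule becomes one function, and precedence comes from which function calls which (expr calls term, term calls factor), so the C call stack plays the role of the pushdown automaton's stack. The grammar in the comment is a precedence-aware variant written for this sketch, and the names are illustrative. Function calls such as sqrt(...) would be one more case in factor(): read the name, parse a parenthesized expr, and apply sqrt() or sin() from <math.h>.

```c
#include <ctype.h>
#include <stdlib.h>

/* Grammar used here:
     expr   -> term   { ('+' | '-') term }
     term   -> factor { ('*' | '/') factor }
     factor -> '-' factor | '(' expr ')' | VARIABLE | NUMBER          */
struct parser {
    const char *s;        /* current position in the input */
    const double *vars;   /* values for A, B, C, D */
    int error;            /* set to 1 on a syntax error */
};

static double expr(struct parser *p);

static void skip_spaces(struct parser *p)
{
    while (isspace((unsigned char)*p->s))
        p->s++;
}

static double factor(struct parser *p)
{
    skip_spaces(p);
    if (*p->s == '-') {                      /* unary minus */
        p->s++;
        return -factor(p);
    }
    if (*p->s == '(') {
        double v;
        p->s++;
        v = expr(p);
        skip_spaces(p);
        if (*p->s == ')')
            p->s++;
        else
            p->error = 1;                    /* missing ')' */
        return v;
    }
    if (*p->s >= 'A' && *p->s <= 'D') {      /* variable lookup */
        int idx = *p->s - 'A';
        p->s++;
        return p->vars[idx];
    }
    {
        char *end;
        double v = strtod(p->s, &end);
        if (end == p->s)
            p->error = 1;                    /* expected a number */
        p->s = end;
        return v;
    }
}

static double term(struct parser *p)
{
    double v = factor(p);
    for (;;) {
        skip_spaces(p);
        if (*p->s == '*') { p->s++; v *= factor(p); }
        else if (*p->s == '/') { p->s++; v /= factor(p); }
        else return v;
    }
}

static double expr(struct parser *p)
{
    double v = term(p);
    for (;;) {
        skip_spaces(p);
        if (*p->s == '+') { p->s++; v += term(p); }
        else if (*p->s == '-') { p->s++; v -= term(p); }
        else return v;
    }
}

/* Evaluate s with the given variable values; returns 1 on success. */
int evaluate(const char *s, const double vars[4], double *result)
{
    struct parser p;
    p.s = s;
    p.vars = vars;
    p.error = 0;
    *result = expr(&p);
    skip_spaces(&p);
    return !p.error && *p.s == '\0';
}
```

For the graphing use case this re-parses the formula for every tuple of values, which is fine for a 250-character formula; parsing once into a tree and re-evaluating the tree is the faster alternative the answer describes.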
You will most likely want to describe your lexemes with a set of regular expressions. You should be familiar with these if you use grep, sed, awk, or perl. They are a way of describing what is known as a regular language which can be processed by something known as a Finite State Automaton. That is just a fancy way of saying that it is a program that can make a decision about changing state by considering only its current state and the next input (the next character of input). For example part of your lexical description might be:
[A-Z] variable-identifier
sqrt function-identifier
log function-identifier
[0-9]+ unsigned-literal
+ operator
- operator
There are also tools which can generate code for this. lex, one of these, is closely integrated with the parser generator yacc, but since you are trying to learn, you can also write your own tokenizer/lexical analysis code in C.
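A hand-written tokenizer for a lexical description like the one above might look like this sketch. It assumes * / ( ) as extra token types and a small, hypothetical table of function names; like lex, it takes the longest run of lowercase letters before deciding what the word is. Tokens point back into the input instead of being copied.

```c
#include <ctype.h>
#include <string.h>

enum kind { T_END, T_VARIABLE, T_FUNCTION, T_NUMBER, T_OPERATOR,
            T_LPAREN, T_RPAREN, T_ERROR };

struct token {
    enum kind kind;
    const char *start;   /* first character of the lexeme in the input */
    size_t len;          /* lexeme length; the text is not copied */
};

/* Hypothetical function-identifier table. */
static const char *const functions[] = { "sqrt", "log", "ln", "sin", "cos", "asin" };

static int is_function(const char *s, size_t len)
{
    size_t i;
    for (i = 0; i < sizeof functions / sizeof functions[0]; i++)
        if (strlen(functions[i]) == len && strncmp(functions[i], s, len) == 0)
            return 1;
    return 0;
}

/* Scan the next token at *p and advance *p past it. */
struct token next_token(const char **p)
{
    const char *s = *p;
    struct token t;

    while (isspace((unsigned char)*s))
        s++;
    t.start = s;
    t.len = 1;
    if (*s == '\0') {
        t.kind = T_END;
        t.len = 0;
    } else if (isdigit((unsigned char)*s)) {           /* [0-9]+ */
        while (isdigit((unsigned char)s[t.len]))
            t.len++;
        t.kind = T_NUMBER;
    } else if (islower((unsigned char)*s)) {           /* sqrt, log, ... */
        while (islower((unsigned char)s[t.len]))
            t.len++;
        t.kind = is_function(s, t.len) ? T_FUNCTION : T_ERROR;
    } else if (isupper((unsigned char)*s)) {           /* [A-Z] */
        t.kind = T_VARIABLE;
    } else if (strchr("+-*/", *s) != NULL) {
        t.kind = T_OPERATOR;
    } else if (*s == '(') {
        t.kind = T_LPAREN;
    } else if (*s == ')') {
        t.kind = T_RPAREN;
    } else {
        t.kind = T_ERROR;
    }
    *p = s + t.len;
    return t;
}
```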
After you have done all of this (it will probably take you quite a while) you will need to have your parser build a tree to represent the expressions and grammar of the input. In the simple case of expression evaluation (like writing a simple command line calculator program) you could have your parser evaluate the formula as it processed the input, but for your case, as I understand it, you will need to make a tree (or Reverse Polish representation, but trees are easier in my opinion).
Then after you have read the values for the variables you can traverse the tree and calculate an actual number.
Possibly the easiest thing to do is use an embedded language like Lua or Python, for both of which the interpreter is written in C. Unfortunately, if you go the Lua route you'll have to convert the binary operations to function calls, in which case it's likely easier to use Python. So I'll go down that path.
If you just want to output the result to the console, this is really easy and you won't even have to delve too deep into Python embedding, since you only have to write a one-line Python program that prints the value.
Here is the Python code you could use:
exec("from math import *;A=<vala>;B=<valb>;C=<valc>;D=<vald>;print(<formula>)".replace("^", "**").replace("ln(", "log("))
Note that the replacements are done in Python, since I'm quite sure it's easier to do this in Python than in C. Importing everything from math makes sin, cos, asin, sqrt and log available by name, so only ln needs renaming (math.log is the natural logarithm). Also note that if you want to use xor ('^'), you'll have to remove .replace("^", "**") and use ** for powering.
I don't know enough C to be able to tell you how to generate this string in C, but after you have, you can use the following program to run it:
#define PY_SSIZE_T_CLEAN
#include <Python.h>

int main(void)
{
    /* The generated Python program text goes here. */
    const char *progstr = "...";
    Py_Initialize();
    PyRun_SimpleString(progstr);
    Py_Finalize();
    return 0;
}
You can look up more information about embedding Python in C here: Python Extension and Embedding Documentation
If you need to use the result of the calculation in your program there are ways to read this value from Python, but you'll have to read up on them yourself.
Also, you should review your earlier posts on SO and other posts regarding binary trees. Implement this using a tree structure and evaluate it with a post-order traversal (evaluate both subtrees, then apply the operator at the node). There have been some excellent answers to tree questions.
If you need to store this (for persistence, as in a file), I suggest XML. Parsing XML should make you really appreciate how easy your assignment is.
Check out this post:
http://blog.barvinograd.com/2011/03/online-function-grapher-formula-parser-part-2/
It uses the ANTLR library to parse math expressions. This one specifically generates JavaScript, but ANTLR has many target languages, such as Java, Ruby, C++ and C#, and you should be able to use the grammar in the post with any of them.
