Suppose the user inputs an infix expression as a string.
What would be the easiest (by easiest I mean the shortest) way to evaluate that expression in C?
The obvious approach is to convert it to postfix and then evaluate it with a stack, but that is a rather long process.
Is there any way of using functions such as atoi() or eval() that could make the job easier?
C doesn't have an "eval" function built-in, but there are libraries that provide it.
I would highly recommend using TinyExpr. It's free and open-source C code that implements math evaluation from a string. TinyExpr is only 1 C file, and it's about 500 lines of code. I don't think you'll find a shorter or easier way that is actually complete (and not just a toy example).
Here is a complete example of using it, which should demonstrate how easy it is:
#include "tinyexpr.h"
#include <stdio.h>
int main(int argc, char *argv[])
{
printf("%f\n", te_interp("5 * 5", 0)); // Prints 25.000000
return 0;
}
If you want to build an expression solver yourself, I would recommend looking at the TinyExpr source-code as a starting point. It's pretty clean and easy to follow.
Certainly the most instructive way (and possibly even the easiest, once you know how) is to learn how to write your own recursive descent parser. A parser for infix expressions in C isn't very long.
Here's one of a number of excellent blog posts by Eli Bendersky on parsing. (This one is the one that's most relevant to you, but I highly recommend all of them.) It contains source code for an infix expression parser -- admittedly in Python, not C, but the conversion should be fairly straightforward, and you'll learn a lot in the process.
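To give an idea of the size involved, here is a minimal recursive descent sketch in C. It is not taken from the blog post or from TinyExpr; the grammar, the global `cursor` and the entry point `eval_infix` are assumptions made for this illustration, and there is no error reporting.

```c
#include <ctype.h>
#include <stdlib.h>

/* Grammar assumed for this sketch:
 *   expr   := term   { ('+' | '-') term }
 *   term   := factor { ('*' | '/') factor }
 *   factor := number | '(' expr ')' | '-' factor
 */
static const char *cursor;   /* current position in the input string */

static double parse_expr(void);

static void skip_spaces(void)
{
    while (isspace((unsigned char)*cursor))
        cursor++;
}

static double parse_factor(void)
{
    skip_spaces();
    if (*cursor == '-') {            /* unary minus */
        cursor++;
        return -parse_factor();
    }
    if (*cursor == '(') {            /* parenthesised sub-expression */
        cursor++;
        double inner = parse_expr();
        skip_spaces();
        if (*cursor == ')')
            cursor++;
        return inner;
    }
    char *end;
    double value = strtod(cursor, &end);   /* strtod does the number lexing */
    cursor = end;
    return value;
}

static double parse_term(void)
{
    double value = parse_factor();
    for (;;) {
        skip_spaces();
        if (*cursor == '*') { cursor++; value *= parse_factor(); }
        else if (*cursor == '/') { cursor++; value /= parse_factor(); }
        else return value;
    }
}

static double parse_expr(void)
{
    double value = parse_term();
    for (;;) {
        skip_spaces();
        if (*cursor == '+') { cursor++; value += parse_term(); }
        else if (*cursor == '-') { cursor++; value -= parse_term(); }
        else return value;
    }
}

/* Hypothetical entry point: evaluates a whole string, no error handling. */
double eval_infix(const char *text)
{
    cursor = text;
    return parse_expr();
}
```

Each grammar rule becomes one function, and operator precedence falls out of which function calls which. Calling `eval_infix("2 + 3 * 4")` multiplies before adding.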
You need to parse the string. There's no eval() in C (as in most static languages), so you need to either write your own parser or find some library to help.
Since most easy-to-use parsers are for C++ and not C, I'd rather use a full embeddable language. My absolute favorite is Lua, which can be incredibly lightweight if you don't include the libraries. Also, the syntax is nicer than C's, so your users might like it better.
Of course, Lua is a full-blown programming language, so it might not be appropriate, or maybe it could help in other ways (to make it easier to extend your application).
One clean (possibly not short) way to do it is to build a tree, like a compiler would.
For example, say you have the expression "2+3". The '+' would be the head. The '2' would be the left child and the '3' would be the right child.
Since each expression evaluates to a value, this tree can be extended to arbitrarily complex expressions: it just needs to be arranged by operator precedence. Low-precedence operators (like '+') go at the top, while high-precedence operators (like '*') go at the bottom. You would then evaluate the expressions on the tree from the bottom up.
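A minimal sketch of such a tree in C might look like the following. The `Node` layout and the helper names (`make_leaf`, `make_op`, `eval_tree`, `free_tree`) are invented for this illustration, and building the tree from text (the parsing step) is left out.

```c
#include <stdlib.h>

/* A node is either a number (op == 0) or a binary operator with two children. */
typedef struct Node {
    char op;                 /* '+', '-', '*', '/', or 0 for a leaf */
    double value;            /* used only when op == 0 */
    struct Node *left, *right;
} Node;

Node *make_leaf(double value)
{
    Node *n = malloc(sizeof *n);
    if (!n) abort();
    n->op = 0; n->value = value; n->left = n->right = NULL;
    return n;
}

Node *make_op(char op, Node *left, Node *right)
{
    Node *n = malloc(sizeof *n);
    if (!n) abort();
    n->op = op; n->value = 0; n->left = left; n->right = right;
    return n;
}

/* Post-order walk: children are evaluated before their parent. */
double eval_tree(const Node *n)
{
    if (n->op == 0)
        return n->value;
    double l = eval_tree(n->left);
    double r = eval_tree(n->right);
    switch (n->op) {
    case '+': return l + r;
    case '-': return l - r;
    case '*': return l * r;
    default:  return l / r;   /* '/' */
    }
}

void free_tree(Node *n)
{
    if (!n) return;
    free_tree(n->left);
    free_tree(n->right);
    free(n);
}
```

For "2+3*4" the '+' is the root and the '*' sits below it: `make_op('+', make_leaf(2), make_op('*', make_leaf(3), make_leaf(4)))`.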
You need to build in an interpreter for some scripting language.
Convert the string into an array of tokens which are the operands and operators.
Convert the infix token array to a Reverse Polish Notation array.
Once the expression is in RPN, you can evaluate it with a stack: push each operand, and for each operator pop its operands, apply it, and push the result.
Take a look at the Wikipedia article on Reverse Polish Notation. It shows how to do the conversion and the calculation.
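The three steps above can be sketched in C roughly as follows. This is an illustration only: the `Token` struct, `MAX_TOKENS`, and the function names are assumptions, tokenizing is folded into the conversion loop, and the input is assumed to be well formed (non-negative numbers, the four binary operators, parentheses).

```c
#include <ctype.h>
#include <stdlib.h>

#define MAX_TOKENS 64

/* A token is either a number or a one-character operator. */
typedef struct {
    int is_number;
    double number;
    char op;
} Token;

static int precedence(char op)
{
    return (op == '*' || op == '/') ? 2 : 1;   /* '+' and '-' bind loosest */
}

/* Shunting-yard: reads infix text, writes RPN tokens to out, returns the count.
 * Assumes well-formed input with fewer than MAX_TOKENS tokens. */
size_t infix_to_rpn(const char *s, Token out[MAX_TOKENS])
{
    char ops[MAX_TOKENS];          /* operator stack */
    size_t nops = 0, nout = 0;

    while (*s) {
        if (isspace((unsigned char)*s)) {
            s++;
        } else if (isdigit((unsigned char)*s)) {
            char *end;
            out[nout++] = (Token){1, strtod(s, &end), 0};
            s = end;
        } else if (*s == '(') {
            ops[nops++] = *s++;
        } else if (*s == ')') {
            while (nops > 0 && ops[nops - 1] != '(')
                out[nout++] = (Token){0, 0, ops[--nops]};
            if (nops > 0) nops--;   /* discard the '(' */
            s++;
        } else {                    /* binary operator, left-associative */
            while (nops > 0 && ops[nops - 1] != '(' &&
                   precedence(ops[nops - 1]) >= precedence(*s))
                out[nout++] = (Token){0, 0, ops[--nops]};
            ops[nops++] = *s++;
        }
    }
    while (nops > 0)
        out[nout++] = (Token){0, 0, ops[--nops]};
    return nout;
}

/* Evaluates RPN with a value stack: numbers are pushed, operators pop two. */
double eval_rpn(const Token *rpn, size_t n)
{
    double stack[MAX_TOKENS];
    size_t top = 0;
    for (size_t i = 0; i < n; i++) {
        if (rpn[i].is_number) {
            stack[top++] = rpn[i].number;
        } else {
            double r = stack[--top], l = stack[--top];
            switch (rpn[i].op) {
            case '+': stack[top++] = l + r; break;
            case '-': stack[top++] = l - r; break;
            case '*': stack[top++] = l * r; break;
            default:  stack[top++] = l / r; break;   /* '/' */
            }
        }
    }
    return stack[0];
}
```

For "2 + 3 * 4" the conversion yields the five tokens `2 3 4 * +`, which the second function then reduces to a single value.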
Does a compiler use if statements to decide what to do when a certain keyword is encountered, and should someone writing a compiler use them for most operations when checking code? Or is there a more efficient way? For example, when I test a symbol against a symbol table and it comes back as a valid "token", do I have to use an if statement for every single keyword to determine what to do? That seems rather inefficient. For example, in pseudocode:
/*Each keyword/token in my compiler has a numerical representation which is what the symbol table returns back for example #define IF 0 and so on*/
if(Token == IF){
//This will be done to generate the AST representation for IF statements
}else if(Token == ELSE){
//This will be done to generate the AST representation of an else branch
}else if(Token == INT){
//This will be done to generate the AST representation of an integer
}
What kind of compilers do you mean?
If performance matters, you may want something like a callback table: use the keyword as the key and the callback function as the value, so the pseudocode would look like this:
func *fp = funcTbl.get(Token);
if (fp) { fp(); }
You may try recursive descent too. There, the function for each keyword is called exactly where that keyword is expected to appear.
Last but not least, what you wrote is fine as well.
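In C, the callback-table idea can be sketched with an array of function pointers indexed by token code. Everything named here (`TOK_*`, `Handler`, the `handle_*` functions, `dispatch`) is hypothetical, and the handlers only return a label where a real parser would build an AST node.

```c
#include <stddef.h>

/* Hypothetical token codes, mirroring the "#define IF 0" idea in the question. */
enum { TOK_IF, TOK_ELSE, TOK_INT, TOK_COUNT };

typedef const char *(*Handler)(void);

/* Stand-in handlers: a real parser would build AST nodes here. */
static const char *handle_if(void)   { return "if-node"; }
static const char *handle_else(void) { return "else-node"; }
static const char *handle_int(void)  { return "int-node"; }

/* The table is indexed directly by token code, so lookup is one array access. */
static const Handler handlers[TOK_COUNT] = {
    [TOK_IF]   = handle_if,
    [TOK_ELSE] = handle_else,
    [TOK_INT]  = handle_int,
};

/* Returns the handler's result, or NULL for a token without a handler. */
const char *dispatch(int token)
{
    if (token < 0 || token >= TOK_COUNT || handlers[token] == NULL)
        return NULL;
    return handlers[token]();
}
```

A `switch` on the token code is another common choice; compilers often turn a dense switch into a jump table anyway, so the if/else chain from the question is rarely a real bottleneck.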
Assuming you have already split your source language from string representation to a series of lexical tokens, your next step is to use a parser to build an AST from your tokens.
The parsing stage of compilation achieves two main goals:
It checks your language for syntactic correctness, throwing an error if your input cannot be parsed according to the structure of your grammar.
It generates an AST representation of your source code
Does a compiler use if statements when deciding what to do if a
certain keyword is encountered?
No, your parser should analyse the series of lexical tokens and check them against the structure of your language's grammar.
Parsing is a well-understood topic in computer science which can be approached in different ways. It cannot be trivially implemented in the example code fragment you have provided above. In a realistic programming language you need to consider that grammars can be ambiguous, and that a simple predictive parser is not appropriate for all grammars, so some kind of backtracking may be needed. If you do not understand this concept, I recommend you use a parser generator for this, such as Bison.
This diagram shows a simplistic overview of the most important stages of compilation and may help you to understand its pipeline structure.
This pipeline has been refined over decades by many academics working out how best to 'divide and conquer' such a mammoth task. I strongly encourage you to follow it.
For further reading, check out Modern Compiler Implementation in Java by Andrew Appel.
I'm writing an interpreter for C (subset) in Javascript (I want to provide program's execution visualisation in browser).
As the first step I want to create an AST for the user program. I'm using Jison for this, which is similar to the flex/bison combination.
For now I simply tokenize the program and parse to check if it conforms to the grammar given by the standard (let's leave alone the ambiguity problem introduced by typedef).
However conforming to C grammar doesn't guarantee that program makes any sense, for example
int main() {
x = ("jklfds" || "jklgfd")(2, imlost);
}
conforms to the grammar, although x is not declared, ("jklfds" || "jklgfd") isn't a function pointer - types are not checked. In general there are many contextual conditions that aren't checked.
I'm wondering how much should I check while building the AST tree. For example, in theory at this point it would be easy to fully calculate and check constant expression. However, much of other checking requires context.
Is it possible, for example, during parsing, to know that some identifiers refer to structs declared earlier in the program?
What about building the AST as is and checking contextual constraints by analyzing/transforming the AST multiple times, proving more and more conditions correct? Would that be easier or harder than checking during parsing?
I'm looking for the most friendly solution, I don't care for its speed.
I'm looking for the simplest way to determine the return type, arguments and function name from a C header file written in C99.
It's my school project, which has to be written in Perl without any libs. So I have a few options. I can use regular expressions, but that's not applicable to the hardest functions, like the following:
int * (* func(int * arg[]))();
the return type should be "int * (* )()" and argument is "int * []".
Second way is to use grammar and parse it, but i think, that this is not the right way.
My buddy told me about an existing algorithm which can do it, but he doesn't remember its name or where he saw it. The algorithm was quite simple. Something like: find the first closing parenthesis; everything between it and the nearest preceding opening parenthesis is the arguments...
Does anyone have an idea what I am looking for?
Look at the magic decoder ring for C declarations
If you can, obtain The C Programming Language by Kernighan and Ritchie. Not only is it the C bible, but in chapter 5 (section 12) the authors present code to parse C declarations. You can look there to see how they do it and quite possibly adapt their approach.
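As a rough idea of what such a parser looks like, here is a simplified sketch written from scratch in C (the question asks for Perl, but the structure carries over). It is not the book's code: the function names, the fixed-size buffers and the English output format are assumptions, it only accepts a single-word base type, and it skips over parameter lists instead of describing them.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

static const char *p;        /* cursor into the declaration text */
static char out[256];        /* description built so far (short inputs assumed) */
static char name[64];        /* the declared identifier */

static void skip_ws(void) { while (isspace((unsigned char)*p)) p++; }

static void declarator(void);

/* Skips a balanced (...) or [...] group starting at *p. */
static void skip_group(char open, char close)
{
    int depth = 0;
    do {
        if (*p == open) depth++;
        else if (*p == close) depth--;
        p++;
    } while (depth > 0 && *p);
}

/* direct-declarator: name | '(' declarator ')', then any () or [] suffixes */
static void direct_declarator(void)
{
    skip_ws();
    if (*p == '(') {
        p++;
        declarator();
        skip_ws();
        if (*p == ')') p++;
    } else {
        size_t n = 0;
        while ((isalnum((unsigned char)*p) || *p == '_') && n < sizeof name - 1)
            name[n++] = *p++;
        name[n] = '\0';
    }
    for (skip_ws(); *p == '(' || *p == '['; skip_ws()) {
        if (*p == '(') { skip_group('(', ')'); strcat(out, " function returning"); }
        else           { skip_group('[', ']'); strcat(out, " array of"); }
    }
}

/* declarator: optional '*'s followed by a direct-declarator */
static void declarator(void)
{
    int stars = 0;
    for (skip_ws(); *p == '*'; skip_ws()) { p++; stars++; }
    direct_declarator();
    while (stars-- > 0) strcat(out, " pointer to");
}

/* Writes "name: description type" into result. */
void describe(const char *decl, char *result, size_t size)
{
    char type[64];
    size_t n = 0;
    p = decl;
    out[0] = '\0';
    name[0] = '\0';
    skip_ws();
    while ((isalnum((unsigned char)*p) || *p == '_') && n < sizeof type - 1)
        type[n++] = *p++;
    type[n] = '\0';
    declarator();
    snprintf(result, size, "%s:%s %s", name, out, type);
}
```

For the declaration from the question, `describe("int * (* func(int * arg[]))();", buf, sizeof buf)` produces "func: function returning pointer to function returning pointer to int". Extracting the argument text would mean recording, rather than skipping, the span that `skip_group` walks over.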
You simply have to build a parser for that kind of problem. Usually a top-down approach (e.g. recursive descent) will do for this kind of job. Fortunately, top-down parsers are more or less straightforward to implement.
The only hard bit in C-like languages is that their grammars usually need at least one token of lookahead (LL(1)), and sometimes two or more (LL(2) and beyond). So sometimes you have to peek a few tokens ahead to find out whether you are looking at a function declaration or a function call, for example.
I am trying to overload some operators:
/* Typedef is required for operators */
typedef int Colour;
/* Operators */
Colour operator+(Colour colour1, Colour colour2);
Colour operator-(Colour colour1, Colour colour2);
Colour operator*(Colour colour1, Colour colour2);
Colour operator/(Colour colour1, Colour colour2);
I get this error for each attempted overload:
expected '=', ',', ';', 'asm' or '__attribute__' before '+' token
I can't find any good documentation on operator overloading. Googling results in C++ tutorials which use classes. In C there are no classes. Can anyone help me? Thanks.
C does not support operator overloading (beyond what is built into the language).
You need a time machine to take you back to 1985, so that you may use the program CFront. It appears that 'C' used to support operator overloading; to the sophisticated enough it still can. See Inside the C++ Object Model by Stanley B. Lippman. OMG, C++ was C! Such a thing still exists.
This answer confirms the others. 'C' by itself does not directly support overloading. However, the important point is a programmer can write code that understands code. You need a tool that transforms source to implement this. In this case, such tools already exist.
A paper, Meta-Compilation for C++, 2001 by Edward D. Willink has interesting examples of design functionality, where extending a language is useful. The combination of *nix shell script and make rules often allow such transformation. Other examples are Qt MOC, the tools Lex and Yacc, halide etc. So while 'C' itself doesn't accommodate this directly, it does if you build host tools.
In this particular example the overloading may not make sense. However, it could make a lot of sense for a program needing arbitrary precision math.
There is no operator overloading in C.
You cannot overload these operators in C.
C does not support operator overloading at all.
You can only implement operations as functions:
Colour colour_add(Colour c1, Colour c2);
Colour colour_subtract(Colour c1, Colour c2);
...
You could also switch to C++, but it may be overkill to do it just for the overloading.
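A sketch of what one of those functions could look like follows. It assumes, unlike the `typedef int Colour` in the question, that a colour is a struct of three 8-bit channels and that addition should clamp at 255; both choices are made up for the example.

```c
#include <stdint.h>

/* Assumption: a colour is three 8-bit channels (the question used a plain int). */
typedef struct {
    uint8_t r, g, b;
} Colour;

/* Adds two channel values, clamping at 255 instead of wrapping around. */
static uint8_t add_channel(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;
    return (uint8_t)(sum > 255 ? 255 : sum);
}

/* Named function standing in for what operator+ would be in C++. */
Colour colour_add(Colour c1, Colour c2)
{
    Colour result = {
        add_channel(c1.r, c2.r),
        add_channel(c1.g, c2.g),
        add_channel(c1.b, c2.b),
    };
    return result;
}
```

The call site then reads `Colour c = colour_add(a, b);` instead of `a + b`. Once the type is a struct rather than an int, the built-in `+` no longer applies, so a named function is the only option anyway.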
Operator overloading is not available in C. Instead, you will have to use a function to "pseudo-overload" the operators:
Colour add_colours(Colour c1, Colour c2) {
return c1 + c2; // or whatever you need to do
}
If you want comparable concision, the use of macros is the best available alternative:
void Type_getSomething(int id); //or some other complex series of instructions
#define g(id) Type_getSomething(id)
...it's such a pity that the use of square brackets isn't possible for macros!
Having been writing Java code for many years, I was amazed when I saw this C++ statement:
int a,b;
int c = (a=1, b=a+2, b*3);
My question is: Is this a choice of coding style, or does it have a real benefit? (I am looking for a practical use case)
I think the compiler will see it the same as the following:
int a=1, b=a+2;
int c = b*3;
(What's the official name for this? I assume it's standard C/C++ syntax.)
It's the comma operator, used twice. You are correct about the result, and I don't see much point in using it that way.
Looks like an obscure use of a , (comma) operator.
It's not a representative way of doing things in C++.
The only "good-style" use for the comma operator might be in a for statement that has multiple loop variables, used something like this:
// Copy from source buffer to destination buffer until we see a zero
for (char *src = source, *dst = dest; *src != 0; ++src, ++dst) {
*dst = *src;
}
I put "good-style" in scare quotes because there is almost always a better way than to use the comma operator.
Another context where I've seen this used is with the ternary operator, when you want to have multiple side effects, e.g.,
bool didStuff = DoWeNeedToDoStuff() ? (Foo(), Bar(), Baz(), true) : false;
Again, there are better ways to express this kind of thing. These idioms are holdovers from the days when we could only see 24 lines of text on our monitors, and squeezing a lot of stuff into each line had some practical importance.
Dunno its name, but it seems to be missing from the Job Security Coding Guidelines!
Seriously: C++ allows you to do a lot of things in many contexts, even when they are not necessarily sound. With great power comes great responsibility...
This is called 'obfuscated C'. It is legal, but intended to confuse the reader. And it seems to have worked. Unless you're trying to be obscure it's best avoided.
Your sample code uses two features of C expressions that are not well known to beginners (but not really hidden either):
the comma operator: a normal binary operator whose role is to return the last of its two operands. If the operands are expressions, they are evaluated from left to right.
assignment as an operator that returns a value. C assignment is not a statement as in some other languages; it yields the value that has been assigned.
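Both features can be seen in isolation in a small C sketch like this one (the function names are invented for the illustration):

```c
#include <stddef.h>

/* Counts characters, using the value of an assignment as the loop test. */
size_t count_chars(const char *s)
{
    size_t n = 0;
    char c;
    while ((c = *s++) != '\0')   /* assign, then compare the assigned value */
        n++;
    return n;
}

/* The comma operator evaluates left to right and yields its right operand. */
int comma_value(void)
{
    int calls = 0;
    /* Two increments happen first, then calls * 10 becomes the result. */
    int result = (calls++, calls++, calls * 10);
    return result;
}
```

The comma operator introduces a sequence point between its operands, which is why incrementing `calls` twice in one expression is well defined here.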
Most use cases of both these features involve some form of obfuscation, but there are some legitimate ones. The point is that you can use them anywhere you can provide an expression: inside an if or a while condition, in the iteration part of a for loop, in function call arguments (if using the comma operator there you must add parentheses to avoid confusion with the argument separator), in macro arguments, etc.
The most common use of the comma operator is probably in loop control, when you want to change two variables at once, or store some value before performing the loop test or the loop iteration.
For example a reverse function can be written as below, thanks to comma operator:
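/* SWAP is not shown in the original; this is one possible definition for ints. */
#define SWAP(a, b) do { int tmp = (a); (a) = (b); (b) = tmp; } while (0)
void reverse(int * d, int len){
    int i, j;
    for (i = 0, j = len - 1 ; i < j ; i++, j--){
        SWAP(d[i], d[j]);
    }
}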
Another legitimate (not obfuscated, really) use of the comma operator I have in mind is a DEBUG macro I found in some project, defined as:
#if defined(DEBUGMODE)
#define DEBUG(x) printf x
#else
#define DEBUG(x) x
#endif
You use it like:
DEBUG(("my debug message with some value=%d\n", d));
If DEBUGMODE is on then you'll get a printf; if not, printf will not be called, but the expression between parentheses is still valid C (a comma expression). The point is that any side effect of the printing code will apply in both release code and debug code, like the one introduced by:
DEBUG(("my debug message with some value=%d\n", d++));
With the above macro d will always be incremented regardless of debug or release mode.
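To see both expansions in one file, the sketch below defines the two variants under separate, made-up names (`DEBUG_ON` and `DEBUG_OFF`) instead of switching on DEBUGMODE. A compiler may warn that the left operand of the comma has no effect in the second case, which is expected.

```c
#include <stdio.h>

/* The two expansions side by side; the names DEBUG_ON/DEBUG_OFF are made up
 * so that a single file can show both build modes. */
#define DEBUG_ON(x)  printf x
#define DEBUG_OFF(x) x

/* Returns how many times d was incremented by the two debug lines. */
int count_side_effects(void)
{
    int d = 0;
    DEBUG_ON(("debug build: d=%d\n", d++));     /* becomes printf("...", d++) */
    DEBUG_OFF(("release build: d=%d\n", d++));  /* becomes ("...", d++) */
    return d;
}
```

The function returns 2: the increment happens once per line, whether or not anything is printed.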
There are probably some other rare cases where comma and assignment values are useful and code is easier to write when you use them.
I agree that assignment operator is a great source of errors because it can easily be confused with == in a conditional.
I agree that as comma is also used with a different meaning in other contexts (function calls, initialisation lists, declaration lists) it was not a very good choice for an operator. But basically it's not worse than using < and > for template parameters in C++ and it exists in C from much older days.
It's strictly coding style and won't make any difference in your program. Especially since any decent C++ compiler will optimize it to
int a=1;
int b=3;
int c=9;
The math won't even be performed during assignment at runtime. (and some of the variables may even be eliminated entirely).
As to choice of coding style, I prefer the second example. Most of the time, less nesting is better, and you won't need the extra parentheses. Since the use of commas exhibited will be known to virtually all C++ programmers, you have some choice of style. Otherwise, I would say put each assignment on its own line.
Is this a choice of coding style, or does it have a real benefit? (I am looking for a practical use case)
It's both a choice of coding style and it has a real benefit.
It's clearly a different coding style as compared to your equivalent example.
The benefit is that I already know I would never want to employ the person who wrote it, not as a programmer anyway.
A use case: Bob comes to me with a piece of code containing that line. I have him transferred to marketing.
You have found a hideous abuse of the comma operator written by a programmer who probably wishes that C++ had multiple assignment. It doesn't. I'm reminded of the old saw that you can write FORTRAN in any language. Evidently you can try to write Dijkstra's language of guarded commands in C++.
To answer your question, it is purely a matter of (bad) style, and the compiler doesn't care; the compiler will generate exactly the same code as from something a C++ programmer would consider sane and sensible.
You can see this for yourself if you make two little example functions and compile both with the -S option.
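The two little functions could be as simple as this sketch (the names are arbitrary; the code is plain C and also valid C++):

```c
/* Version using the comma operator, as in the question. */
int with_commas(void)
{
    int a, b;
    int c = (a = 1, b = a + 2, b * 3);
    return c;
}

/* Version with one declaration per step. */
int with_statements(void)
{
    int a = 1;
    int b = a + 2;
    int c = b * 3;
    return c;
}
```

Compiling with something like `cc -O2 -S file.c` (gcc and clang both accept -S) writes the assembly to a .s file where the two function bodies can be compared. With optimization enabled, both would be expected to reduce to returning the constant 9.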