Bison passing back resulting AST - c

In lemon I was able to use the third parameter of the parsing function to pass back the result to the caller when the starting symbol was reduced.
How would I do the same in bison? Is it enough to assign that value to $$ within the starting symbol's action code, and from the caller to take it as the "yy minor" value, after the final call to yypush_parse()?
The parser is push and pure. Thread-safety is a must.

You'll pretty much have to do-it-yourself with bison/yacc if you want an AST, by creating your own nodes and assigning them to $$.
The example at http://epaperpress.com/lexandyacc/ (look at the .y file in Calculator->Yacc input) or http://www.progtools.org/compilers/tutorials/cxx_and_bison/cxx_and_bison.html might give you ideas on how to do that.
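As a rough illustration of "creating your own nodes", the sketch below defines a small node type with two constructors in C. The names Node, node_num, node_binop, node_eval and node_free are hypothetical and not part of bison; a grammar action would call them along the lines of `$$ = node_binop('+', $1, $3);`.

```c
#include <stdlib.h>

/* Hypothetical AST node: either a number leaf or a binary operator. */
typedef struct Node {
    char op;                 /* 0 for a number leaf, else '+', '-', '*', '/' */
    double value;            /* used only when op == 0 */
    struct Node *lhs, *rhs;  /* used only when op != 0 */
} Node;

/* Allocate a number leaf; returns NULL if malloc fails. */
Node *node_num(double value)
{
    Node *n = malloc(sizeof *n);
    if (n) { n->op = 0; n->value = value; n->lhs = n->rhs = NULL; }
    return n;
}

/* Allocate an operator node that takes ownership of both children. */
Node *node_binop(char op, Node *lhs, Node *rhs)
{
    Node *n = malloc(sizeof *n);
    if (n) { n->op = op; n->value = 0.0; n->lhs = lhs; n->rhs = rhs; }
    return n;
}

/* Walk the tree and compute its value (assumes a well-formed tree). */
double node_eval(const Node *n)
{
    if (n->op == 0) return n->value;
    double a = node_eval(n->lhs), b = node_eval(n->rhs);
    switch (n->op) {
    case '+': return a + b;
    case '-': return a - b;
    case '*': return a * b;
    default:  return a / b;   /* '/' */
    }
}

/* Release a node and everything below it. */
void node_free(Node *n)
{
    if (!n) return;
    node_free(n->lhs);
    node_free(n->rhs);
    free(n);
}
```

With helpers like these, the action for the start symbol ends up holding the root of the tree, which the caller still has to receive through some channel of its own choosing.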

Related

How to persist a function pointer in C?

Suppose that I have a function pointer which can be invoked to do some tasks. How can I store the piece of code, to which the pointer is pointing, to a file on disk so I can later load the file and have the function pointer available again?
Use case: This will be done inside a JIT compiler to prevent the future overhead of JIT-ing in the next run of the same program.
Edit: The answers to "Save and load function pointers to file" are not what I am looking for. That question deals with a limited number of functions, for which people have suggested using indices. But in my case, the function can be anything with any content.
There is no portable way to do that. Your options are:
Associate pointers with symbolic names of your choosing, e.g. using a global mapping table, and serialize the function name. On deserialization, look up the actual pointer in the mapping.
Serialize real function names, possibly also kept in a mapping. On deserialization, use dlsym (or the equivalent on non-Unix platforms, such as GetProcAddress) to get the function pointer.
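A minimal sketch of the first option, assuming a fixed set of functions that are known at compile time. The names task_fn, registry, name_of and lookup, as well as the two example functions, are made up for illustration. It does not cover code that a JIT generates at run time, which has no name to look up.

```c
#include <stddef.h>
#include <string.h>

typedef int (*task_fn)(int);

/* Hypothetical example functions to be persisted by name. */
static int task_double(int x) { return 2 * x; }
static int task_square(int x) { return x * x; }

/* Global mapping table: symbolic name <-> function pointer. */
static const struct { const char *name; task_fn fn; } registry[] = {
    { "double", task_double },
    { "square", task_square },
};
static const size_t registry_len = sizeof registry / sizeof registry[0];

/* Serialization side: find the name to write to disk (NULL if unregistered). */
const char *name_of(task_fn fn)
{
    for (size_t i = 0; i < registry_len; i++)
        if (registry[i].fn == fn) return registry[i].name;
    return NULL;
}

/* Deserialization side: turn a stored name back into a pointer (NULL if unknown). */
task_fn lookup(const char *name)
{
    for (size_t i = 0; i < registry_len; i++)
        if (strcmp(registry[i].name, name) == 0) return registry[i].fn;
    return NULL;
}
```

Only the string returned by name_of() is written to the file; the pointer itself is never stored, so the mapping survives a rebuild or a different load address.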

many wrapping functions in C

My C header file contains about 300 various functions, their names all beginning with "foo_db_" and accepting a "db_t" as their first parameter (knowing exactly what a db_t is isn't really relevant here, it's just a struct).
function foo_db_my_first_function(db_t *db, char *param1, int param2);
function foo_db_my_second_function(db_t *db, double param1, const char *param2, int param3);
(...)
function foo_db_my_Nth_function(db_t *db, int param1);
My job is to write another 300 wrapping functions named "foo_XXXX" (XXXX being the suffix of the "foo_db_" function) with a default value for the first parameter.
static __inline function foo_my_first_function(char *param1, int param2) {
foo_db_my_first_function(DEFAULT_DB, param1, param2);
}
(...)
I was wondering if I could write some macros to ease my job: declare the "db" function and the corresponding "default" function (without the first parameter).
Unfortunately, I cannot use C99 and variadic macros arguments :( so I think I'm screwed :), but I prefer to ask first here before burning my fingers to write those 300 functions :/
Assuming the original header file for the API is regular enough, then a script in your favorite text processing language (Perl, Lua, Python, Awk, or even /bin/sh in a pinch) will likely be the simplest approach.
Your script would collect all public function declarations using a regex or simple text matching to identify them (likely based on the foo_db_ prefix). It could then write two output files. First, a suitable .h file declaring your wrappers, and second the .c source file implementing them by stuffing DEFAULT_DB into their first parameter. You will need to do a minimal amount of work to copy the rest of the parameters through, but with luck the declarations are all regular enough that the text manipulation can be as simple as "rest of line" or the like.
Having done that, I would check the script into revision control, and get it invoked at build time, treating the generated files as transient build products. However, if you don't have a sufficiently flexible build system (this is why I still prefer make to nearly everything else I've seen proposed) then you will have to find a suitable kludge to signal that your generated default wrappers are out of date when the API changes.
This approach will require investing some time in the code generator script, but you should be ahead on that well before the time you imagine hand-coding your 100th wrapper. And the second time you run it....
In extreme cases, you could end up needing to implement much of the front-end of a C compiler. In that case, I see two approaches that are both more socially acceptable than arranging a meeting with the architect in a dark alley. First, there is a GCC back-end that emits its AST in XML; the resulting XML is a bear, but has been reduced down to a tree of tokens that can be manipulated. Second, there is always LPeg, a full parser that is easily used from Lua (and I suspect that there are other PEG parsers out there for other scripting languages too). Sample code for LPeg that lexes and parses C is referenced at the Wiki page.
Do it in Excel. Create a cell with "foo_db_ (db_t *db)", drag it down as many places as you need, fill in all the blanks, then copy it all into your program (you can test that the copy will work ahead of time, but I just tried with Notepad and it seems to work as intended). Now you have all your function headers, and can fill in the rest from there.

Bison - additional parameter to a push and pure parser

How can I pass one additional parameter (not the token minor of type YYSTYPE) to the yypush_parse() function?
The parser is indeed reentrant, but this one additional variable is crucial for the thread-safety of the application I need to integrate my parser in (it's a PHP extension, so we're talking about TSRM).
I cannot just get rid of that parameter because inside the action code I'm going to call functions which will generate an AST in a userland-accessible form.
I've tried to hack around YYPUSH_DECLS and it works as far as declaring the function is concerned, BUT a few thousand LOCs down comes the implementation of yypush_parse, and I can't see any way to overwrite the function signature where the implementation of yypush_parse starts.
YYPARSE_PARAM is only used when the parser is not a push one (as far as I can tell), but in my case I NEED it to be push because of the things I have to do in the processing loop, after lexing and prior to adding a new token to the parsing stack.
So I am wondering if there's a %directive or something that may help.
On the other side, I really think YYPARSE_PARAM should be used as far as it's defined, no matter what type of parser it is. It's a pity it's not.
%parse-param. YYPARSE_PARAM is deprecated and shouldn't be used.
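A minimal sketch of how that directive might look in a push, pure grammar. The parameter name tsrm_ctx and its void * type are placeholders, and `%define api.pure full` assumes Bison 3.0 or later (older releases spell it `%define api.pure`), so check the prototype in the generated file for your version.

```yacc
%define api.pure full
%define api.push-pull push
%parse-param { void *tsrm_ctx }   /* hypothetical extra argument */

/* Bison then appends the parameter to the generated entry point, roughly:
 *   int yypush_parse(yypstate *ps, int pushed_char,
 *                    YYSTYPE const *pushed_val, void *tsrm_ctx);
 * and also passes it to the error callback:
 *   void yyerror(void *tsrm_ctx, const char *msg);
 * Inside actions, tsrm_ctx can be used like an ordinary variable. */
```

The same parameter can also carry a pointer to a result slot, which gives the caller a way to receive a value from the start symbol's action without any global state.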

Getting function argument types

Suppose I have a call to a function which takes a variable number of arguments in my source code. I want to do some kind of static analysis on this source code to find the type of the arguments being actually passed to the function. For example, if my function call is -
foo(a, b, c)
I want to find the data type of a, b and c and store this information.
You pretty well have to do the parse-and-build-a-symbol-table part of compiling the program.
Which means running the preprocessor, and lexing as well.
That's the bad news.
The good news is that you don't have to do much of the hard stuff. There is no need to build an AST: every part of the code can be a no-op except typedefs; struct, union, and enum definitions; variable and function declarations and definitions; and the analysis of the function call arguments.
On further thought prompted by Chris' comments: You do have to be able to analyze the types of expressions and handle the va-arg promotions, as well.
It is still a smaller project than writing a whole compiler, but should be approached with some thought.
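To make the point about va-arg promotions concrete, here is a small sketch in C. The function sum_args and its type-string convention are invented for the example. It shows that char and short arguments arrive as int and that float arrives as double, so an analysis that records the declared type of each argument also needs to record the promoted type.

```c
#include <stdarg.h>

/* Sums the variadic arguments described by `types`:
 * 'i' = read an int (what char and short are promoted to),
 * 'd' = read a double (what float is promoted to). */
double sum_args(const char *types, ...)
{
    va_list ap;
    double total = 0.0;

    va_start(ap, types);
    for (const char *t = types; *t; t++) {
        if (*t == 'i')
            total += va_arg(ap, int);      /* never va_arg(ap, char) or short */
        else if (*t == 'd')
            total += va_arg(ap, double);   /* never va_arg(ap, float) */
    }
    va_end(ap);
    return total;
}
```

A call such as sum_args("iid", c, s, f) with a char c, a short s and a float f is read back correctly only because the callee asks for int, int and double.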
If this is in C++, you can hack together some RTTI using typeid etc.
http://en.wikipedia.org/wiki/Run-time_type_information

Inspect Bison's $$ variable with GDB

If I set a breakpoint in a Bison .y file, is there a way I can inspect the contents of $$ pseudo variable at that breakpoint?
$$ is the top of the semantic value stack. It may be a little difficult to interpret. If you really need to, the value stack pointer might be called yyvsp and the stack's storage might be called yyvsa, so something like *yyvsp might give you what you want, depending on the version of bison you're using. Look at the .tab.c code that was generated.
Bison keeps the stacks as local variables in yyparse(), dynamically allocated.
Probably the easiest way to solve a temporary debugging issue is to patch y.tab.c so that the line *++yyvsp = yylval also drops a copy in a global. You may also want to hack YYPOPSTACK() to do the same thing.
I redefined the type of yylval with %union:
%union {
int int_val;
double double_val;
}
And what I get is either yyval.int_val or yyval.double_val depending on the type of $$.
But just as Richard Pennington said, the best way would be to look at the generated .tab.c code.
