Emacs regexp replace C function call

I'm trying to regexp match a C function, e.g.
func(blah blah);
The match can include newlines.
I've tried:
func([.+]);
which didn't do newlines, and:
func([...]);
func([^...]);
neither of which seemed to do anything. I guess I'm looking for the part of a regexp that will match any number/type of characters between my opening func( and );.

You could try func[[:space:]]*([^)]*). Nested parens in calls will confuse it though.
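As a rough illustration of what this pattern does and where it breaks, here is a small sketch in Python rather than Emacs. It assumes Python's `re` syntax, where the literal parentheses must be escaped and `\s` stands in for `[[:space:]]`; the sample `source` text is made up.

```python
import re

# Python translation of func[[:space:]]*([^)]*): the parentheses are
# escaped because in Python a bare "(" opens a group, whereas in an
# Emacs regexp a bare "(" is already a literal character.
CALL = re.compile(r"func\s*\([^)]*\)")

# Hypothetical input: one call spanning two lines, one with a nested call.
source = "x = func(a,\n         b);\ny = func(g(1), 2);\n"

matches = [m.group(0) for m in CALL.finditer(source)]
for text in matches:
    print(repr(text))
```

The negated class `[^)]` matches newlines, so the multi-line call is found in full. The second match stops at the first `)`, which is the nested-parens confusion mentioned above.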

I think that the general case is not feasible with regular expressions, because the nested function calls are not a regular language.
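One common workaround, sketched here in Python under simplifying assumptions, is to use a regex only to find the opening `func(` and then count parenthesis depth by hand. The helper names `match_paren` and `find_calls` are hypothetical, and the sketch ignores parentheses inside string literals and comments.

```python
import re

def match_paren(text, open_index):
    """Return the index just past the ')' that closes text[open_index]."""
    depth = 0
    for i in range(open_index, len(text)):
        ch = text[i]
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i + 1
    return -1  # unbalanced input

def find_calls(text, name):
    """Collect every 'name(...)' call, including nested parentheses."""
    calls = []
    for m in re.finditer(r"\b" + re.escape(name) + r"\s*\(", text):
        # m.end() - 1 is the position of the opening parenthesis.
        end = match_paren(text, m.end() - 1)
        if end != -1:
            calls.append(text[m.start():end])
    return calls

source = "y = func(g(1),\n         h(2, k(3)));"
print(find_calls(source, "func"))
```

The depth counter is exactly the piece of state a regular expression cannot keep, which is why the regex is only used to locate the start of each call.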

While Maxim's answer is specific, I'm going to guess you are looking to do something with the matched function you found. To do serious code processing, you can't beat the semantic parser that is part of CEDET's suite of tools. CEDET (http://cedet.sf.net) is also part of Emacs.
If you use the semantic parser in Emacs, you can:
M-x semantic-mode RET
and then in code:
(semantic-fetch-tags)
or
(semantic-current-tag)
to get the current tag. Once you have the tag, you can call:
(semantic-tag-function-arguments mytag)
to get the arguments, which are tags. For one of those, use semantic-tag-name to get the name, or semantic-tag-type to get the data type.
Once you've got your tag data, you can always write out new code with SRecode, which is a code generator which will take in tags, and spit out code, such as function declarations.

Related

many wrapping functions in C

My C header file contains about 300 various functions, their names all beginning with "foo_db_" and accepting a "db_t" as their first parameter (knowing exactly what a db_t is is not really relevant here, it's just a struct).
function foo_db_my_first_function(db_t *db, char *param1, int param2);
function foo_db_my_second_function(db_t *db, double param1, const char *param2, int param3);
(...)
function foo_db_my_Nth_function(db_t *db, int param1);
My job is to write another 300 wrapping functions named "foo_XXXX" (XXXX being the suffix of the "foo_db_" function) with a default value for the first parameter.
static __inline function foo_my_first_function(char *param1, int param2) {
foo_db_my_first_function(DEFAULT_DB, param1, param2);
}
(...)
I was wondering if I could write some macros to ease my job: declare the "db" function and the corresponding "default" function (without the first parameter).
Unfortunately, I cannot use C99 and variadic macros arguments :( so I think I'm screwed :), but I prefer to ask first here before burning my fingers to write those 300 functions :/
Assuming the original header file for the API is regular enough, then a script in your favorite text processing language (Perl, Lua, Python, Awk, or even /bin/sh in a pinch) will likely be the simplest approach.
Your script would collect all public function declarations using a regex or simple text matching to identify them (likely based on the foo_db_ prefix). It could then write two output files. First, a suitable .h file declaring your wrappers, and second the .c source file implementing them by stuffing DEFAULT_DB into their first parameter. You will need to do a minimal amount of work to copy the rest of the parameters through, but with luck the declarations are all regular enough that the text manipulation can be as simple as "rest of line" or the like.
Having done that, I would check the script into revision control, and get it invoked at build time, treating the generated files as transient build products. However, if you don't have a sufficiently flexible build system (this is why I still prefer make to nearly everything else I've seen proposed) then you will have to find a suitable kludge to signal that your generated default wrappers are out of date when the API changes.
This approach will require investing some time in the code generator script, but you should be ahead on that well before the time you imagine hand-coding your 100th wrapper. And the second time you run it....
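A minimal sketch of such a generator in Python might look like the following. It assumes each declaration sits on one line, has a real return type in place of the "function" placeholder, uses simple parameters (no arrays or function pointers), and always starts with a `db_t *` parameter. The names `DECL`, `make_wrapper` and `generate` are hypothetical.

```python
import re

# One declaration per match: return type, foo_db_ suffix, and the
# parameters that follow the leading db_t pointer (if any).
DECL = re.compile(
    r"^[ \t]*(?P<ret>[\w \t\*]+?)[ \t]*\bfoo_db_(?P<suffix>\w+)\s*"
    r"\(\s*db_t\s*\*\s*\w+\s*(?:,\s*(?P<rest>[^)]*))?\)\s*;",
    re.MULTILINE,
)

def make_wrapper(m):
    ret = m.group("ret").strip()
    suffix = m.group("suffix")
    rest = (m.group("rest") or "").strip()
    params = [p.strip() for p in rest.split(",")] if rest else []
    # The parameter name is assumed to be the last identifier of each parameter.
    names = [re.search(r"(\w+)\s*$", p).group(1) for p in params]
    args = ", ".join(["DEFAULT_DB"] + names)
    call = f"foo_db_{suffix}({args});"
    body = call if ret == "void" else "return " + call
    plist = ", ".join(params) if params else "void"
    return f"static __inline {ret} foo_{suffix}({plist}) {{\n    {body}\n}}\n"

def generate(header_text):
    """Return the wrapper source for every foo_db_ declaration found."""
    return "\n".join(make_wrapper(m) for m in DECL.finditer(header_text))

header = (
    "int foo_db_my_first_function(db_t *db, char *param1, int param2);\n"
    "void foo_db_my_Nth_function(db_t *db, int param1);\n"
)
print(generate(header))
```

In a build, `header` would be read from the real header file and the result written to a generated file, so that the wrappers are rebuilt whenever the API changes.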
In extreme cases, you could end up needing to implement much of the front-end of a C compiler. In that case, I see two approaches that are both more socially acceptable than arranging a meeting with the architect in a dark alley. First, there is a GCC back-end that emits its AST in XML; the resulting XML is a bear, but has been reduced down to a tree of tokens that can be manipulated. Second, there is always LPeg, a full parser that is easily used from Lua (and I suspect that there are other PEG parsers out there for other scripting languages too). Sample code for LPeg that lexes and parses C is referenced at the Wiki page.
Do it in Excel. Create a cell with "foo_db_ (db_t *db)", drag it down as many places as you need, fill in all the blanks, then copy it all into your program (you can test that the copy will work ahead of time, but I just tried with Notepad and it seems to work as intended). Now you have all your function headers, and can fill in the rest from there.

Match functions and function calls in C using regex

I am fairly new to regexes, so I wrote the following simple regex using positive lookahead that detects functions and function calls in a C source file:
\w+(?=\s*\()
It works fine, but the problem is that it also detects non-function syntax such as if(), while(), etc.
I can easily avoid this by saying-
(if(?!\()) | (while(?!\())
But the problem is how to combine the second regex with the first one? I can't OR them, because the first one still matches if(), while() etc., and in an OR expression it's enough if one of the terms matches.
How do I combine these regexes, or is there a better, simpler one that will not match non-function syntax like if() and while()?
PS: I use the following tools to test my regexes
GSkinner
RegexPal
There are quite a lot of assumptions when you are searching for function call in C with regex. That aside, if you are happy with what is matched (there are valid function calls that will not be matched), and you want to exclude if and while from the result list, you can use the following regex:
(?!\b(if|while|for)\b)\b\w+(?=\s*\()
The regex uses the word boundary \b to make sure that the whole name is matched (preventing a partial match of hile in while), and that only a whole name that is a keyword is rejected (preventing rejection of whilenothinghappens).
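To see the behaviour described above, here is a quick check in Python, whose `re` module supports the same lookahead syntax. The sample `source` string is made up for illustration.

```python
import re

# Same pattern as above: reject whole-word keywords, then require an
# identifier that is followed by optional whitespace and "(".
CALLS = re.compile(r"(?!\b(if|while|for)\b)\b\w+(?=\s*\()")

source = "if (x) { whilenothinghappens(); foo (1); while(y) bar(); }"

# finditer + group(0) is used because findall would return only the
# contents of the capturing group inside the lookahead.
names = [m.group(0) for m in CALLS.finditer(source)]
print(names)
```

Other keywords that can be followed by a parenthesis, such as `switch`, `sizeof` and `return`, are not in the exclusion list and would still be reported unless added to it.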

Parsing C files without preprocessing it

I want to run simple analysis on C files (such as: if you call the foo macro with INT_TYPE as argument, then cast the response to int*). I do not want to preprocess the file, I just want to parse it (so that, for instance, I'll have correct line numbers).
Ie, I want to get from
#include <a.h>
#define FOO(f)
int f() {FOO(1);}
a list of tokens like
<include_directive value="a.h"/>
<macro name="FOO"><param name="f"/><result/></macro>
<function name="f">
<return>int</return>
<body>
<macro_call name="FOO"><param>1</param></macro_call>
</body>
</function>
with no need to set include path, etc.
Is there any preexisting parser that does it? All parsers I know assume C is preprocessed. I want to have access to the macros and actual include instructions.
Our C Front End can parse code containing preprocessor elements to a fair extent and still build a usable AST. (Yes, the parse tree has precise file/line/column number information).
There are a number of restrictions, within which it handles most code. In those few cases it cannot handle, a small, easy change to the source file that gives equivalent code often solves the problem.
Here's a rough set of rules and restrictions:
#includes and #defines can occur wherever a declaration or statement can occur, but not in the middle of a statement. These rarely cause a problem.
macro calls can occur where function calls occur in expressions, or can appear without semicolon in place of statements. Macro calls that span non-well-formed chunks are not handled well (anybody surprised?). The latter occur occasionally but not rarely and need manual revision. OP's example of "j(v,oid)*" is problematic, but this is really rare in code.
#if ... #endif must be wrapped around major language concepts (nonterminals) (constant, expression, statement, declaration, function) or sequences of such entities, or around certain non-well-formed but commonly occurring idioms, such as if (exp) {. Each arm of the conditional must contain the same kind of syntactic construct as the other arms. #if wrapped around random text used as a bad kind of comment is problematic, but easily fixed in the source by making a real comment. Where these conditions are not met, you need to modify the original source code, often by moving the #if #elif #else #endif a few tokens.
In our experience, one can revise a code base of 50,000 lines in a few hours to get around these issues. While that seems annoying (and it is), the alternative is to not be able to parse the source code at all, which is far worse than annoying.
You also want more than just a parser. See Life After Parsing, to know what happens after you succeed in getting a parse tree. We've done some additional work in building symbol tables in which the declarations are recorded with the preprocessor context in which they are embedded, enabling type checking to include the preprocessor conditions.
You can have a look at this ANTLR grammar. You will have to add rules for preprocessor tokens, though.
Your specific example can be handled by writing your own parser and ignoring macro expansion, because FOO(1) itself can be parsed as a function call.
When more cases are considered, however, the parser becomes much more difficult. You can refer to PDF Link for more information.
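For the simple example in the question, that idea can be sketched as a line-oriented scanner in Python. It is only an assumption-laden toy: it records `#include` and `#define` lines as they are, treats any `name(args);` statement as a call without expanding macros, and does not handle nesting, multi-line constructs or comments. The names `DIRECTIVE`, `CALL` and `scan` are hypothetical.

```python
import re

DIRECTIVE = re.compile(r"^\s*#\s*(include|define)\b\s*(.*)$")
# A call-like statement: identifier, flat argument list, semicolon.
CALL = re.compile(r"\b([A-Za-z_]\w*)\s*\(([^()]*)\)\s*;")

def scan(source):
    """Return (line number, kind, text) tuples without preprocessing."""
    events = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        d = DIRECTIVE.match(line)
        if d:
            events.append((lineno, d.group(1), d.group(2).strip()))
            continue
        for c in CALL.finditer(line):
            # Macro calls and function calls look the same at this level.
            events.append((lineno, "call", f"{c.group(1)}({c.group(2).strip()})"))
    return events

source = "#include <a.h>\n#define FOO(f)\nint f() {FOO(1);}\n"
for event in scan(source):
    print(event)
```

Because nothing is expanded, line numbers refer to the original file, which is the property the question asks for.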

Bison - additional parameter to a push and pure parser

How can I pass one additional parameter (not the token minor of type YYSTYPE) to the yypush_parse() function?
The parser is indeed reentrant, but this one additional variable is crucial for the thread-safety of the application I need to integrate my parser in (it's a PHP extension, so we're talking about TSRM).
I cannot just get rid of that parameter because inside the action code I'm going to call functions which will generate an AST in a userland-accessible form.
I've tried to hack around YYPUSH_DECLS and it works as far as declaring the function is concerned, BUT a few thousand LOCs down comes the implementation of yypush_parse, and I can't see any way to overwrite the function signature where the implementation of yypush_parse starts.
YYPARSE_PARAM is only used when the parser is not a push one (as far as I can tell), but in my case I NEED it to be push because of the things I have to do in the processing loop, after lexing and prior to adding a new token to the parsing stack.
So I am wondering if there's a %directive or something that may help.
On the other side, I really think YYPARSE_PARAM should be used as far as it's defined, no matter what type of parser it is. It's a pity it's not.
Use the %parse-param directive. YYPARSE_PARAM is deprecated and shouldn't be used.

C Macro to Override Variable Assignment with Function Call

Calling all C macro gurus...
Is there any way to write a C macro that will replace something like this:
my_var = 5;
with this:
setVar(&my_var, 5);
In other words, can I write a C macro that will override assignments for a specific variable (in the above example, my_var) and instead pass it to a function whose job it is to set that variable? If possible, I'd like to be able to hook into assignments of a specific variable.
EDIT: After thinking about this some more, I'm not sure it could be done. Even if you can come up with a macro to do it, setVar wouldn't necessarily know the type of the variable it's setting, so what would be the type of its second argument?
EDIT: The reason I'd like to hook assignments of specific variables is for use in a primitive debugger for some specialized embedded C code. It would be nice to be able to have a "watch list", essentially like you have in an IDE. My first instinct was to try to hook variable assignments with a C macro so you could just drop the macro into your code and have that variable "watched", but then again I've never really written a debugger before so maybe I'm going about that all wrong.
Not with the standard preprocessor. It cannot change the parsing of the file, only replace proper names with a piece of code (and "=" isn't valid in a name).
If you're feeling adventurous, you can try to replace the executable "cpp" with a small script which pre-processes the source code. But that might wreak havoc with the debugging information (the file name and, if you're replacing one line of code with several, the line number information too). The script would call "sed":
sed -E -e 's/my_var\s*=\s*([^;]+);/MY_VAR(my_var, \1);/' file.c > file_tmp.c
But your best bet is probably to put this into a script and simply run it on all your sources. This will change the code and you'll see what is happening in your debugger.
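A rough Python version of such a rewrite script is sketched below. It emits the `setVar(&my_var, ...)` form from the question, and the `(?!=)` lookahead is an addition meant to leave `==` comparisons alone. The function name `hook_assignments` is hypothetical, and the sketch does not understand strings, comments, compound assignments like `+=`, or assignments inside larger expressions.

```python
import re

def hook_assignments(source, var):
    """Rewrite 'var = expr;' into 'setVar(&var, expr);'."""
    pattern = re.compile(
        r"\b" + re.escape(var) + r"\s*=(?!=)\s*([^;]+);"
    )
    # \1 is the right-hand side captured up to the semicolon.
    return pattern.sub(r"setVar(&" + var + r", \1);", source)

source = "my_var = 5;\nif (my_var == 5) other_my_var = 1;\n"
print(hook_assignments(source, "my_var"))
```

The leading `\b` keeps the pattern from matching inside longer identifiers such as `other_my_var`.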
#define setVar(_left_, _right_) (*(_left_) = (_right_))
