Regex to detect the beginning of a C function body

I am working on a Perl script that prints the body of a required function from a C source file. I have written a regex to get to the start of the function body:
(/(void|int)\s*($function_name)\s*\(.*?\)\s*{/s
but this works only for functions returning void or int (basic types).
How can I change this regex to handle user-defined data types (structs or pointers)?

Try this one (untested!), although it does expect the function to start at the beginning of a line :
/
^ # Start of line
\s*(?:struct\s+)?[a-z0-9_]+ # return type, optionally a struct
\s*\** # return type can be a pointer
\s*([a-z0-9_]+) # Function name
\s*\( # Opening parenthesis
(
(?:struct\s+)? # Maybe we accept a struct?
\s*[a-z0-9_]+\** # Argument type
\s*(?:[a-z0-9_]+) # Argument name
\s*,? # Comma to separate the arguments
)*
\s*\) # Closing parenthesis
\s*{? # Maybe a {
\s*$ # End of the line
/mxi # Close our regex; x permits the whitespace and comments above, i makes it case insensitive
You can squeeze all of these into a single line by removing the whitespace and comments.
Parsing code with a regex is generally hard though, and this regex is not perfect at all.
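For comparison, here is a rough Python sketch of the same idea; Python's re module uses the same (?:...), \s and verbose-mode constructs as Perl. The helper name function_start_pattern is made up for illustration. The pattern assumes the definition starts at the beginning of a line, that the return type is a run of identifiers followed by optional pointer stars, and that the parameter list contains no braces or semicolons. It is not a C parser and will miss things such as macros or function-pointer return types.

```python
import re


def function_start_pattern(function_name):
    """Build a regex matching a function definition up to its opening brace.

    Hypothetical helper: the return type is treated as any run of
    identifiers (struct, const, unsigned, a typedef name, ...) followed
    by optional '*' pointer markers.
    """
    return re.compile(
        r"""
        ^[ \t]*                     # start of a line, optional indentation
        (?:[A-Za-z_]\w*[ \t]+)*     # leading type words such as static or struct
        [A-Za-z_]\w*                # last word of the return type
        [\s*]+                      # whitespace and/or pointer stars
        """
        + re.escape(function_name)
        + r"""
        \s*\( [^;{}]*? \)           # parameter list without braces or semicolons
        \s*\{                       # opening brace of the body
        """,
        re.MULTILINE | re.VERBOSE,
    )


if __name__ == "__main__":
    sample = (
        "struct node *find_node(struct node *, int);\n"
        "\n"
        "struct node *\n"
        "find_node(struct node *head, int key)\n"
        "{\n"
        "    return head;\n"
        "}\n"
    )
    match = function_start_pattern("find_node").search(sample)
    # The prototype is skipped because it ends in ';' rather than '{'.
    print(sample[match.end():])
```

The text after match.end() is the function body; finding its closing brace still needs something like a brace counter, since a regex alone cannot balance nested braces.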

Related

How to portably use "${@:2}"?

In "Allow for ${@:2} syntax in variable assignment" they say I should not use "${@:2}" because it breaks things across different shells, and that I should use "${*:2}" instead.
But replacing "${@:2}" with "${*:2}" makes no sense, because "${@:2}" is not equivalent to "${*:2}", as the following example shows:
#!/bin/bash
check_args() {
echo "\$#=$#"
local counter=0
for var in "$@"
do
counter=$((counter+1));
printf "$counter. '$var', ";
done
printf "\\n\\n"
}
# setting arguments
set -- "space1 notspace" "space2 notspace" "lastargument"; counter=1
echo $counter': ---------------- "$*"'; counter=$((counter+1))
check_args "$*"
echo $counter': ---------------- "${*:2}"'; counter=$((counter+1))
check_args "${*:2}"
echo $counter': ---------------- "${@:2}"'; counter=$((counter+1))
check_args "${@:2}"
-->
GNU bash, version 4.4.12(1)-release (x86_64-pc-linux-gnu)
1: ---------------- "$*"
$#=1
1. 'space1 notspace space2 notspace lastargument',
2: ---------------- "${*:2}"
$#=1
1. 'space2 notspace lastargument',
3: ---------------- "${@:2}"
$#=2
1. 'space2 notspace', 2. 'lastargument',
If I cannot use "${@:2}" (as they say), what equivalent can I use instead?
The original question is "Process all arguments except the first one (in a bash script)", and the only answer there that keeps arguments with spaces together is to use "${@:2}".
There's context that's not clear in the question unless you follow the links. It's concerning the following recommendation from shellcheck.net:
local _help_text="${@:2}"
^––SC2124 Assigning an array to a string! Assign as array, or use * instead of @ to concatenate.
Short answer: Don't assign lists of things (like arguments) to plain variables, use an array instead.
Long answer: Generally, "${@:2}" will get all but the first argument, with each treated as a separate item ("word"). "${*:2}", on the other hand, produces a single item consisting of all but the first argument stuck together, separated by a space (or whatever the first character of $IFS is).
But in the specific case where you're assigning to a plain variable, the variable is only capable of storing a single item, so var="${@:2}" also collapses the arguments down to a single item, but it does it in a less consistent way than "${*:2}". In order to avoid this, use something that is capable of storing multiple items: an array. So:
Really bad: var="${@:2}"
Slightly less bad: var="${*:2}"
Much better: arrayvar=("${@:2}") (the parentheses make this an array)
Note: to get the elements of the array back, with each one treated properly as a separate item, use "${arrayvar[@]}". Also, arrays are not supported by all shells (notably, dash doesn't support them), so if you use them you should be sure to use a bash shebang (#!/bin/bash or #!/usr/bin/env bash). If you really need portability to other shells, things get much more complicated.
Neither ${@:2} nor ${*:2} is portable, and many shells will reject both as invalid syntax. If you want to process all arguments except the first, you should get rid of the first with a shift.
first="${1}"
shift
echo The arguments after the first are:
for x; do echo "$x"; done
At this point, the first argument is in "$first" and the positional parameters are shifted down one.
This demonstrates how to combine all "${@}" arguments into a single variable without resorting to ${@:1} or ${@:2}:
#!/bin/bash
function get_all_arguments_as_single_one_unquoted() {
single_argument="$(printf "%s " "${@}")";
printf "unquoted arguments %s: '%s'\\n" "${#}" "${single_argument}";
}
function get_all_arguments_as_single_one_quoted() {
single_argument="${1}";
printf "quoted arguments %s: '%s'\\n" "${#}" "${single_argument}";
}
function escape_arguments() {
escaped_arguments="$(printf '%q ' "${@}")";
get_all_arguments_as_single_one_quoted "${escaped_arguments}";
get_all_arguments_as_single_one_unquoted ${escaped_arguments};
}
set -- "first argument" "last argument";
escape_arguments "${@}";
-->
GNU bash, version 4.4.12(1)-release (x86_64-pc-linux-gnu)
quoted arguments 1: 'first\ argument last\ argument '
unquoted arguments 4: 'first\ argument last\ argument '
As @William Pursell's answer points out, if you would like to get only the "${@:2}" arguments, you can add a shift call before "${@}":
function escape_arguments() {
shift;
escaped_arguments="$(printf '%q ' "${@}")";
get_all_arguments_as_single_one_quoted "${escaped_arguments}";
get_all_arguments_as_single_one_unquoted ${escaped_arguments};
}
-->
GNU bash, version 4.4.12(1)-release (x86_64-pc-linux-gnu)
quoted arguments 1: 'last\ argument '
unquoted arguments 2: 'last\ argument '

How to insert lines of text after any C function begin and before end of function?

I have hundreds of C functions like
void test()
{
<content of function>
}
(Functions may have a return value UBYTE, BOOL, WORD, ...)
Now I would like to add a text to all functions as follows:
void test()
{
LABEL_BEGIN
<blank line>
<content of function>
<blank line>
LABEL_END
}
So I need to insert LABEL_BEGIN and a blank line at the start of the function and a blank line and LABEL_END at end of the function.
I assume that this might be possible with some tricky regex? Or is there a text editor available which has such a feature? Currently, I have the MS Studio 2013 IDE, Notepad++, TextPad, and PSPad available, and also the GNU grep 2.4.5 command-line tool.
The following regex will select each void function: the return type, the name, any parameters, and the first opening curly brace. If you match this against your input, you can take its match index and match length to find the beginning index of the function's inner content. You can then add your additional content at that index.
(void [\w_][\w\d_]*\(.*\)(\r|)(\n|){)
Feel free to play around with it here: https://regexr.com/3jarc
To get the end of the function, you could use the regex above as a positive lookbehind then match the closing '}'.
What about a brace counter?
You open your source code file and read it character by character.
Each time you see a '{', you increment the brace counter, and each time you see a '}', you decrement it.
If you find a '{' and the counter is 0 before the increment, you have the beginning of your function.
If you find a '}' and the counter is 0 after the decrement, you have the end of your function.
Could this work? Or is there tricky C syntax that can break it?
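The brace counter described above can be sketched in Python as follows. The function name top_level_brace_spans is hypothetical. Skipping comments and string or character literals is an addition to the plain counter: braces inside them are one kind of C syntax that would otherwise throw the count off. The sketch still treats every top-level block as a candidate, so struct, enum and initializer braces are reported too, and preprocessor conditionals that leave braces unbalanced will confuse it.

```python
def top_level_brace_spans(source):
    """Return (open_index, close_index) pairs for each top-level { ... } block."""
    spans = []
    depth = 0
    start = None
    i = 0
    n = len(source)
    while i < n:
        ch = source[i]
        two = source[i:i + 2]
        if two == "//":                      # line comment: skip to end of line
            end = source.find("\n", i)
            i = n if end == -1 else end
            continue
        if two == "/*":                      # block comment: skip past closing */
            end = source.find("*/", i + 2)
            i = n if end == -1 else end + 2
            continue
        if ch in "\"'":                      # string or character literal
            i += 1
            while i < n and source[i] != ch:
                i += 2 if source[i] == "\\" else 1
            i += 1                           # step over the closing quote
            continue
        if ch == "{":
            if depth == 0:                   # counter is 0 before incrementing
                start = i
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0 and start is not None:   # counter is 0 after decrementing
                spans.append((start, i))
                start = None
        i += 1
    return spans
```

To insert LABEL_BEGIN and LABEL_END, one could walk the returned spans from last to first and splice the text in just after each open index and just before each close index, so that earlier indices stay valid.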

Finding "main" functions' names in a C file via Bash script

I have a large number of C files that are structured according to the following principles:
All functions are declared in the C file and are with return type int, double or void.
All functions start with "ksz_". Only functions use this - nothing else uses "ksz_" in their names.
The file contains "main" functions. All supporting functions use their "main" function's name to form themselves.
Because they were written by different people, they are quite messy and have spaces in random places.
A rough visualization would be (note the spaces):
int ksz_Print(...)
{
...
}
void ksz_Print_Helper1 (... ){
...
}
void ksz_Print_Helper2(...) {
...
}
int ksz_Input(...){
...
}
double ksz_Input_Helper1 ( ...){
...
}
I need to find the "main" function names of each individual C file in order to use them for another search algorithm.
Since these files are huge (some of them have over a dozen thousand lines) and there are hundreds of them, I need a Bash script for this.
Ideally this script would extract only the "main" functions:
ksz_Print
ksz_Input
What stops me is that I can't work out the regex for my grep in order to extract the function lines. I think its logic should look like this:
(spaces)(int/float/double)(spaces)(ksz_)(other characters without space)(spaces)(open bracket)
After that I guess I'll extract the word containing "ksz_" from each line with cut (after trimming and removing duplicate spaces).
And last, I'll need to find a way to filter out the supporting functions.
But what would be my initial grep in this script?
If I understand your specifications correctly this should do it:
root@local [~]# awk '/^[ \t]*(int|float|double)[ \t]+ksz_/ {print $2}' sample.txt
One thing I did not understand was whether there should be only one "_" after ksz, that is, whether "double ksz_Input_Helper1" is something you do not want to match. The regex above does match it.
I also chose to go with awk rather than grep because you said you want only the name: the awk command above prints only the second field, using whitespace as the delimiter. If you still want to use grep, this one matches the same lines:
root@local [~]# egrep '^\s*(int|float|double)\s+ksz_' sample.txt
Here is a breakdown (note that in awk I use [ \t] in place of \s as I could not get it to recognize \s):
^ - match start of line
\s* - match if there are 0 or more white spaces
(int|float|double) - match int, float, OR double
\s+ - match at least one whitespace
ksz_ - match literal string "ksz_"
Try using a regex that only matches the portion you want and only print that:
grep -oRE "(ksz_[a-zA-Z_]*\b)" *
-o - output only match
-R - recursive
-E - extended regex
[a-zA-Z_] - upper and lower case letters, underscore
\b - ending at word boundary
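Neither answer covers the last step from the question, filtering out the supporting functions. A minimal Python sketch of one way to do it is shown here; the names DEFINITION and main_function_names are invented for illustration. It assumes, as the question states, that each definition starts its own line with one of the listed return types, and it additionally assumes that a supporting function's name is its "main" function's name followed by "_" and a suffix.

```python
import re

# Return type, whitespace, a ksz_ name, optional spaces, opening parenthesis.
DEFINITION = re.compile(
    r"^[ \t]*(?:int|float|double|void)[ \t]+(ksz_\w+)[ \t]*\(",
    re.MULTILINE,
)


def main_function_names(source):
    """Return ksz_ function names that are not helpers of another ksz_ function."""
    # dict.fromkeys removes duplicates (e.g. prototype plus definition)
    # while keeping the order in which names appear in the file.
    names = list(dict.fromkeys(DEFINITION.findall(source)))
    return [
        name
        for name in names
        if not any(
            name.startswith(other + "_") for other in names if other != name
        )
    ]


if __name__ == "__main__":
    sample = (
        "int ksz_Print(...)\n{\n}\n"
        "  void   ksz_Print_Helper1 (... ){\n}\n"
        "int ksz_Input(...){\n}\n"
        "double ksz_Input_Helper1 ( ...){\n}\n"
    )
    for name in main_function_names(sample):
        print(name)
```

Under the naming assumption, a "main" function whose own name extends another main function's name (say ksz_Print_All next to ksz_Print) would be filtered out as if it were a helper, so the result is worth spot-checking.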

Using sed, How to Insert a line at the beginning of a C function - closing paren, newline, opening curly brace

I want to insert a line at the beginning of several C functions that are formatted the same. I suspect sed is the way to do this but I have limited sed knowledge. Thanks.
Before:
void func (any arbitrary list of parameters)
{
After:
void func (any arbitrary list of parameters)
{
myNewInsertedLineHere
If the opening braces for functions begin on the first column and if they are the only braces that are in the first column (i.e. if you place opening braces for structs and enums at the end of a line), you can use:
sed -e 's/^{/{\n MYNEWLINE;/g' orig.c > edited.c
This seems to work in a quick test, but usual warnings and disclaimers apply.
Edit: As pointed out in the comments, not only functions have curly braces in the first column, so some context is needed. We can use another tool from the 70s, awk:
awk 'BEGIN {split("typedef union struct enum", a); \
for (i in a) skip[a[i]] = 1;}; \
{print; if (/^{/ && !(last in skip)) print " MYFIRSTLINE();"; \
if (NF > 0) last = $1; }' orig.c > edited.c
That's a one-liner in theory, but it might be better in a separate file, say first.awk:
#!/usr/bin/awk -f
BEGIN {
split("typedef union struct enum", a);
for (i in a) skip[a[i]] = 1;
};
{
print;
if (/^{/ && !(last in skip))
print " MYFIRSTLINE();";
if (NF > 0) last = $1;
}
Then you can call the script with
awk -f first.awk orig.c > edited.c
or, after making the script executable with chmod, as
first.awk orig.c > edited.c
Of course, the same strategy:
print every line;
when there is a brace in the first column and the context isn't a type or variable definition, print the additional content;
save the first word to determine the context for the next line
can be implemented in any other scripting language, too.
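As an illustration of that last point, the three-step strategy might look like this in Python. The function name insert_after_function_braces and the default inserted text are placeholders, and the same limitation applies as in the awk version: the context is judged only from the first word of the most recent non-blank line.

```python
SKIP_CONTEXT = {"typedef", "union", "struct", "enum"}


def insert_after_function_braces(lines, new_line="    MYFIRSTLINE();"):
    """Copy lines, adding new_line after each column-one '{' that opens a function."""
    out = []
    last_first_word = None
    for line in lines:
        out.append(line)                         # print every line
        if line.startswith("{") and last_first_word not in SKIP_CONTEXT:
            out.append(new_line)                 # brace in column one, not a type definition
        words = line.split()
        if words:                                # remember context for the next line
            last_first_word = words[0]
    return out


if __name__ == "__main__":
    source = [
        "struct point",
        "{",
        "    int x;",
        "};",
        "",
        "void func(int a)",
        "{",
        "    work();",
        "}",
    ]
    print("\n".join(insert_after_function_braces(source)))
```

Only the brace after "void func(int a)" gets the extra line; the brace after "struct point" is skipped because the preceding line starts with a word in SKIP_CONTEXT.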
A program is almost always too complex for such simple rules. In the title you said: closing paren, newline, opening curly brace. What do you want to do with:
if testfunction(val)
{
It follows the criteria but is not a function definition.
That being said, the following sed script should do the trick; it even allows for optional tabs or spaces around the {
/)[ \t\r]*$/ {
n
s/^[ \t]*{[ \t\r]*$/&/
t add
b end
:add
a\
\
:end
}
In English, it reads:
look for a line ending with right paren
look at next line
try to replace a line containing only an opening curly brace (apart from white spaces) by itself
if substitution matched add an empty line

How to create a TCL function with optional arguments using SWIG?

I have a simple C/C++ app that has an optional Tcl interpreter, with the function wrappers generated using SWIG. For several of the functions, all the arguments are optional. How is this typically handled? I'd like to support a Tcl command like this, where any of the arguments are optional but the C function takes fixed arguments:
//TCL command
get_list [options] filename
-opt1
-opt2
-opt3 arg1
-opt4 arg2
filename
//C function
static signed get_list(bool opt1,
bool opt2,
char * arg1,
objectType * arg2,
char * fileName)
Currently I have something like this:
static pList * get_list(char * arg1=NULL,
char * arg2=NULL,
char * arg3=NULL,
tObject * arg4=NULL)
This has many problems such as enforcing the object pointer is always the last argument. The SWIG documentation talks at length about C functions with variable arguments using "..." but I don't think this is what I need. I'd like the C function arguments to be fixed.
The easiest method is to wrap a Tcl procedure around the outside, like this:
rename get_list original.get_list
proc get_list args {
if {[llength $args] == 0 || [llength $args] % 2 == 0} {
error "wrong # args: ..."; # Do a proper error message here!
}
# Handle the required argument
set filename [lindex $args end]
# Initialize the defaults
array set opts {
-opt1 false
-opt2 false
-opt3 ""
-opt4 ""
}
# Merge in the supplied options
foreach {opt val} [lrange $args 0 end-1] {
if {![info exist opts($opt)]} {
error "unrecognized option \"$opt\""
}
set opts($opt) $val
}
# Hand off to C level...
original.get_list $opts(-opt1) $opts(-opt2) $opts(-opt3) $opts(-opt4) $filename
}
If you've got Tcl 8.6, that last handoff is best done with tailcall so the rewriting code is cut out of the Tcl stack. It's not vital though, as SWIGged code rarely resolves names of Tcl commands and variables.

Resources