I'm working on a small academic research about extremely long and complicated functions in the Linux kernel. I'm trying to figure out if there is a good reason to write 600 or 800 lines-long functions.
For that purpose, I would like to find a tool that can extract a function from a .c file, so I can run some automated tests on the function.
For example, If I have the function cifs_parse_mount_options() within the file connect.c, I'm seeking a solution that would roughly work like:
extract /fs/cifs/connect.c cifs_parse_mount_options
and return the 523 lines of code(!) of the function, from the opening braces to the closing braces.
Of course, any way of manipulating existing software packages like gcc to do that, would be most helpful too.
Thanks,
Udi
EDIT : The answers to Regex to pull out C function prototype declarations? convinced me that matching function declaration by regex is far from trivial.
Why don't you write a small PERL/PHP/Python script or even a small C++,Java or C# program that does that?
I don't know of any already-made tools to do that but writing the code to parse out the text file and extract a function body from a C++ code file should not take more than 20 lines of code.. The only difficult part will be locating the beginning of the function and that should be a relatively simple task using RegEx. After that, all you need is to iterate through the rest of the file keeping track of opening and closing curly braces and when you reach the function body closing brace you're done.
indent -kr code -o code.out
awk -f split.awk code.out
you have to adapt a little bit split.awk wich is somewhat specific to my code and refactoring needs (for example y have so struct who are not typedefs
And I'm sure you can make a nicer script :-)
--
BEGIN { line=0; FS="";
out=ARGV[ARGC-1] ".out";
var=ARGV[ARGC-1] ".var";
ext=ARGV[ARGC-1] ".ext";
def=ARGV[ARGC-1] ".def";
inc=ARGV[ARGC-1] ".inc";
typ=ARGV[ARGC-1] ".typ";
system ( rm " " -f " " out " " var " " ext " " def " " inc " " typ );
}
/^[ ]*\/\/.*/ { print "comment :" $0 "\n"; print $0 >> out ; next ;}
/^#define.*/ { print "define :" $0 ; print $0 >>def ; next;}
/^#include.*/ { print "define :" $0 ; print $0 >>inc ; next;}
/^typedef.*{$/ { print "typedef var :" $0 "\n"; decl="typedef";print $0 >> typ;infile="typ";next;}
/^extern.*$/ { print "extern :" $0 "\n"; print $0 >> ext;infile="ext";next;}
/^[^ }].*{$/ { print "init var :" $0 "\n";decl="var";print $0 >> var; infile="vars";
print $0;
fout=gensub("^([^ \\*])*[ ]*([a-zA-A0-9_]*)\\[.*","\\2","g") ".vars";
print "var decl : " $0 "in file " fout;
print $0 >fout;
next;
}
/^[^ }].*)$/ { print "func :" $0 "\n";decl="func"; infile="func";
print $0;
fout=gensub("^.*[ \\*]([a-zA-A0-9_]*)[ ]*\\(.*","\\1","g") ".func";
print "function : " $0 "in file " fout;
print $0 >fout;
next;
}
/^}[ ]*$/ { print "end of " decl ":" $0 "\n";
if(infile=="typ") {
print $0 >> typ;
}else if (infile=="ext"){
print $0 >> ext;
}else if (infile=="var") {
print $0 >> var;
}else if ((infile=="func")||(infile=="vars")) {
print $0 >> fout;
fflush (fout);
close (fout);
}else if (infile=="def") {
print $0 >> def;
}else if (infile=="inc"){
print $0 >> inc;
}else print $0 >> out;
next;
}
/^[a-zA-Z_]/ { print "extern :" $0 "\n"; print $0 >> var;infile="var";next;}
{ print "other :" $0 "\n" ;
if(infile=="typ") {
print $0 >> typ;
}else if (infile=="ext"){
print $0 >> ext;
}else if (infile=="var") {
print $0 >> var;
}else if ((infile=="func")||(infile=="vars")){
print $0 >> fout;
}else if (infile=="def") {
print $0 >> def;
}else if (infile=="inc"){
print $0 >> inc;
}else print $0 >> out;
next;
}
in case you are finding difficult to extract function names :
1> use ctags ( a program ) to extract function names .
ctags -x --c-kinds=fp path_to_file.
2> once u got the function names, write a simple perl script to extract contents of function by passing the script name of function as said above.
Bash builtin declare appears to provide similar functionality, but I am not sure how it is implemented. In particular, declare -f lists the functions in the present environment:
declare -f quote
declare -f quote_readline
declare outputs the list of functions in the present environment:
quote ()
{
local quoted=${1//\'/\'\\\'\'};
printf "'%s'" "$quoted"
}
quote_readline ()
{
local ret;
_quote_readline_by_ref "$1" ret;
printf %s "$ret"
}
Finally, declare -f quote outputs the function definition for the quote function.
quote ()
{
local quoted=${1//\'/\'\\\'\'};
printf "'%s'" "$quoted"
}
Perhaps the underlying machinery can be repurposed to meet your needs.
You should use something like clang which will actually parse your source code and allows you to analyse it. So it can find functions in many languages, and even if you consider macros. You have no chance using regular expressions.
Related
I am having an error saying that prototype not terminated at filename.txt line number 113 where as line number 113 belongs to a different program which is running successfully.
sub howmany(
my #H = #_;
my $m = 0;
foreach $x (#H) {
if ( $x > 5 ) {
$m +=1;
}
else {
$m +=0;
}
}
print "Number of elements greater than 5 is equal to: $m \n";
}
howmany(1,6,9);
The sub keyword should be followed by { } not ( ) (if you define a simple function), that's why the error
prototype not terminated
After this, always start with : use strict; use warnings;
Put this and debug your script, there's more errors.
Last but not least, indent your code properly, using an editor with syntax highlighting, you will save many time debugging
The error is due to parenthesis.
Never do $m += 0; As you actually load processor for nothing. Of course it's not gonna be visible on such a small function, but...
sub howmany {
my $m = 0;
foreach (#_) {
$m++ if ($_ > 5);
}
print "Number of elements greater than 5 is equal to: $m \n";
}
howmany(1,6,9);
#define SHELLSCRIPT "\
#/bin/bash \n\
awk 'BEGIN { FS=\":\"; print \"User\t\tUID\n--------------------\"; } { print $1,\"\t\t\",$3;} END { print \"--------------------\nAll Users and UIDs Printed!\" }' /etc/passwd \n\
"
void displayusers()
{
system(SHELLSCRIPT);
}
The error message is:
awk: line 1: runaway string constant "User...
The bash cmd when run and works in the terminal is:
awk 'BEGIN { FS=":"; print "User\t\tUID\n--------------------"; } { print $1,"\t\t",$3;} END { print "--------------------\nAll Users and UIDs Printed!" }' /etc/passwd
I think somewhere when using \ to block out the various " for c it messed up my awk. But I'm not sure where. Ideas?
I simply took your string and used it in a printf() statement, and then analyzed the output:
#include <stdio.h>
#define SHELLSCRIPT "\
#/bin/bash \n\
awk 'BEGIN { FS=\":\"; print \"User\t\tUID\n--------------------\"; } { print $1,\"\t\t\",$3;} END { print \"--------------------\nAll Users and UIDs Printed!\" }' /etc/passwd \n\
"
int main(void)
{
printf("[[%s]]\n", SHELLSCRIPT);
return 0;
}
Example run:
$ ./runaway
[[#/bin/bash
awk 'BEGIN { FS=":"; print "User UID
--------------------"; } { print $1," ",$3;} END { print "--------------------
All Users and UIDs Printed!" }' /etc/passwd
]]
$
When I made the line ends visible (^J marks the end of line, ^I tabs), the problem is transparent:
[[#/bin/bash ^J
awk 'BEGIN { FS=":"; print "User^I^IUID^J
--------------------"; } { print $1,"^I^I",$3;} END { print "--------------------^J
All Users and UIDs Printed!" }' /etc/passwd ^J
]]^J
You have two occurrences of \n in the string which need to be \\n. It is up to you whether you change the appearances of \t to \\t; it works either way.
#define SHELLSCRIPT "\
#/bin/bash\n\
awk 'BEGIN { FS=\":\"; print \"User\t\tUID\\n--------------------\"; } { print $1,\"\t\t\",$3;} END { print \"--------------------\\nAll Users and UIDs Printed!\" }' /etc/passwd\n"
Using that in my program yields:
[[#/bin/bash^J
awk 'BEGIN { FS=":"; print "User^I^IUID\n--------------------"; } { print $1,"^I^I",$3;} END { print "--------------------\nAll Users and UIDs Printed!" }' /etc/passwd^J
]]^J
Note, in particular, the technique used to debug this. Print the data so you can see it precisely.
Haven't tested, but here's a useful-looking article just about this topic and it would appear that your choices are something along the following lines:
if you have different quotes surrounding the text, you don't need to escape the interior ones
you can use the same quotes surrounding and escape the interior ones
you can use octal sequences for the quotes, e.g. <\42>
if it get's too confusing, move the string into a separate file where quoting will not be an issue
Warning, not an awk programmer.
I have a file, let's call it file.txt. It has a list of numbers which I will be using to find the information I need from the rest of the directory (which is full of files *.asc). The remaining files do not have the same lengths, but since I will be drawing data based on file.txt, the matrix I will be building will have the same number of rows. All files DO however contain the same number of columns, 3. The first column will be compared to file.txt, the second column of each *.asc file will be used to build the matrix. Here is what I have so far:
awk '
NR==FNR{
A[$1];
next}
$1 in A
{print $2 >> "data.txt";}' file.txt *.asc
This, however, prints the information from each file below the previous file. I want the information side by side, like a matrix. I looked up paste, but it seems to be called before awk, and all examples were only of a couple of files. I tried it still in place of print and did not work.
If anyone could help me out, this would be the last piece to my project. Thanks so much!
You could try:
awk -f ext.awk file.txt *.asc > data.txt
where ext.awk is
NR==FNR {
A[$1]++
next
}
FNR==1 {
if (ARGIND > 2)
print ""
}
$1 in A {
printf "%s ", $2
}
END {
print ""
}
Update
If you do not have Gnu Awk, the ARGIND variable is not available. You could then try
NR==FNR {
A[$1]++
next
}
FNR==1 {
if (++ai > 1)
print ""
}
$1 in A {
printf "%s ", $2
}
END {
print ""
}
The purpose of the script is to process all words from a file and output ALL words that occur the most. So if there are 3 words that each occur 10 times, the program should output all the words.
The script now runs, thanks to some tips I have gotten here. However, it does not handle large text files (i.e. the New Testament). I'm not sure if that is a fault of mine or just a limitation of the code. I am sure there are several other problems with the program, so any help would be greatly appreciated.
#!/usr/bin/perl -w
require 5.10.0;
print "Your file: " . $ARGV[0] . "\n";
#Make sure there is only one argument
if ($#ARGV == 0){
#Make sure the argument is actually a file
if (-f $ARGV[0]){
%wordHash = (); #New hash to match words with word counts
$file=$ARGV[0]; #Stores value of argument
open(FILE, $file) or die "File not opened correctly.";
#Process through each line of the file
while (<FILE>){
chomp;
#Delimits on any non-alphanumeric
#words=split(/[^a-zA-Z0-9]/,$_);
$wordSize = #words;
#Put all words to lowercase, removes case sensitivty
for($x=0; $x<$wordSize; $x++){
$words[$x]=lc($words[$x]);
}
#Puts each occurence of word into hash
foreach $word(#words){
$wordHash{$word}++;
}
}
close FILE;
#$wordHash{$b} <=> $wordHash{$a};
$wordList="";
$max=0;
while (($key, $value) = each(%wordHash)){
if($value>$max){
$max=$value;
}
}
while (($key, $value) = each(%wordHash)){
if($value==$max && $key ne "s"){
$wordList.=" " . $key;
}
}
#Print solution
print "The following words occur the most (" . $max . " times): " . $wordList . "\n";
}
else {
print "Error. Your argument is not a file.\n";
}
}
else {
print "Error. Use exactly one argument.\n";
}
Your problem lies in the two missing lines at the top of your script:
use strict;
use warnings;
If they had been there, they would have reported lots of lines like this:
Argument "make" isn't numeric in array element at ...
Which comes from this line:
$list[$_] = $wordHash{$_} for keys %wordHash;
Array elements can only be numbers, and since your keys are words, that won't work. What happens here is that any random string is coerced into a number, and for any string that does not begin with a number, that will be 0.
Your code works fine reading the data in, although I would write it differently. It is only after that that your code becomes unwieldy.
As near as I can tell, you are trying to print out the most occurring words, in which case you should consider the following code:
use strict;
use warnings;
my %wordHash;
#Make sure there is only one argument
die "Only one argument allowed." unless #ARGV == 1;
while (<>) { # Use the diamond operator to implicitly open ARGV files
chomp;
my #words = grep $_, # disallow empty strings
map lc, # make everything lower case
split /[^a-zA-Z0-9]/; # your original split
foreach my $word (#words) {
$wordHash{$word}++;
}
}
for my $word (sort { $wordHash{$b} <=> $wordHash{$a} } keys %wordHash) {
printf "%-6s %s\n", $wordHash{$word}, $word;
}
As you'll note, you can sort based on hash values.
Here is an entirely different way of writing it (I could have also said "Perl is not C"):
#!/usr/bin/env perl
use 5.010;
use strict; use warnings;
use autodie;
use List::Util qw(max);
my ($input_file) = #ARGV;
die "Need an input file\n" unless defined $input_file;
say "Input file = '$input_file'";
open my $input, '<', $input_file;
my %words;
while (my $line = <$input>) {
chomp $line;
my #tokens = map lc, grep length, split /[^A-Za-z0-9]+/, $line;
$words{ $_ } += 1 for #tokens;
}
close $input;
my $max = max values %words;
my #argmax = sort grep { $words{$_} == $max } keys %words;
for my $word (#argmax) {
printf "%s: %d\n", $word, $max;
}
why not just get the keys from the hash sorted by their value and extract the first X?
this should provide an example: http://www.devdaily.com/perl/edu/qanda/plqa00016
I'd like to do a search for simple if statements in a collection of C source files.
These are statements of the form:
if (condition)
statement;
Any amount of white space or other sequences (e.g. "} else ") might appear on the same line before the if. Comments may appear between the "if (condition)" and "statement;".
I want to exclude compound statements of the form:
if (condition)
{
statement;
statement;
}
I've tried each of the following in awk:
awk '/if \(.*\)[^{]+;/ {print NR $0}' file.c # (A) No results
awk '/if \(.*\)[^{]+/ {print NR $0}' file.c # (B)
awk '/if \(.*\)/ {print NR $0}' file.c # (C)
(B) and (C) give different results. Both include items I'm looking for and items I want to exclude. Part of the problem, obviously, is how to deal with patterns that span multiple lines.
Edge cases (badly formed comments, odd indenting or curly braces in odd places, etc.) can be ignored.
How can I accomplish this?
Based on Al's answer, but with fixes for a couple of problems (plus I decided to check for simple else clauses, too (also, it prints the full if block):
#!/usr/bin/perl -w
my $line_number = 0;
my $in_if = 0;
my $if_line = "";
#ifdef NEW
my $block = "";
#endif /* NEW */
# Scan through each line
while(<>)
{
# Count the line number
$line_number += 1;
# If we're in an if block
if ($in_if)
{
$block = $block . $line_number . "+ " . $_;
# Check for open braces (and ignore the rest of the if block
# if there is one).
if (/{/)
{
$in_if = 0;
$block = "";
}
# Check for semi-colons and report if present
elsif (/;/)
{
print $if_line;
print $block;
$block = "";
$in_if = 0;
}
}
# If we're not in an if block, look for one and catch the end of the line
elsif (/(if \(.*\)|[^#]else)(.*)/)
{
# Store the line contents
$if_line = $line_number . ": " . $_;
# If the end of the line has a semicolon, report it
if ($2 =~ ';')
{
print $if_line;
}
# If the end of the line contains the opening brace, ignore this if
elsif ($2 =~ '{')
{
}
# Otherwise, read the following lines as they come in
else
{
$in_if = 1;
}
}
}
I'm not sure how you'd do this with a one liner (I'm sure you could by using sed's 'n' command to read the next line, but it would be very complicated), so you probably want to use a script for this. How about:
perl parse_if.pl file.c
Where parse_if.pl contains:
#!/usr/bin/perl -w
my $line_number = 0;
my $in_if = 0;
my $if_line = "";
# Scan through each line
while(<>)
{
# Count the line number
$line_number += 1;
# If we're in an if block
if ($in_if)
{
# Check for open braces (and ignore the rest of the if block
# if there is one).
if (/{/)
{
$in_if = 0;
}
# Check for semi-colons and report if present
elsif (/;/)
{
print $if_line_number . ": " . $if_line;
$in_if = 0;
}
}
# If we're not in an if block, look for one and catch the end of the line
elsif (/^[^#]*\b(?:if|else|while) \(.*\)(.*)/)
{
# Store the line contents
$if_line = $_;
$if_line_number = $line_number;
# If the end of the line has a semicolon, report it
if ($1 =~ ';')
{
print $if_line_number . ": " . $if_line;
}
# If the end of the line contains the opening brace, ignore this if
elsif ($1 =~ '{')
{
}
# Otherwise, read the following lines as they come in
else
{
$in_if = 1;
}
}
}
I'm sure you could do something fairly easily in any other language (including awk) if you wanted to; I just thought that I could do it quickest in perl by way of an example.
In awk, each line is treated as a record and "\n" is the record separator. As all the records are parsed line by line so you need to keep track of next line after if. I don't know how can you do this in awk..
In perl, you can do this easily as
open(INFO,"<file.c");
$flag=0;
while($line = <INFO>)
{
if($line =~ m/if\s*\(/ )
{
print $line;
$flag = 1;
}
else
{
print $line && $flag ;
$flag = 0 if($flag);
}
}
Using Awk you can do this by:
awk '
BEGIN { flag=0 }
{
if($0 ~ /if/) {
print $0;
flag=NR+1
}
if(flag==NR)
print $0
}' try.c