Locate unused structures and structure members (C)

Some time ago we took over responsibility for a legacy code base.
One of the quirks of this badly structured and badly written code was that
it contained a number of really huge structs, each containing
hundreds of members. One of the many cleanup steps was to remove
as much unused code as possible, hence the need
to find unused structs and struct members.
For the structs, I put together a combination of Python, GNU
Global and ctags to list the struct members that are unused.
Basically, I use ctags to generate a tags file,
the Python script below parses that file to locate all struct
members, and then GNU Global looks each member up in the previously
generated Global database to see whether it is used in the code.
This approach has a number of quite serious flaws, but it more or less
solved the issue we faced and gave us a good start for further
cleanup.
There must be a better way to do this!
The question is: how do you find unused structures and structure members
in a code base?
#!/usr/bin/env python3
# Lists the struct members found in a ctags "tags" file and counts their uses
# with GNU Global. Run it from the directory that holds "tags" and the GTAGS database.
import os
import operator


def printheader(word):
    """print a header string underlined with dashes"""
    print("\n%s\n%s" % (word, "-" * len(word)))


class StructFreqAnalysis:
    """One struct: name, header path, ctags id and [member, use_count] pairs"""
    def __init__(self):
        self.path2hfile = ''
        self.name = ''
        self.id = ''
        self.members = []

    def show(self):
        print('path2hfile:', self.path2hfile)
        print('name:', self.name)
        print('members:', self.members)
        print()

    def sort(self):
        return sorted(self.members, key=operator.itemgetter(1))

    def prettyprint(self):
        """display the members sorted by use count"""
        print('struct:', self.name)
        print('path:', self.path2hfile)
        for i in self.sort():
            print(' ', i[0], ':', i[1])
        print()


x = {}  # struct_name -> StructFreqAnalysis
y = {}  # internal ctags id -> StructFreqAnalysis
with open('tags', 'r', errors='replace') as f:
    # pass 1: tag lines containing 'typeref:struct:' name a struct (typedef)
    for i in f:
        i = i.strip()
        if 'typeref:struct:' in i:
            line = i.split()
            x[line[0]] = StructFreqAnalysis()
            x[line[0]].name = line[0]
            x[line[0]].path2hfile = line[1]
            for j in line:
                if 'typeref' in j:
                    s = j.split(':')
                    x[line[0]].id = s[-1]
                    y[s[-1]] = x[line[0]]
    f.seek(0)
    # pass 2: tag lines containing 'struct:<id>' are members of struct <id>
    for i in f:
        i = i.strip()
        if 'struct:' in i:
            items = i.split()
            name = items[0]
            sid = items[-1].split(':')[-1]
            if sid and sid in y:
                y[sid].members.append([name, 0])

# do frequency count
for k, v in x.items():
    for i in v.members:
        # -a: absolute paths; -s: symbols that are not defined in GTAGS (references)
        cmd = 'global -a -s %s' % i[0]
        for gout in os.popen(cmd):
            gout = gout.strip()
            if gout.endswith('.c'):
                # legacy sources may not be valid UTF-8
                with open(gout, 'r', errors='replace') as src:
                    for line in src:
                        if '->' + i[0] in line or '.' + i[0] in line:
                            i[1] = i[1] + 1

printheader('All structures')
for k, v in x.items():
    v.prettyprint()

# show which structs could be removed
printheader('These structs could perhaps be removed')
for k, v in x.items():
    if len(v.members) == 0:
        v.show()

printheader('Total number of probably unused members')
cnt = 0
for k, v in x.items():
    for i in v.members:
        if i[1] == 0:
            cnt = cnt + 1
print(cnt)
Edit
As proposed by @Jens-Gustedt, using the compiler is a good way to do it. I'm after an approach that can do a sort of "high-level" filtering before using the compiler approach.

If there are only a few structs, and if the code does no bad hacks such as accessing a struct through another type, then you could just comment out all the fields of your first struct and let the compiler tell you which ones are used.
Uncomment the used fields one after another until the compiler is satisfied. Once that compiles, do thorough testing to make sure the precondition (no such hacks) really holds.
Repeat for every struct.
Definitely not pretty, but at the end you'd have at least one person who knows the code a bit.

Use Coverity. It is a wonderful tool for detecting code flaws, but it is a bit costly.

This is a very old post, but I recently did the same thing using Python and gdb. I compiled the following snippet, which refers to the structure at the top of the hierarchy. I then used gdb to print the type of that structure (ptype) and recursed into its members.
#include <usedheader.h>
UsedStructureInTop *to_print = 0;
int main(){return 0;}
(gdb) p to_print
$1 = (UsedStructureInTop *) 0x0
(gdb) pt UsedStructureInTop
type = struct StructureTag {
members displayed here line by line
}
(gdb)
My purpose was a little different, though: to generate a header that contains only the structure UsedStructureInTop and the types it depends on. There are compiler options to do this, but they do not remove the unused/unlinked structures found in the included header files.

In C, struct members can be accessed through another structure with a similar layout. The standard explicitly allows this for a common initial sequence when both structs are members of the same union, and legacy code often does it through plain pointer casts anyway. That means that you can access struct Foo {int a; float b; char c; }; via struct Bar { int x; float y; }; (except, of course, for Foo's member c).
Hence, your algorithm is potentially flawed. It's bloody hard to find what you want, which BTW is why C is hard to optimize.

Related

How to 'tag' a location in a C source file for a later breakpoint definition?

Problem:
I want to be able to put different potentially unique or repeated "tags" across my C code, such that I can use them in gdb to create breakpoints.
Similar Work:
Breakpoints on line numbers: The main difference from breakpoints on source lines is this. If the code before the tag is modified so that it has more or fewer lines, a reference to the tag is still semantically correct, whereas a reference to the source line is not.
Labels: This comes from my previous question, How to tell gcc to keep my unused labels?, in which I had assumed that the answer was to insert labels. From discussions with knowledgeable members of the platform, I learned that label names are not preserved after compilation: labels that are not used within C are removed by the compiler.
Injecting asm labels: Related to the previous approach, if I inject asm code into the C source, problems arise due to inline functions, compiler optimizations and the lack of scoping. This makes the approach fragile.
Define a dummy function: On this other question, Set GDB breakpoint in C file, there is an interesting approach in which a "dummy" function is placed in the code and a breakpoint is set on the function call. The problem with this approach is that the definition of such a function must be replicated for each different tag.
Is there a better solution to accomplish this? Or a different angle to attack the presented problem?
Using SDT (Statically Defined Tracing) probe points appears to satisfy all the requirements.
GDB documentation links to examples of how to define the probes.
Example use: (gdb) break -probe-stap my_probe (this should be documented in the GDB breakpoints section, but currently isn't).
You could create a dummy variable and set it to different values. Then you can use conditional watchpoints. Example:
#include <stdio.h>
static volatile int loc;
int main()
{
loc = 1;
puts("hello world");
loc = 2;
return 0;
}
(gdb) watch loc if loc == 2
Hardware watchpoint 1: loc
(gdb) r
Starting program: /tmp/a.out
hello world
Hardware watchpoint 1: loc
Old value = 1
New value = 2
main () at test.c:8
8 return 0;
You can of course wrap the assignment in a macro so you only get it in debug builds. Usual caveats apply: optimizations and inlining may be affected.
Use Python (from inside gdb) to search a source file for some predefined labels, and set breakpoints there:
import gdb  # only available inside gdb's embedded Python

def break_on_labels(source, label):
    """add a breakpoint on each SOURCE line containing LABEL"""
    with open(source) as file:
        for lineno, line in enumerate(file, start=1):
            if label in line:
                gdb.Breakpoint(source=source, line=lineno)

main_file = gdb.lookup_global_symbol("main").symtab.fullname()
break_on_labels(main_file, "BREAK-HERE")
Example:
int main(void)
{
int a = 15;
a = a + 23; // BREAK-HERE
return a;
}
You could insert a #warning at each line where you want a breakpoint, then have a script to parse the file and line numbers from the compiler messages and write a .gdbinit file placing breakpoints at those locations.

Dynamic method of calling structures in C

I have a project which involves writing a C program for some software used by my company. I want it to be as efficient as possible, but the way the software references the signals I'm working with is a little wonky. I'm working with 4 sets of 96 signals; these signals are grouped into 32 groups with 3 members each. Rather than generic functions to work with these signals (the generic functions exist, but there's no documentation on how they work), the auto-generated header file has defined a group of macros (I think) for each of these groups.
Each of the groups is defined as follows:
typedef struct {
//struct members
} AB_A_Group_A_Network;
Each of those structures has a group of macros(?) defined like this:
void AB_A_Group_A_Network_Init(AB_A_Group_A_Network *pAbc)
{
double values[6];
...
pAbc->Member_1 = values[3]; //array positions vary
pAbc->Member_2 = values[4];
pAbc->Member_3 = values[5];
}
The software's tech support suggested I do the following, but I was hoping there'd be a better way to do it. Of course this is a long method, but I can write it with a python script no problem, if need be. These 250+ lines of code will run every second in my application for each data set.
AB_A_Group_A_Network GroupA;
AB_A_Group_B_Network GroupB;
//...and so on
AB_A_Group_ZF_Network GroupZF;
AB_A_Group_A_Network_Init(&GroupA);
AB_A_Group_B_Network_Init(&GroupB);
//...and so on
AB_A_Group_ZF_Network_Init(&GroupZF);
CD_Array_Set(0,GroupA.Member_1); //a custom array function meant to interface with the software
CD_Array_Set(1,GroupA.Member_2);
//...and so on
CD_Array_Set(95,GroupZF.Member_96);
//...Repeat 3 times for 4 sets of data (Data sets A-D), with checks to see if that data exists
I thought of doing something like this, but I'm not sure you can use char arrays in this way. Disclaimer: I don't have much experience with C, so this might look or sound really naive. This isn't working code, just a stream of consciousness. I'm also not sure whether doing it this way, if it is possible at all, would end up losing efficiency. I'm completely open to other methods.
char groupName[70] = "AB_*_Group_*_Network";
char dataset = 'A';
char group = 'A';
char member = '1';
char groupInit[70]
//write a loop to increment dataset alphabetically
//check if dataset exists
groupName[3] = dataset;
//write a sub-loop to increment group alphabetically (each set has 32 groups)
groupName[27] = group;
//use groupName in place of struct name (not sure how or if this is possible with my current methods)
groupInit = strncat(groupName, "_Init", 5);
//use groupInit in place of _init macro name
//write a sub-loop to increment member numerically (each group has 3 members)
char member[10] = "Member_*";
member[7] = member;
CD_Array_Set(i,groupName.member);
Is the long method really the best way of doing this? Any advice you can offer is appreciated!
There's no way to refer to variables and type names dynamically from strings at runtime. But you can use token pasting in a macro to avoid all the repeated code.
#define INIT_GROUP(dataset, group) \
AB_ ## dataset ## _Group_ ## group ## _Network Group ## group; \
AB_ ## dataset ## _Group_ ## group ## _Network_Init(&Group ## group); \
CD_Array_Set(0,Group ## group .Member_1); \
CD_Array_Set(1,Group ## group .Member_2); \
...
CD_Array_Set(95,Group ## group .Member_96);
Unfortunately, there are no loops in the preprocessor, so you have to write all 96 CD_Array_Set lines in the macro.
With this macro, you can then write:
INIT_GROUP(A, A)
INIT_GROUP(A, B)
...
INIT_GROUP(A, ZF)
Auto-generated C code
It is possible to do such things in C, but it requires ugly macros.
The first thing I would look at is
An old Stack Overflow answer: https://stackoverflow.com/a/12591965/8964221
And the gist mentioned in it: https://gist.github.com/epatel/3786323
BUT!
I would think about how much time it would take to get it bug-free and working.
It won't be that easy to adapt it to your needs.

Create shared parameter file for C and Python

I need to create a parameter file that can be managed across a Python 3.7 code base and a C code base. The file needs to be modifiable by either the C or the Python program, with the changes taking effect in the other program (an update function will handle reading the updated file). Ideally the file should not be human-readable, as it contains information that is better left obfuscated.
Is there a recommended method to do so?
I could create separate Python and C files, but the set of parameters will change over time (for code maintenance), and the values will be changed by these programs. The list would also be very long, and it would be a hassle to maintain two different files and keep them in sync over time. Also, the file may need to be exchanged between users, so that a version modified by the software run by user1 is readable by the software run by user2. The idea is that other parts of both code bases could access parts of the parameter list without knowing its full contents.
To clarify with an example, I could have a parameter.h file containing:
struct {
double par1;
int par2;
} par_list = { 1.1, 2 };
And I could have a parameter.py with:
class par_list:
    def __init__(self):
        self.par1 = float(1.1)
        self.par2 = int(2)
Then, with an import in Python or an include in C, I could initialize the parameter list. But in this case the parameters are defined in two different files.
I'm considering using some kind of binary file to keep the values, and creating a script that generates both the Python and the C code that read and update the values. My concern is that the binary file would need to be interchangeable between an ARM architecture running Linux and an x86 architecture running Windows.
Here is an example working with numpy:
C code:
#include <stdio.h>
#include <stdint.h>
struct Struct_format{
    uint8_t the_unsigned_int8;
    int32_t the_signed_int32[2];
    double the_double;
};
typedef struct Struct_format upperStruct;
void printStruct(upperStruct test_struct){
    printf("test_struct.the_unsigned_int8 = %d\n", test_struct.the_unsigned_int8);
    printf("test_struct.the_signed_int32[0] = %d\n", (int)test_struct.the_signed_int32[0]);
    printf("test_struct.the_signed_int32[1] = %d\n", (int)test_struct.the_signed_int32[1]);
    printf("test_struct.the_double = %f\n", test_struct.the_double);
}
int main(void){
    //Define a "default" value (this could also live in a separate file):
    upperStruct fromC2Python = {4U,{-3,-1},2.1};
    printf("Printing fromC2Python\n");
    printStruct(fromC2Python);
    //Save this default in a file ("wb": binary mode matters on Windows):
    FILE * fid = fopen("fromC2Python.bin","wb");
    if (fid == NULL) return 1;
    fwrite(&fromC2Python, sizeof(fromC2Python), 1, fid);
    fclose(fid);
    //Now load the file created by Python:
    upperStruct fromPython2C;
    FILE * fid_py = fopen("fromPython2C.bin","rb");
    if (fid_py == NULL) return 1;
    if (fread(&fromPython2C, sizeof(fromPython2C), 1, fid_py) != 1) { fclose(fid_py); return 1; }
    fclose(fid_py);
    printf("Printing fromPython2C\n");
    printStruct(fromPython2C);
    return 0;
}
Python code:
import numpy
datatype = numpy.dtype([('potato',
[('time', numpy.uint8),
('sec', numpy.int32, 2)]),
('temp', numpy.float64)],
align=True)
fromPython2C = numpy.array([((5, (-6, -7)), 61.55)], dtype=datatype)
print(fromPython2C)
fromPython2C.tofile("fromPython2C.bin", sep="")
fromC2Python = numpy.fromfile("fromC2Python.bin", dtype=datatype, count=-1, sep="")
print(fromC2Python)
print(fromC2Python['potato'])
print(fromC2Python['potato']['time'])
print(fromC2Python['temp'])
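The idea is that numpy can read and write structured binary files, so it suffices to generate the dtype specification from the C struct definition with a text parser. With align=True, numpy pads the fields the way a C compiler would. Dumping whole structs like this still only works as long as both platforms agree on endianness, type sizes and struct padding.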

Using R random number generators in C [duplicate]

I would like to, within my own compiled C++ code, check to see if a library package is loaded in R (if not, load it), call a function from that library and get the results back to in my C++ code.
Could someone point me in the right direction? There seems to be a plethora of info on R and different ways of calling R from C++ and vice versa, but I have not come across exactly what I want to do.
Thanks.
Dirk's probably right that RInside makes life easier. But for the die-hards... The essence comes from Writing R Extensions sections 8.1 and 8.2, and from the examples distributed with R. The material below covers constructing and evaluating the call; dealing with the return value is a different (and in some sense easier) topic.
Setup
Let's suppose a Linux / Mac platform. The first thing is that R must have been compiled to allow linking, either to a shared or static R library. I work with an svn copy of R's source, in the directory ~/src/R-devel. I switch to some other directory, call it ~/bin/R-devel, and then
~/src/R-devel/configure --enable-R-shlib
make -j
this generates ~/bin/R-devel/lib/libR.so; perhaps whatever distribution you're using already has this? The -j flag runs make in parallel, which greatly speeds the build.
Examples for embedding are in ~/src/R-devel/tests/Embedding, and they can be made with cd ~/bin/R-devel/tests/Embedding && make. Obviously, the source code for these examples is extremely instructive.
Code
To illustrate, create a file embed.cpp. Start by including the headers that define R data structures and the R embedding interface; these are located in ~/bin/R-devel/include and serve as the primary documentation. We also need a prototype for the function that will do all the work:
#include <Rembedded.h>
#include <Rdefines.h>
static void doSplinesExample();
The work flow is to start R, do the work, and end R:
int
main(int argc, char *argv[])
{
Rf_initEmbeddedR(argc, argv);
doSplinesExample();
Rf_endEmbeddedR(0);
return 0;
}
The examples under Embedding include one that calls library(splines), sets a named option, then runs a function example("ns"). Here's the routine that does this
static void
doSplinesExample()
{
SEXP e;
int errorOccurred;
// create and evaluate 'library(splines)'
PROTECT(e = lang2(install("library"), mkString("splines")));
R_tryEval(e, R_GlobalEnv, &errorOccurred);
if (errorOccurred) {
// handle error
}
UNPROTECT(1);
// 'options(FALSE)' ...
PROTECT(e = lang2(install("options"), ScalarLogical(0)));
// ... modified to 'options(example.ask=FALSE)' (this is obscure)
SET_TAG(CDR(e), install("example.ask"));
R_tryEval(e, R_GlobalEnv, NULL);
UNPROTECT(1);
// 'example("ns")'
PROTECT(e = lang2(install("example"), mkString("ns")));
R_tryEval(e, R_GlobalEnv, &errorOccurred);
UNPROTECT(1);
}
Compile and run
We're now ready to put everything together. The compiler needs to know where the headers and libraries are
g++ -I/home/user/bin/R-devel/include embed.cpp -L/home/user/bin/R-devel/lib -lR
The compiled application needs to be run in the correct environment, e.g., with R_HOME set correctly; this can be arranged easily (obviously a deployed app would want to take a more extensive approach) with
R CMD ./a.out
Depending on your ambitions, some parts of section 8 of Writing R Extensions are not relevant, e.g., callbacks are needed to implement a GUI on top of R, but not to evaluate simple code chunks.
Some detail
Running through that in a bit of detail... An SEXP (S-expression) is a data structure fundamental to R's representation of basic types (integer, logical, language calls, etc.). The line
PROTECT(e = lang2(install("library"), mkString("splines")));
makes a symbol library and a string "splines", and places them into a language construct consisting of two elements. This constructs an unevaluated language object, approximately equivalent to quote(library("splines")) in R. lang2 returns an SEXP that has been allocated from R's memory pool, and it needs to be PROTECTed from garbage collection. PROTECT pushes the address pointed to by e onto a protection stack; when the memory no longer needs to be protected, the address is popped from the stack (with UNPROTECT(1), a few lines down). The line
R_tryEval(e, R_GlobalEnv, &errorOccurred);
tries to evaluate e in R's global environment. errorOccurred is set to non-0 if an error occurs. R_tryEval returns an SEXP representing the result of the function, but we ignore it here. Because we no longer need the memory allocated to store library("splines"), we tell R that it is no longer PROTECT'ed.
The next chunk of code is similar, evaluating options(example.ask=FALSE), but the construction of the call is more complicated. The S-expression created by lang2 is a pair list, conceptually with a node, a left pointer (CAR) and a right pointer (CDR). The left pointer of e points to the symbol options. The right pointer of e points to another node in the pair list, whose left pointer is FALSE (the right pointer is R_NilValue, indicating the end of the language expression). Each node of a pair list can have a TAG, the meaning of which depends on the role played by the node. Here we attach an argument name.
SET_TAG(CDR(e), install("example.ask"));
The next line evaluates the expression that we have constructed (options(example.ask=FALSE)), using NULL to indicate that we'll ignore the success or failure of the function's evaluation. A different way of constructing and evaluating this call is illustrated in R-devel/tests/Embedding/RParseEval.c, adapted here as follows (R_ParseVector and ParseStatus come from R_ext/Parse.h):
SEXP tmp;
ParseStatus status;
PROTECT(tmp = mkString("options(example.ask=FALSE)"));
PROTECT(e = R_ParseVector(tmp, 1, &status, R_NilValue));
R_tryEval(VECTOR_ELT(e, 0), R_GlobalEnv, NULL);
UNPROTECT(2);
but this doesn't seem like a good strategy in general, as it mixes R and C code and does not allow computed arguments to be used in R functions. Instead, write and manage R code in R (e.g., by creating a package with functions that perform complicated series of R manipulations) that your C code uses.
The final block of code above constructs and evaluates example("ns"). R_tryEval returns the result of the function call, so
SEXP result;
PROTECT(result = R_tryEval(e, R_GlobalEnv, &errorOccurred));
// ...
UNPROTECT(1);
would capture that for subsequent processing.
There is Rcpp which allows you to easily extend R with C++ code, and also have that C++ code call back to R. There are examples included in the package which show that.
But maybe what you really want is to keep your C++ program (i.e. your own main()) and call out to R? That can be done most easily with
RInside which allows you to very easily embed R inside your C++ application---and the test for library, load if needed and function call are then extremely easy to do, and the (more than a dozen) included examples show you how to. And Rcpp still helps you to get results back and forth.
Edit: As Martin was kind enough to show things the official way, I cannot help but contrast it with one of the examples shipping with RInside. It is something I once wrote quickly to help someone who had asked on r-help how to load a (portfolio optimisation) library and use it. It meets your requirements: load a library, access some data, pass a weights vector down from C++ to R, run R and get the result back.
// -*- mode: C++; c-indent-level: 4; c-basic-offset: 4; tab-width: 8; -*-
//
// Simple example for the repeated r-devel mails by Abhijit Bera
//
// Copyright (C) 2009 Dirk Eddelbuettel
// Copyright (C) 2010 - 2011 Dirk Eddelbuettel and Romain Francois
#include <RInside.h> // for the embedded R via RInside
int main(int argc, char *argv[]) {
try {
RInside R(argc, argv); // create an embedded R instance
std::string txt = "suppressMessages(library(fPortfolio))";
R.parseEvalQ(txt); // load library, no return value
txt = "M <- as.matrix(SWX.RET); print(head(M)); M";
// assign mat. M to NumericMatrix
Rcpp::NumericMatrix M = R.parseEval(txt);
std::cout << "M has "
<< M.nrow() << " rows and "
<< M.ncol() << " cols" << std::endl;
txt = "colnames(M)"; // assign columns names of M to ans and
// into string vector cnames
Rcpp::CharacterVector cnames = R.parseEval(txt);
for (int i=0; i<M.ncol(); i++) {
std::cout << "Column " << cnames[i]
<< " in row 42 has " << M(42,i) << std::endl;
}
} catch(std::exception& ex) {
std::cerr << "Exception caught: " << ex.what() << std::endl;
} catch(...) {
std::cerr << "Unknown exception caught" << std::endl;
}
exit(0);
}
This is rinside_sample2.cpp, and there are lots more examples in the package. To build it, you just say 'make rinside_sample2', as the supplied Makefile is set up to find R, Rcpp and RInside.

Updated: When to "mortalize" a variable in Perl Inline::C

I am trying to wrap a C library into Perl. I have tinkered with XS but, being unsuccessful, I thought I should start simply with Inline::C. My question is about mortalization. I have been reading perlguts as best I can, but am still confused. Do I need to call sv_2mortal on an SV* that is to be returned if I am not pushing it onto the stack?
(PS: I really am working with a less than functional knowledge of C, which is hurting me. I have a friend who knows C helping me, but he doesn't know any Perl.)
I am providing a sample below. The function FLIGetLibVersion simply puts len characters of the library version into char* ver. My question is: will the version_return form of my C code leak memory?
N.B. Any other comments on this code are welcome.
#!/usr/bin/perl
use strict;
use warnings;
use 5.10.1;
use Inline (
C => 'DATA',
LIBS => '-lm -lfli',
FORCE_BUILD => 1,
);
say version_stack();
say version_return();
__DATA__
__C__
#include <stdio.h>
#include <string.h>
#include "libfli.h"
void version_stack() {
Inline_Stack_Vars;
Inline_Stack_Reset;
size_t len = 50;
char ver[len];
FLIGetLibVersion(ver, len);
Inline_Stack_Push(sv_2mortal(newSVpv(ver,strlen(ver))));
Inline_Stack_Done;
}
SV* version_return() {
size_t len = 50;
char ver[len];
FLIGetLibVersion(ver, len);
SV* ret = newSVpv(ver, strlen(ver));
return ret;
}
Edit:
In an attempt to answer this myself, I tried changing the line to
SV* ret = sv_2mortal(newSVpv(ver, strlen(ver)));
and now when I run the script I get the same output that I did previously plus an extra warning. Here is the output:
Software Development Library for Linux 1.99
Software Development Library for Linux 1.99
Attempt to free unreferenced scalar: SV 0x2308aa8, Perl interpreter: 0x22cb010.
I imagine this means that I don't need to mortalize in this case? I suspect the warning is saying that I marked for collection something that was already in line for collection. Can someone confirm that this is what the warning means?
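I've been maintaining Set::Object for many years and had this question, too. Perhaps it's best to look at the source of that code to see when things should be mortalised (github.com/samv/Set-Object); I know Set::Object gets it right after many changes. I think, though, that it's whenever you're pushing the SV onto the return stack yourself. For a function declared to return SV*, the XS glue generated by xsubpp already mortalises the returned value, which fits the "Attempt to free unreferenced scalar" warning you get after mortalising it a second time. Not sure how Inline changes all that.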
