How to generate program dependence graph for C program? - c

I want to generate a Program Dependence Graph (PDG) from C source code. I found papers that explain how do it, but all used the commercial CodeSurfer tool.
Are there any free tools that do this?

Frama-C is an open-source static analysis framework that can compute a sound Program Dependency Graph for C programs. Its slicing plug-in uses the resulting PDG. The slicing and PDG computation were discussed in February 2010 on the mailing list (messages from jung, myung-jin and their answers).
You may also look at NIST's Unravel, or Georgia Tech's Aristotle. Both Valsoft at Karlsruhe University, and Loyola's Surgeon's Assistant, might also be worth looking into.

There's a promising new tool called cpp-depenencies.
It can generate component dependency diagrams (like below) as well as class hierarchy diagrams (by passing an option to treat each source file as a component).

Doxygen can generate function caller and callee graphs, as well as all the functions used in your program. This may not be exactly what you are looking for, but it could provide some useful data.
SourceMonitor is a metrics tool that can show function and program complexity as well as complexity diagrams.
Both tools are free.

If you use llvm to generate pdg, you can use this:
https://bitbucket.org/psu_soslab/program-dependence-graph-in-llvm/src/master/

Related

How do I generate a data flow graph with clang or other tools?

With clang and graphviz I can generate the calling graph for some C/C++ code as explained in this answer.
Now I need a data flow diagram computed on a really large codebase ( it's C for the most part ), this codebase is a software where cmake is used as building tool.
So my problem is, given the name of a data structure, how I can possibly retrieve the names of the functions and the files using/implementing this structure ?
There is some sparse reference to some data flow mining algorithms inside Libtool from the clang project ( not even sure if it's something stable or in development ), but I found nothing on clang itself or scan-build.
How I can generate this piece of information ? I really need just that, given a name I would like to retrieve where is used in the code, pretty much all the static analysis tools that I have reviewed are focusing on functions and methods, I need to check a data structure usage in clang.
EDIT:
I'm also considering using doxygen for the documentation, so if the xml output of doxygen could be useful for some tool, I can use it.
You can query
all references to a symbol
global definitions
functions called by a function
functions calling a function
files including a file
and more.
with cscope.

How to get abstract syntax tree of a `c` program in `GCC`

How can I get the abstract syntax tree of a c program in gcc?
I'm trying to automatically insert OpenMP pragmas to the input c program.
I need to analyze nested for loops for finding dependencies so that I can insert appropriate OpenMP pragmas.
So basically what I want to do is traverse and analyze the abstract syntax tree of the input c program.
How do I achieve this?
You need full dataflow to find 'dependencies'. Then you will need to actually insert the OpenMP calls.
What you want is a program transformation system. GCC probably has the dependency information, but it is famously difficult to work with for custom projects. Others have mentioned Clang and Rose. Clang might be a decent choice, but custom analysis/transformation isn't its main purpose. Rose is designed to support custom tools, but IMHO is a rather complicated scheme to use in practice because of its use of the EDG front end, which isn't designed to support transformation.
[THE FOLLOWING TEXT WAS DELETED BY A MODERATOR. I HAVE PUT IT BACK, BECAUSE IT IS ONE THE VALID TRANSFORMATION SYSTEMS FOR THIS TASK. THE FACT THAT I AM RESPONSIBLE FOR IT IN NO WAY DIMINISHES ITS VALUE AS A USEFUL ANSWER TO THE OP.]
Our DMS Software Reengineering Toolkit with its C front end is explicitly designed to be a program transformation system. It has full data flow analysis (including points-to analysis, call graph construction and range analyses) tied to the AST in sensible ways. It provides source-to-source rewrite rules enabling changes to the ASTs expressed in surface syntax form; you can read the transformations rather than inspect a bunch of procedural code. With a modified AST, DMS can regenerate source code including the comments in a compilable form.
Not exactly an AST but GCCXML might help http://linux.die.net/man/1/gccxml
edit : as stated by Ira Baxter gccxml does not output information about function/methods bodies. Here's a fork that seems to fix that lack http://sourceforge.net/projects/gccxml-bodies/

What tools/IDE/languages exists for generating C-code

I wanted to know more about tools that assist in writing code-generators for C. Essentially how do we achieve functionality similar to c++ templates.
Even though it's not a perfect solution and it takes some time to master it, I've used the m4 macro processor in the past for generic C code generation (kinda like C++ templates). You may want to check that out.
You are looking for a way to generate code for very similar classes, where what differs is essentially their type.
You can use a template-based code generator, where "template" means "boilerplate code" with string substitution. This is the simplest scenario. A tool like StringTemplate or CodeSmith will do the job. But there are many others. Just search around.
If you want a more serious generation scenario, where different class structures might be needed according to a set of definitions, then you should go with a fully programmable generator like AtomWeaver. There are others (MPS, Xtext) but these do not rely on templates.

Spaghetti code visualisation software?

a smoking pile of spaghetti just landed on my desk, and my task is to understand it (so I can refactor / reimplement it).
The code is C, and a mess of global variables, structure types and function calls.
I would like to plot graphs of the code with the information:
- Call graph
- Which struct types are used in which functions
- Which global variable is used in what function
Hopefully this would make it easier to identify connected components, and extract them to separate modules.
I have tried the following software for similar purposes:
- ncc
- ctags
- codeviz / gengraph
- doxygen
- egypt
- cflow
EDIT2:
- frama-c
- snavigator
- Understand
The shortcomings of these are either
a) requires me to be able to compile the code. My code does not compile, since portions of the source code is missing.
b) issues with preprocessor macros (like cflow, who wants to execute both branches of #if statements). Running it through cpp would mess up the line numbers.
c) I for some reason do not manage to get the software to do what I want to do (like doxygen; the documentation for call graph generation is not easy to find, and since it does not seem to plot variables/data types anyway, it is probably not worth spending more time learning about doxygen's config options). EDIT: I did follow a these Doxygen instrcutions, but it did only plot header file dependencies.
I am on Linux, so it is a huge plus if the software is for linux, and free software. Not sure my boss understands the need to buy a visualizer :-(
For example: a command line tool that lists in which functions a symbol (=function,variable,type) is referenced in would be of great help (like addr2line, but for types/variable names/functions and source code).
//T
My vote goes to gnu global. It has all the features of ctags/cscope combined as well as the possibility to generate fully indexed html which allows you to browse the code in your favorite browser. Fire it up in apache and you have a web-service that anyone can access including full search capabilities.
It integrates nicely into emacs/vim/even the bash-shell, and you can use it directly from the shell-prompt.
To see it in action on the linux kernel, visit this
Combine that with a tool for cyclomatic complexity plugin for eclipse which calculates the complexity of your code. besides the cyclomatic complexity it can handle:
McCabe's Cyclomatic Complexity
Efferent Couplings
Lack of Cohesion in Methods
Lines Of Code in Method
Number Of Fields
Number Of Levels
Number Of Locals In Scope
Number Of Parameters
Number Of Statements
Weighted Methods Per Class
...and you should have everything you need.
If you like command line ;) maybe you could try cscope, it does static analysis of code and can tell you where are referenced some symbols/variables/functions... Not the Holy Graal, but it can be pretty usefull to browse unknown source code.
There are also some GUI that can handle csope results (Vi, Emacs, JEdit...).
On the other hand, Eclipse with the CDT plugin can also help you to navigate into the spaghetti code you have to maintain.
It's not free and afaik not linux but cppDepend might be worth evaluating - at least until someone comes up with a more suitable suggestion :)
http://www.cppdepend.com/ [Demo video here]
If you'd like to know in which functions a symbol is declared or referenced you can try LXR. It's not console based, but is quite usable.

C to IEC 61131-3 IL compiler

I have a requirement for porting some existing C code to a IEC 61131-3 compliant PLC.
I have some options of splitting the code into discrete function blocks and weaving those blocks into a standard solution (Ladder, FB, Structured Text etc). But this would require carving up the C code in order to build each function block.
When looking at the IEC spec I realsied that the IEC Instruction List form could be a target language for a compiler. The wikepedia article lists two development tools:
CoDeSys
Beremiz
But these seem to be targeted compiling IEC languages to C, not C to IEC.
Another possible solution is to push the C code through a C to Pascal translator and use that as a starting point for a Structured Text solution.
If not any of these I will go down the route of splitting the code up into function blocks.
Edit
As prompted by mlieson's reply I should have mentioned that the C code is an existing real-time control system. So the programs algorithms should already suit a PLC environment.
Maybe this answer comes too late but it is possible to call C code from CoDeSys thanks to an external library.
You can find documentation on the CoDeSys forum at http://forum-en.3s-software.com/viewtopic.php?t=620
That would give you to use your C code into the PLC with minor modifcations. You'll just have to define the functions or function blocks interfaces.
My guess is that a C to Pascal translator will not get you near enough for being worth the trouble. Structured text looks a lot like Pascal, but there are differences that you will need to fix everywhere.
Not a bug issue, but don't forget that PLCs runtime enviroment is a bit different. A C applications starts at main() and ends when main() returns. A PLC calls it main() over and over again, 100:s of times per second and it never ends.
Usally lengthy calculations and I/O needs to be coded in diffent fashion than a C appliation would use.
Unless your C source is many many thousands lines of code - Rewrite it.
It is impossible. To be short: the IL language is a 4GL (i.e. limited to
the domain, as well as other IEC 61131-3 languages -- ST, FBD, LD, SFC).
The C language is a 3GL.
To understand the problem, try to answer the question, which way to
express in IL manipulations with a pointer? for example, to express call a
function by a pointer. What about interrupts? Low level access to the
peripherial devices?
(really, there are more problems)
BTW, there is the Reflex language, aka "C with processes". Reflex is a 4GL for the
control domain with C-like syntax. But the known translators produce
C-code and Python-code.
If the amount of code to convert is a few thousand lines, recoding by hand is probably your best bet.
If you have lots of code to convert, then an automated tool might be very effective.
Using the DMS Software Reengineering Toolkit we've built translators to map mechanical motion diagrams into RLL (PLC) code. DMS also has full C parser/analyzers/front ends. The pieces are there to build a C to RLL code.
This isn't an easy task. It likely takes 6-12 man-months to configure DMS to something resembling what you want. If that's less than what it takes to do by hand, then its the right way to do it.
There are a few IEC development environments and target hardware that can use C blocks... I would also take a look at the reasons why it HAS to be an IEC-61131 complaint target. I have written extensively on compliance and why it doesn't mean squat.
SOFTplc corp can help I'm sure with user defined loadable modules... and they can be in C..
Schneider also supports C function blocks...
Labview too!! not sure why IEC is important that's all!! the compiler if existed would create bad code for sure:)
Your best bet is to split your C code into smaller parts which can be recoded as PLC functional blocks and use C to PASCAL convertor for each block which you will rewrite in structured text. Prepare to do a lot of manual work since automated conversion will probably disappoint you.
Also take a look at this page: http://www.control.com/thread/1026228786
Every time I've done this, I just parsed and converted it by hand from C directly to ST. I only ran into a few functions that required complete rewrites, although there was very little that dealt with pointers, which is something that ST generally chokes on, unfortunately.
Using the existing C code as blocks that are called by the PLC program would have the added advantage that the C blocks could run at the same periodicity that they did before, and their function is likely already well documented and tested. This would minimize any effect on changes from the existing control system. This is an architecture for controls with software PLCs that I have seen used before.

Resources