Spaghetti code visualisation software? (C)

A smoking pile of spaghetti just landed on my desk, and my task is to understand it (so that I can refactor/reimplement it).
The code is C: a mess of global variables, structure types and function calls.
I would like to plot graphs of the code showing:
- Call graph
- Which struct types are used in which functions
- Which global variable is used in what function
Hopefully this would make it easier to identify connected components and extract them into separate modules.
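To make the goal concrete, here is a made-up fragment of the kind of hidden coupling I want the graphs to expose (all names are hypothetical):

#include <stdio.h>
#include <string.h>

struct record { int id; char name[32]; };

/* global state silently shared by otherwise unrelated functions */
struct record g_cache[128];
int g_cache_len;

/* writes g_cache, uses struct record */
void parse_record(int id, const char *name)
{
    g_cache[g_cache_len].id = id;
    strncpy(g_cache[g_cache_len].name, name, sizeof g_cache[0].name - 1);
    g_cache_len++;
}

/* reads g_cache, uses struct record */
void flush_cache(void)
{
    for (int i = 0; i < g_cache_len; i++)
        printf("%d %s\n", g_cache[i].id, g_cache[i].name);
    g_cache_len = 0;
}

Neither function calls the other, so a plain call graph misses the connection; the global-use and struct-use graphs are what tie parse_record and flush_cache together.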
I have tried the following software for similar purposes:
- ncc
- ctags
- codeviz / gengraph
- doxygen
- egypt
- cflow
EDIT2:
- frama-c
- snavigator
- Understand
The shortcomings of these are:
a) requires me to be able to compile the code. My code does not compile, since portions of the source code are missing.
b) issues with preprocessor macros (like cflow, which tries to follow both branches of #if statements). Running the code through cpp first would mess up the line numbers.
c) I somehow do not manage to get the software to do what I want (like doxygen: the documentation for call graph generation is not easy to find, and since it does not seem to plot variables/data types anyway, it is probably not worth spending more time on doxygen's config options). EDIT: I did follow these Doxygen instructions, but it only plotted header file dependencies.
I am on Linux, so it is a huge plus if the software runs on Linux and is free software. Not sure my boss understands the need to buy a visualizer :-(
For example: a command line tool that lists the functions in which a symbol (function, variable or type) is referenced would be of great help (like addr2line, but for types/variable names/functions and source code).
//T

My vote goes to GNU GLOBAL. It has all the features of ctags/cscope combined, as well as the ability to generate fully indexed HTML, which lets you browse the code in your favorite browser. Serve that through Apache and you have a web service anyone can access, with full search capabilities.
It integrates nicely into Emacs/Vim/even the bash shell, and you can use it directly from the shell prompt - for example, global -rx mysymbol lists every reference to mysymbol with file name and line number.
To see it in action on the Linux kernel, visit this.
Combine that with a cyclomatic-complexity plugin for Eclipse, which calculates the complexity of your code (a small worked example of cyclomatic complexity follows the list below). Among the metrics it can handle:
- McCabe's Cyclomatic Complexity
- Efferent Couplings
- Lack of Cohesion in Methods
- Lines Of Code in Method
- Number Of Fields
- Number Of Levels
- Number Of Locals In Scope
- Number Of Parameters
- Number Of Statements
- Weighted Methods Per Class
...and you should have everything you need.
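For reference, McCabe's cyclomatic complexity is essentially the number of decision points plus one (exact counting conventions vary a little between tools). A contrived C function to illustrate:

/* decisions: if, ||, for, if, case, case = 6, so CC = 6 + 1 = 7 */
int classify(const int *v, int n, int mode)
{
    int bad = 0;
    if (v == NULL || n <= 0)      /* if + || : 2 decisions */
        return -1;
    for (int i = 0; i < n; i++)   /* for     : 1 decision  */
        if (v[i] < 0)             /* if      : 1 decision  */
            bad++;
    switch (mode) {               /* 2 case labels: 2 decisions */
    case 0:  return bad;
    case 1:  return n - bad;
    default: return 0;
    }
}

Functions scoring much above 10 are usually the first place to look when carving spaghetti into modules.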

If you like the command line ;) maybe you could try cscope: it does static analysis of the code and can tell you where symbols/variables/functions are referenced - for example, cscope -dL -0 mysymbol prints every reference to mysymbol, one per line, with file, function and line number. Not the Holy Grail, but it can be pretty useful for browsing unknown source code.
There are also some GUIs that can handle cscope results (Vim, Emacs, JEdit...).
On the other hand, Eclipse with the CDT plugin can also help you navigate the spaghetti code you have to maintain.

It's not free and, AFAIK, not available for Linux, but CppDepend might be worth evaluating - at least until someone comes up with a more suitable suggestion :)
http://www.cppdepend.com/ (demo video there)

If you'd like to know in which functions a symbol is declared or referenced, you can try LXR. It's not console based, but it is quite usable.

Related

How to methodically trace the location of source code

I often spend lots of time trying to find out where the exact implementation is located. It gets very frustrating when dealing with low-level code that might end up somewhere in the kernel.
I usually just google, or try to guess the location and/or method names, but that is not always very effective.
Is there some methodical way to trace the flow up to the implementation? How do you guys usually do it?
Load the whole codebase, with relevant dependencies, into a graphical IDE (NetBeans can do it, for instance) that can do call graphs, declaration-to-definition jumps, etc., or use libclang and its wrapper for the text editor of your choice - it is also very good at indexing. Lastly, consider the classic ctags, which can link definition and declaration points.
There used to be a ctrace program that did just that, but I don't think it is actively maintained.
Ultimately, it depends on what exactly you are trying to achieve. It actually looks like you want to look up a specific function rather than trace it. If that is indeed the case, consider some kind of source browser: from etags to cscope to OpenGrok.

How does PC-Lint (by Gimpel) look across multiple modules?

I'm using Gimpel's PC-Lint v8.00 on a C codebase and am looking to understand how it traverses modules. The PC-lint manual only goes as far as to say that PC-Lint "looks across multiple modules". How does it do this? For example, does it start with one module and combine all related include files and source files into one large piece of code to analyze? How deep does it search in order to understand the program flow?
In a second related question, I have a use case where it is beneficial for me to lint one C module from the codebase at a time instead of providing every C module in a long list to PC-Lint. However, if I only provide one C module, will it automatically find the other C modules which it depends on, and use those to understand the program flow of the specified C module?
PC Lint creates some sort of run-time database when it parses your source files, noting things like global variables, extern-declarations, etc.
When it has processed all compilation units (C files with all included files, recursively), it does what a linker does to generate your output, but instead of generating code, it reports certain types of errors, for instance: an extern declaration that has never been used, an unused prototype without implementation, or unused global functions. These are issues the linker does not always report, since code generation is perfectly possible: the items have simply never been used anywhere.
The search depth can be influenced by the option -passes, which enables far better value tracking at the cost of execution time. Refer to section 10.2.2.4 in the PDF manual (for version 9.x).
To your second question: no, if you only provide one (or a few) source (C) file name(s) on your Lint command line, PC Lint will process only those files - plus all include files they use, recursively. You may want to use the option -u for "unit checkout" to tell PC Lint that it is only processing part of a full project; Lint will then suppress certain kinds of warnings that are not useful for a partial project.
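A contrived two-file sketch (names made up) of the kind of inter-module inconsistency that per-file compilation misses but a whole-project Lint run catches:

/* a.c */
int shared_count = 0;            /* defined as int here...  */
void bump(void) { shared_count++; }

/* b.c */
extern long shared_count;        /* ...but declared as long here */
long get_count(void) { return shared_count; }

Each file compiles cleanly on its own, and many linkers will happily link the pair, yet reading shared_count through the mismatched declaration is undefined behaviour - exactly the kind of cross-module finding a lint over all translation units can report.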
I think in principle you're asking about LINT OBJECT MODULES; see Chapter 9 of the Lint manual PDF.
Using, say, lint -u a1.c -oo produces a1.lob; the .lob files can then be linked together using lint *.lob to produce the inter-module messages.
You also asked a related, specific question (Any tips for speeding up static analysis tool PC-Lint? Any experiences using .LOB files?), but I'm not sure I understand your concern with "How much would you say it affected linting time?", because I would say it depends. What is your current lint time/speed? You posted some years ago now - how about running the job on a newer machine with a newer CPU? KR

Getting type information of C symbols

Let me try to give some background first. I'm working on a project with a microcontroller (AVR) which I'm accessing through an interface (UART). I do direct writes to its global variables and I'm also able to execute functions directly (write args, trigger execution, read back return values).
The AVR code is in C, compiled with the GCC toolchain. The PC that is communicating with it runs Python code. As of now I have imported address & size information into Python easily by parsing 'objdump -x' output. What would now greatly boost my development is information about the types of the symbols (types & sizes of struct members, enum values, function arguments & return values, ...).
Somehow this seemed like a common thing that people do daily, and I was naively expecting ready-made Python tools at the start. Well, not so easy. By now I've spent many hours looking into various ways to accomplish that.
One approach would be to just parse the C code (using e.g. pycparser). But it seems I would have to at least 'pre-parse' the code to exclude various unsupported constructs, deal with various ordering problems, and so on. Also, in theory, there would be a problem if the compiler performed optimizations like struct or enum reordering.
I've also been looking into various gcc, gdb and objdump options to get such information, and have spent some time looking for tools that extract information from the various debugging formats (DWARF, stabs).
The closest I've got so far is dumping the stabs debugging information with the objdump -g option. This outputs C-like information, which I would then parse using pycparser or on my own.
But before I spend my time doing that, I decided to raise the question here, strongly hoping that someone will hit me with a possibly totally different approach I just haven't thought of.
There's a quite nice tool called c2ph that dumps a parsable description of the types and sizes (using debug info as the source).
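If the debug-info route stalls, one low-tech fallback is to let the firmware describe its own layout - a minimal sketch, assuming you can rebuild it; the struct and field names below are hypothetical:

#include <stddef.h>   /* offsetof */
#include <stdint.h>

struct sensor_cfg {               /* stand-in for one of your real structs */
    uint8_t  channel;
    uint16_t threshold;
    int32_t  offset_nv;
};

struct field_desc {
    const char *name;
    uint16_t    offset;
    uint16_t    size;
};

/* Compile this with the same avr-gcc settings as the firmware so the
   numbers match the target ABI. The table ends up in the image; read
   it back over UART, or extract it from the ELF, to learn the layout. */
const struct field_desc sensor_cfg_layout[] = {
    { "channel",   offsetof(struct sensor_cfg, channel),   sizeof(uint8_t)  },
    { "threshold", offsetof(struct sensor_cfg, threshold), sizeof(uint16_t) },
    { "offset_nv", offsetof(struct sensor_cfg, offset_nv), sizeof(int32_t)  },
};

The downside is that the table has to be maintained (or generated) per struct, which is exactly the chore the DWARF/stabs route avoids.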
To answer myself... this is what I found:
http://code.google.com/p/pydevtools/
Actually I knew about it before, but it didn't really work for me at first.
So basically I made it Python 3 compatible and made a few other fixes/changes as well - here you can get it all:
http://code.google.com/p/pydevtools/source/checkout
There is some more code which actually uses this module, but it is not finished yet. I will probably add it when it is finished.

How to generate program dependence graph for C program?

I want to generate a Program Dependence Graph (PDG) from C source code. I found papers that explain how to do it, but they all used the commercial CodeSurfer tool.
Are there any free tools that do this?
Frama-C is an open-source static analysis framework that can compute a sound Program Dependence Graph for C programs. Its slicing plug-in uses the resulting PDG. The slicing and PDG computation were discussed in February 2010 on the mailing list (messages from jung, myung-jin and their answers).
You may also look at NIST's Unravel or Georgia Tech's Aristotle. Valsoft from Karlsruhe University and Loyola's Surgeon's Assistant might also be worth looking into.
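For intuition, here is a tiny hand-annotated C example of the control and data dependences a PDG records:

int f(int a, int b)
{
    int x = a + 1;   /* S1 */
    int y = b * 2;   /* S2 */
    if (x > 0)       /* S3: data-dependent on S1 (reads x) */
        y = y + x;   /* S4: control-dependent on S3; data-dependent on S1 and S2 */
    return y;        /* S5: data-dependent on S2 and S4 */
}

A slicer walks these edges backwards: slicing on the return value at S5 keeps all five statements, while slicing on x at S3 keeps only S1 and S3 itself.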
There's a promising new tool called cpp-dependencies.
It can generate component dependency diagrams as well as class hierarchy diagrams (by passing an option to treat each source file as a component).
Doxygen can generate function caller and callee graphs, as well as a list of all the functions used in your program. This may not be exactly what you are looking for, but it could provide some useful data.
SourceMonitor is a metrics tool that can show function and program complexity as well as complexity diagrams.
Both tools are free.
If you use LLVM, you can generate a PDG with this:
https://bitbucket.org/psu_soslab/program-dependence-graph-in-llvm/src/master/

Is it possible to write code to write code?

I've heard that there are some things one cannot do as a computer programmer, but I don't know what they are. One thing that occurred to me recently: wouldn't it be nice to have a class that could make a copy of the source of the program it runs, modify that copy, add a method to the class that it is, and then run the copy and terminate itself? Is it possible for code to write code?
If you want to learn about the limits of computability, read about the halting problem
In computability theory, the halting problem is a decision problem which can be stated as follows: given a description of a program and a finite input, decide whether the program finishes running or will run forever, given that input.
Alan Turing proved in 1936 that a general algorithm to solve the halting problem for all possible program-input pairs cannot exist.
Start by looking at quines, then at macro assemblers, then at lex & yacc and flex & bison. Then consider self-modifying code.
Here's a quine (formatted, use the output as the new input):
#include <stdio.h>
int main(void)
{
    char *a = "main(){char *a = %c%s%c; int b = '%c'; printf(a,b,a,b,b);}";
    int b = '"';
    printf(a, b, a, b, b);
    return 0;
}
Now, if you're just looking for things programmers can't do, look for the opposite of NP-complete.
Sure it is. That's how a lot of viruses work!
Get your head around this: computability theory.
Yes, that's what most Lisp macros do (for just one example).
Yes, it certainly is, though maybe not in the context you are referring to - check out this post on T4.
If you look at functional programming, there are many opportunities to write code that generates further code; the way a language like Lisp doesn't differentiate between code and data is a significant part of its power.
Rails generates the various default model and controller classes from the database schema when it's creating a new application. It's quite standard to do this kind of thing with dynamic languages - I have a few bits of PHP around that generate PHP files, just because that was the simplest solution to the problem I was dealing with at the time.
So it is possible. As for the question you are asking, though, it is perhaps a little vague: what environment and language are you using? What do you expect the code to do, and why does it need to be added to? A concrete example may bring more directly relevant responses.
Yes, it is possible to create code generators.
Most of the time they take user input and produce valid code. But there are other possibilities.
Self-modifying programs are also possible, but they were more common in the DOS era.
Of course you can! In fact, if you use a dynamic language, the class can change itself (or another class) while the program is still running. It can even create new classes that didn't exist before. This is called metaprogramming, and it lets your code become very flexible.
You are confusing/conflating two meanings of the word "write". One meaning is the physical writing of bytes to a medium, and the other is designing software. Of course you can have the program do the former, if it was designed to do so.
The only way for a program to do something the programmer did not explicitly intend is to behave like a living creature: mutate (incorporate bits of the environment into itself), and replicate different mutants at different rates (to avoid complete extinction if a mutation is terminal).
Sure it is. I wrote an effect for Paint.NET* that gives you an editor and allows you to write a graphical effect "on the fly". When you pause typing, it compiles your code to a DLL, loads it, and executes it. Now, in the editor, you only need to write the actual render function; everything else necessary to create the DLL is written by the editor and sent to the C# compiler.
You can download it free here: http://www.boltbait.com/pdn/codelab/
In fact, there is even an option to see all the code that was written for you before it is sent to the compiler. The help file (linked above) talks all about it.
The source code is available to download from that page as well.
*Paint.NET is a free image editor that you can download here: http://getpaint.net
In relation to artificial intelligence, take a look at Evolutionary algorithms.
make a copy of the source of the program it runs, modify that program and add a method to the class that it is, and then run the copy of the program and terminate itself
You can also generate code, build it into a library instead of an executable, and then dynamically load the library without even exiting the program that is currently running.
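A minimal sketch of that approach in C on a POSIX system (file and function names here are made up):

/* genrun.c - generate C source, build it as a shared library,
   load it, and call the new function, all at run time.
   Build with: gcc genrun.c -o genrun -ldl */
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>

int main(void)
{
    /* 1. Write the generated code to disk. */
    FILE *f = fopen("gen.c", "w");
    if (!f) return 1;
    fputs("int answer(void) { return 42; }\n", f);
    fclose(f);

    /* 2. Compile it into a shared library. */
    if (system("gcc -shared -fPIC -o libgen.so gen.c") != 0)
        return 1;

    /* 3. Load the library and look up the generated function. */
    void *h = dlopen("./libgen.so", RTLD_NOW);
    if (!h) { fprintf(stderr, "%s\n", dlerror()); return 1; }
    int (*answer)(void) = (int (*)(void))dlsym(h, "answer");
    if (!answer) { fprintf(stderr, "%s\n", dlerror()); return 1; }

    /* 4. Call code that did not exist when this program started. */
    printf("generated code says: %d\n", answer());
    dlclose(h);
    return 0;
}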
Dynamic languages usually don't work quite as you suggest, in that they don't have a completely separate compilation step. It isn't necessary for a program to modify its own source code, recompile, and start from scratch. Typically the new functionality is compiled and linked in on the fly.
Common Lisp is a very good language to practice this in, but there are others where you can create code and run it then and there. Typically this is done through a function called "eval" or something similar. Perl has an "eval" function, and the ability is generally common in scripting languages.
There are a lot of programs that write other programs, such as yacc or bison, but they don't have the same dynamic quality you seem to be looking for.
Take a look at Langton's loops. This is the simplest example of a self-reproducing "program".
There is a whole class of such things called "code generators" (although a compiler also fits the description as you set it out), and they fall into two camps.
Most code generators take some form of user input (most take a database schema) and produce source code which is then compiled.
More advanced ones can output executable code. With .NET, there's a whole namespace (System.CodeDom) dedicated to the creation of executable code. Using these objects, you can take C# (or another language) code, compile it, and link it into your currently running program.
I do this in PHP.
To persist settings for a class, I keep a local variable called $data. $data is just a dictionary/hashtable/assoc-array (depending on where you come from).
When you load the class, it includes a PHP file which basically defines $data. When I save the class, it writes out the PHP for each value in $data. It's a slow write process (and there are currently some concurrency issues), but it's faster than light to read - so much faster (and lighter) than using a database.
Something like this wouldn't work for all languages. It works for me in PHP because PHP is very much on-the-fly.
It has always been possible to write code generators. With XML technology, code generators can be an essential tool. Suppose you work for a company that has to deal with XML files from other companies. It is relatively straightforward to write a program that uses an XML parser to parse a new XML file and write another program containing all the callback functions set up to read XML files of that format. You would still have to edit the new program to make it specific to your needs, but the development time when a new XML file (new structure, new names) arrives is cut down a lot by this type of code generator. In my opinion, this is part of the strength of XML technology.
Lisp lisp lisp lisp :p
Joking aside: if you want code that generates code to run, and you have time to lose learning it and breaking your mind on recursive stuff generating more code, try learning Lisp :)
(eval '(or t nil))
wouldn't it be nice to have a class that could make a copy of the source of the program it runs, modify that program and add a method to the class that it is, and then run the copy of the program and terminate itself
There are almost no cases where that would solve a problem that cannot be solved better with non-self-modifying code.
That said, there are some very common (and useful) cases of code writing other code. The most obvious is any server-side web application, which generates HTML/JavaScript (well, HTML is markup, but it's identical in theory). Also, a script that alters a terminal's environment usually outputs a shell script that is eval'd by the parent shell. wxGlade generates the code that creates bare-bones wx-based GUIs.
See our DMS Software Reengineering Toolkit. This is general purpose machinery to read and modify programs, or generate programs by assembling fragments.
This is one of the fundamental questions of Artificial Intelligence. Personally I hope it is not possible - otherwise soon I'll be out of a job!!! :)
It is called meta-programming, and it is both a nice way of writing useful programs and an interesting research topic. Jacques Pitrat's book Artificial Beings: The Conscience of a Conscious Machine should interest you a lot; it is mostly about meta-knowledge-based computer programs.
Another related term is multi-stage programming (because there are several stages of programs, each generating the next one).
