How does PC-Lint (by Gimpel) look across multiple modules?

I'm using Gimpel's PC-Lint v8.00 on a C codebase and am looking to understand how it traverses modules. The PC-lint manual only goes as far as to say that PC-Lint "looks across multiple modules". How does it do this? For example, does it start with one module and combine all related include files and source files into one large piece of code to analyze? How deep does it search in order to understand the program flow?
In a second related question, I have a use case where it is beneficial for me to lint one C module from the codebase at a time instead of providing every C module in a long list to PC-Lint. However, if I only provide one C module, will it automatically find the other C modules which it depends on, and use those to understand the program flow of the specified C module?

PC-Lint builds an internal database as it parses your source files, noting things like global variables, extern declarations, and so on.
When it has processed all compilation units (each C file together with all included files, recursively), it does what a linker does to generate your output, but instead of generating code it reports certain types of errors, for instance: an extern declaration that has never been used, an unused prototype without an implementation, or an unused global function. These issues are not always reported by the linker, since code generation is perfectly possible: the items are simply never used anywhere.
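For illustration, here is a minimal two-module sketch (hypothetical file names) of the kind of situation only such a cross-module pass can flag; a compiler and linker accept it without complaint:

    /* a.c */
    int shared_counter = 0;    /* global variable, defined here */
    void helper(void) {}       /* external function, never called anywhere */

    /* b.c */
    extern int shared_counter; /* extern declaration, never actually used */
    int main(void) { return 0; }

Each file compiles cleanly on its own; only a tool that remembers declarations across all modules can report that helper() and the extern declaration are dead.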
The search depth can be influenced by the option -passes, which enables far better value tracking at the cost of execution time. Refer to section 10.2.2.4 in the PDF manual (for version 9.x).
To your second question: no, if you only provide one (or a few) source (C) file name(s) on your lint command line, PC-Lint will process only those files - plus all include files used, recursively. You may want to use the option -u for "unit checkout" to tell PC-Lint that it is only processing part of a full project; lint will then suppress certain kinds of warnings that are not useful for a partial project.

I think in principle you're asking about Lint Object Modules; see Chapter 9 of the lint manual PDF.
Running, say, lint -u a1.c -oo produces a1.lob; the resulting .lob files can then be "linked" together using lint *.lob to produce the inter-module messages.
You also asked a related, specific question (Any tips for speeding up static analysis tool PC-Lint? Any experiences using .LOB files?), but I'm not sure I understand your concern with "How much would you say it affected linting time?", because I would say it depends. What is your current lint time/speed? You posted some years ago now; how about running the job on a newer machine with a current CPU? KR

Related

How to automatically merge C source files?

I have a single executable which consists of many .c source files across several directories.
Currently I need to run static analysis on the whole source code, not on each file separately.
I just found that GCC LTO (link-time optimisation) works by compressing GIMPLE, which mirrors the preprocessed source.
Also, when the compiler crashes during the LTO linking phase, it asks for the preprocessed sources to be sent with the bug report.
By merging source files, I mean combining all the files used for creating the executable into a single file. Compiling and linking that single file would create the library, effectively doing manually what LTO does. (But that's not the aim here; static analysers don't support things like IPO/LTO.)
Doing this manually would definitely take hours...
So, is there a way to merge C source files automatically? Or at least to get the LTO preprocessed sources? (It seems the -save-temps option does nothing interesting during linking with LTO.)
CIL (C Intermediate Language) has a 'merger' feature which I've successfully used for some simple merge operations.
I've used it to merge moderately complicated programs - around a hundred files in different folders. Of course, if your codebase includes any C++ the CIL merger won't be able to merge that.
No, because for example two files might have conflicting static declarations. There are other ways that moving source code into a single file might make it stop working, and diagnosing every possible one would require solving the Halting Problem. (Does an arbitrary program ever use the result of __FILE__ in such a way that it fails if two sections of code are in the same file?) File-scope declarations are the most likely to occur in the real world, though.
That said, you can try just concatenating the files and seeing what error messages you get. Most headers should keep working if you #include them twice. A conflicting identifier name can be fixed by a search-and-replace in the original files.
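As a minimal illustration (hypothetical files) of the file-scope clash:

    /* a.c */
    static int counter;   /* private to a.c as its own translation unit */
    int get_a(void) { return counter; }

    /* b.c */
    static int counter;   /* a different object with the same name */
    int get_b(void) { return counter; }

Compiled separately, each counter is private to its own file. Concatenated into one file, the two tentative definitions quietly merge into a single shared variable, changing the program's behaviour with no compiler error. If both had initializers, the merge would at least fail loudly with a redefinition error.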

Extract just the required functions from a C code project?

How can I extract just the required functions from a pile of C source files? Is there a tool which can be used on GNU/Linux?
Preferably FOSS, but GNU/Linux is a hard requirement.
Basically I've got about 10 .h files; I'd like to grab part of the code and pull the required variables from the header files. Then I can make a single small .h file corresponding to the code I'm using in another project.
My terms might not be 100% correct.
One tool that you may or may not be aware of is cscope; it can help with this.
For a given set of files (more on what that means shortly), it gives you these options:
Find this C symbol:
Find this global definition:
Find functions called by this function:
Find functions calling this function:
Find this text string:
Change this text string:
Find this egrep pattern:
Find this file:
Find files #including this file:
Thus, if you know you want to use a function humungous_frogmondifier(), you can find where it is declared or defined by typing its name (or pasting its name) after 'Find this global definition'. If you then want to know what functions it calls, you use the next line. Once you've hit return after specifying the name, you will be given a list of the relevant lines in the source files above this menu on the screen. You can page through the list (if there are more entries than will fit on the screen), and at any time select one of the shown entries by number or letter, in which case cscope launches your editor on the file.
How about that list of files? If you run cscope in a directory without any setup, it will scan the source files in the directory and build its cross-reference. However, if you prefer, you can set up a list of file names in cscope.files and it will analyze those files instead. You can also include -I /path/to/directory on the cscope command line and it will find referenced headers in those directories too.
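For example (hypothetical project layout), cscope.files is simply a plain-text list of paths, one per line:

    src/main.c
    src/util.c
    src/util.h
    vendor/parser/parse.c

A common way to generate it is find . -name '*.[ch]' > cscope.files before building the cross-reference.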
I'm using cscope 15.7a on some sizeable projects - depending on which version of the project, between about 21,000 and 25,000 files (and some smaller ones with only 10-15 thousand files). It takes about half an hour to set up this project (so I carefully rebuild the indexes once per night, and use the files for the day, accepting that they are a little less accurate at the end of the day). It allows me to track down unused stuff, and find out where stuff is used, and so on.
If you're used to an IDE, it will be primitive. If you're used to curses-mode programs (vim, etc), then it is tolerably friendly.
You suggest (in comments to the main question) that you will be doing this more than once, possibly on different (non-library) code bases. I'm not sure I see the big value in this; I've been coding C on and off for 30+ years and don't feel the need to do this very often.
But given the assumption you will, what you really want is a tool that can, for a given identifier in a system of C files and headers, find the definition of that identifier in those files, and compute the transitive closure of all the dependencies which it has. This defines a partial order over the definitions based on the depends-on relationship. Finally you want to emit the code for those definitions to an output file, in a linear order that honors the partial order determined. (You can simplify this a bit by insisting that the identifier you want is in a particular C compilation unit, but the rest of it stays the same).
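The final emission step is essentially a depth-first post-order walk over the depends-on relation. A toy sketch (hypothetical names; it assumes the dependency table has already been computed and is acyclic - real C needs forward declarations for mutually recursive definitions):

    #include <stdio.h>

    #define N 4
    /* dep[i][j] != 0 means definition i depends on definition j */
    static const int dep[N][N] = {
        {0, 1, 1, 0},   /* main_fn depends on helper_a, helper_b */
        {0, 0, 0, 1},   /* helper_a depends on shared_type */
        {0, 0, 0, 1},   /* helper_b depends on shared_type */
        {0, 0, 0, 0},   /* shared_type depends on nothing */
    };
    static const char *names[N] = { "main_fn", "helper_a", "helper_b", "shared_type" };
    static int emitted[N];

    static void emit(int i) {
        if (emitted[i]) return;
        emitted[i] = 1;
        for (int j = 0; j < N; j++)   /* emit dependencies first... */
            if (dep[i][j]) emit(j);
        printf("%s\n", names[i]);     /* ...then the definition itself */
    }

    int main(void) {
        emit(0);   /* start from the identifier you want to extract */
        return 0;
    }

This prints shared_type, helper_a, helper_b, main_fn: a linear order that honours the partial order.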
Our DMS Software Reengineering Toolkit with its C Front End can be used to do this. DMS is a general-purpose program transformation system, capable of parsing source files into ASTs, performing full name resolution (e.g., building symbol tables), and doing flow analysis (though that isn't needed for your task). Given those ASTs and the symbol tables, it can be configured to compute this transitive dependency using the symbol table information, which records where symbols are defined in the ASTs. Finally, it can be configured to assemble the ASTs of interest into a linear order honoring the partial order.
We have done all this with DMS in the past, where the problem was to generate SOA-like interfaces based on other criteria; after generating the SOA code, the tool picked out all the dependencies for the SOA code and did exactly what was required. The dependency extraction machinery is part of the C front end.
A complication for the C world is that the preprocessor may get in the way; for the particular task we accomplished, the extraction was done over a specific configuration of the application and so the preprocessor directives were all expanded away. If you want this done and retain the C preprocessor directives, you'll need something beyond what DMS can do today. (We do have experimental work that captures macros and preprocessor conditionals in the AST but that's not ready for release to production).
You'd think this problem would be harder with C++, but it is not, because the preprocessor is used far more lightly in C++ programs. While we have not done extraction for C++, it would follow exactly the same approach as for C.
So that's the good part with respect to your question.
The not-so-good part, from your point of view perhaps, is that DMS isn't FOSS; it is a commercial tool designed to be used by my company and our customers to build custom analysis and transformation tools for all those tasks you can't get off the shelf that make economic sense. Nor does DMS run natively on Linux; rather, it is a Windows-based tool. It can reach across the network using NFS to access files on other systems, including Linux, and DMS does run under Wine on Linux.

Spaghetti code visualisation software?

A smoking pile of spaghetti just landed on my desk, and my task is to understand it (so I can refactor / reimplement it).
The code is C, and a mess of global variables, structure types and function calls.
I would like to plot graphs of the code with the information:
- Call graph
- Which struct types are used in which functions
- Which global variable is used in what function
Hopefully this would make it easier to identify connected components, and extract them to separate modules.
I have tried the following software for similar purposes:
- ncc
- ctags
- codeviz / gengraph
- doxygen
- egypt
- cflow
EDIT2:
- frama-c
- snavigator
- Understand
The shortcomings of these are either
a) requires me to be able to compile the code. My code does not compile, since portions of the source code are missing.
b) issues with preprocessor macros (like cflow, which wants to execute both branches of #if statements). Running the code through cpp would mess up the line numbers.
c) for some reason I don't manage to get the software to do what I want (like doxygen; the documentation for call graph generation is not easy to find, and since it does not seem to plot variables/data types anyway, it is probably not worth spending more time learning about doxygen's config options). EDIT: I did follow these Doxygen instructions, but it only plotted header file dependencies.
I am on Linux, so it is a huge plus if the software is for linux, and free software. Not sure my boss understands the need to buy a visualizer :-(
For example: a command-line tool that lists the functions in which a symbol (function, variable, or type) is referenced would be of great help (like addr2line, but for types/variable names/functions and source code).
//T
My vote goes to GNU Global. It has all the features of ctags/cscope combined, as well as the ability to generate fully indexed HTML, which lets you browse the code in your favorite browser. Serve it through Apache and you have a web service that anyone can access, including full search capabilities.
It integrates nicely into emacs/vim/even the bash-shell, and you can use it directly from the shell-prompt.
To see it in action on the Linux kernel, visit this
Combine that with a cyclomatic complexity plugin for Eclipse, which calculates the complexity of your code. Besides cyclomatic complexity it can handle:
McCabe's Cyclomatic Complexity
Efferent Couplings
Lack of Cohesion in Methods
Lines Of Code in Method
Number Of Fields
Number Of Levels
Number Of Locals In Scope
Number Of Parameters
Number Of Statements
Weighted Methods Per Class
...and you should have everything you need.
If you like the command line ;) maybe you could try cscope; it does static analysis of code and can tell you where symbols/variables/functions are referenced. Not the Holy Grail, but it can be pretty useful for browsing unknown source code.
There are also some GUIs that can handle cscope results (Vi, Emacs, JEdit...).
On the other hand, Eclipse with the CDT plugin can also help you to navigate into the spaghetti code you have to maintain.
It's not free and AFAIK not on Linux, but CppDepend might be worth evaluating - at least until someone comes up with a more suitable suggestion :)
http://www.cppdepend.com/ [Demo video here]
If you'd like to know in which functions a symbol is declared or referenced you can try LXR. It's not console based, but is quite usable.

Any good reason to #include source (*.c *.cpp) files?

I've been working for some time with an open-source library ("Fast Artificial Neural Network"). I'm using its source in my static library. When I compile it, however, I get hundreds of linker warnings, which are probably caused by the fact that the library includes its *.c files in other *.c files (I'm only including some headers I need, and I did not touch the code of the lib itself).
My question: Is there a good reason why the developers of the library used this approach, which is strongly discouraged? (Or at least I've been told all my life that this is bad, and from my own experience I believe it IS bad.) Or is it just bad design with no gain in this approach?
I'm aware of this related question, but it does not answer mine. I'm looking for reasons that might justify this.
A bonus question: Is there a way to fix this without touching the library code too much? I have a lot of work of my own and don't want to create more ;)
As far as I see (grep '#include .*\.c'), they only do this in doublefann.c, fixedfann.c, and floatfann.c, and each time include the reason:
/* Easy way to allow for build of multiple binaries */
This exact use of the preprocessor for simple copy-pasting is indeed the only valid use of including implementation (*.c) files, and relatively rare. (If you want to include some code for another reason, just give it a different name, like *.h or *.inc.) An alternative is to specify configuration in macros given to the compiler (e.g. -DFANN_DOUBLE, -DFANN_FIXED, or -DFANN_FLOAT), but they didn't use this method. (Each approach has drawbacks, so I'm not saying they're necessarily wrong, I'd have to look at that project in depth to determine that.)
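For concreteness, the include-the-implementation pattern looks roughly like this (a generic sketch; the macro name and typedef are illustrative, not FANN's actual contents):

    /* doublefann.c - builds the double-precision variant of the library */
    #define FANN_USE_DOUBLE    /* hypothetical configuration macro */
    #include "fann.c"          /* textually pull in the shared implementation */

    /* inside fann.c, the implementation is parameterized on the macro */
    #ifdef FANN_USE_DOUBLE
    typedef double fann_type;
    #else
    typedef float fann_type;
    #endif

One shared implementation file thus yields several differently configured object files, one per small wrapper .c file.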
They provide makefiles and MSVS projects which should already avoid linking doublefann.o (from doublefann.c) with fann.o (from fann.c), fixedfann.o (from fixedfann.c), and so on; either their files are screwed up or something similar has gone wrong on your side.
Did you try to create a project from scratch (or use your existing project) and add all the files to it? If you did, what is happening is each implementation file is being compiled independently and the resulting object files contain conflicting definitions. This is the standard way to deal with implementation files and many tools assume it. The only possible solution is to fix the project settings to not link these together. (Okay, you could drastically change their source too, but that's not really a solution.)
While you're at it, if you continue without using their project settings, you can likely skip compiling fann.c et al.; possibly just removing those files from the project is enough - then they won't be compiled and linked. You'll want to choose exactly one of double-/fixed-/floatfann to use, otherwise you'll get the same link errors. (I haven't looked at their instructions, but I would not be surprised to see this explained in more depth there.)
Including C/C++ code leads to all the code being stuck together in one translation unit. With a good compiler, this can lead to a massive speed boost (as stuff can be inlined and function calls optimized away).
If actual code is going to be included like this, though, it should have static in most of its declarations, or it will cause the warnings you're seeing.
If you ever declare a single global variable or function in that .c file, it cannot be included in two places which both compile to the same binary, or the two definitions will collide. If it is included in even one place, it cannot also be compiled on its own while still being linked into the same binary as its user.
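Concretely (hypothetical file names): if a .c file defining anything with external linkage is #included from two translation units of the same binary, the linker sees two definitions:

    /* util.c */
    int add(int a, int b) { return a + b; }  /* external linkage */

    /* x.c */
    #include "util.c"

    /* y.c */
    #include "util.c"

Linking x.o and y.o then fails with a multiple-definition error for add (or, depending on the toolchain, a flood of warnings much like the ones described in the question).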
If the file is only included in one place, why not just make it a discrete compilation unit (and use its globals via extern declarations)? Why bother having it included at all?
If your C files declare no global variables or functions, they are header files and should be named as such.
Therefore, by exhaustive search, I can say that the only time you would ever potentially want to include C files is if the same C code is used in building multiple different binaries. And even there, you're increasing your compile time for no real gain.
This is assuming that functions which should be inlined are marked inline and that you have a decent compiler and linker.
I don't know of a quick way to fix this.
I don't know that library, but as you describe it, it is either bad practice or your understanding of how to use it is not good enough.
A C project that wants to be included by others should always provide well-structured .h files for others, and then the compiled library for linking. If it wants to include function definitions in header files, it should mark them either as static (old-fashioned) or as inline (possible since C99).
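For instance, a function definition that is safe to place in a header might look like this (a minimal sketch):

    /* util.h */
    #ifndef UTIL_H
    #define UTIL_H

    /* 'static inline' gives every translation unit its own private copy,
       so including this header from many .c files causes no link collisions */
    static inline int clamp(int v, int lo, int hi) {
        return v < lo ? lo : (v > hi ? hi : v);
    }

    #endif /* UTIL_H */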
I haven't looked at the code, but it's possible that the .c or .cpp files being included actually contain code that works in a header. For example, a template or an inline function. If that is the case, then the warnings would be spurious.
I'm doing this at the moment at home because I'm a relative newcomer to C++ on Linux and don't want to get bogged down in difficulties with the linker. But I wouldn't recommend it for proper work.
(I also once had to include a header.dat into a C++ program, because Rational Rose didn't allow headers to be part of the issued software and we needed that particular source file on the running system (for arcane reasons).)

splint whole program with a complex build process

I want to run Splint's whole-program analysis on my system. However, the system is quite large, and different parts are compiled with different compiler defines and include paths. I can see how to convey this information to Splint for a single file, but I can't figure out how to do it for the whole program. Does anyone know a way of doing this?
Assuming you have a Makefile you could create a new target; then you would go through the actual compilation steps to duplicate them using Splint instead of the compiler.
My advice, however, is against the full-program approach. If you can isolate your system into separate parts, I'd rather start by checking them, one by one. Since your program is "quite large", expect a gazillion warnings... for each one of your modules. You will start to get rid of them once you have sprinkled your source code with the appropriate semantic annotations. Good luck! :)
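P.S. In case it helps once you get there: Splint's semantic annotations live in stylized comments. A minimal sketch:

    /* Splint reads annotations embedded in special comments */
    /*@null@*/ int *find_first(/*@null@*/ int *p) {
        if (p == NULL) {
            return NULL;  /* without this check, Splint warns about a possible null dereference */
        }
        return p;
    }

The /*@null@*/ annotation tells Splint that a pointer may legitimately be NULL, so it checks that both the function body and its callers handle that case.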
