Getting type information of C symbols - c

Let me try to give some background first. I'm working on some project with some micro controller (AVR) which I'm accessing through some interface (UART). I'm doing direct writes to its global variables and I'm also able to directly execute functions (write args, trigger execution, read back return values).
AVR code is in C compiled with GCC toolchain. PC, that is communicating with it, is running python code. As of now I have imported adress & size information into python easily by parsing 'objdump -x' output. Now what would greatly boost my development would be information about types of the symbols (types & sizes of structs elements, enums values, functions arguments & return values, ...).
Somehow this seemed like a common thing that people do daily, and I was naively expecting ready-made python tools at start. Well, not so easy. By now I've spend many hours looking into various ways how to accomplish that.
One approach would be to just parse the C code (using e.g. pycparser). But seems like I would have to at least 'pre-parse' the code to exclude various unsupported constructs and various ordering problems and so on. Also, in theory, the problem would be if compiler would do some optimizations, like struct or enum reordering and so on.
I've been also looking into various gcc, gdb and objdump options to get such information. Have spent some time looking for tools for extracting information from various debugging formats (dwarf, stabs).
The closest I get so far is to dump stabs debugging information with objdump -g option. This outputs C-like information, which I would then parse using pycparser or on my own.
But before I spent my time doing that, I decided to raise a question here, strongly hoping that someone will hit me with possibly totally different approach I just haven't think of.

There's a quite nice tool called c2ph that dumps a parsable descripton of the types and sizes (using debug info as the source)

To answer myself... this is what I found:
http://code.google.com/p/pydevtools/
Actually I knew about it before, but it didn't really work for me at first.
So basically I made it Python 3 compatible and did few other fixes/changes also - here you can get it all:
http://code.google.com/p/pydevtools/source/checkout
Actually there is some more code which actually uses this module, but it is not finished yet. I will probably add it when finished.

Related

gcc generating a list of function and variable names

I am looking for a way to get a list of all the function and variable names for a set of c source files. I know that gcc breaks down those elements when compiling and linking so is there a way to piggyback that process? Or any other tool that could do the same thing?
EDIT: It's mostly because I am curious, I have been playing with things like make auto dependency and graphing include trees and would like to be able to get more stats on the source files. And it seems like something that would already exist but i haven't found any options or flags for it.
If you are only interested by names of global functions and variables, you might (assuming you are on Linux) use the nm or objdump utilities on the ELF binary executable or object files.
Otherwise, you might customize the GCC compiler (assuming you have a recent version, e.g. 5.3 or 6 at least) thru plugins. You could code them directly in C++, or you might consider using GCC MELT, a Lisp-like domain specific language to customize GCC. Perhaps even the findgimple mode of GCC MELT might be enough....
If you consider extending GCC, be aware that you'll need to spend a significant time (perhaps months) understanding its internal representations (notably Generic Trees & Gimple) in details. The links and slides on GCC MELT documentation page might be useful.
Your main issue is that you probably need to understand most of the details about GCC internal representations, and that takes time!
Also, the details of GCC internals are slightly changing from one version of GCC to the next one.
You could also consider (instead of working inside GCC) using the Clang/LLVM framework (but learning that is also a lot of time). Maybe you might also look into Frama-C or Coccinnelle.
Another approach might be to compile with debug info and parse DWARF information.
PS. My point is that your problem is probably much more difficult than what you believe. Parsing C is not that simple ... You might spend months or even years working on that... And details could be target-processor & system & compiler specific...

What files need to be modified to compile for a custom architecture of an existing cpu with gcc?

I've been looking at examples of C code that is compiled for some lesser known processors (like ZPU) using the gcc cross compiler.
Most of the working examples I see assume a certain arquitecture (Memory map and set of peripherals) and simply give you a recipe to compile for these and they work.
However I can find very little information on what needs to modified if you use the same cpu with a different memory map and set of peripherals.
From what I've read. There are two main files that I need to make sure that are done "right". The linker script that is used and the crt0.o (Which if I need to modify means recompiling the crt0.S which is assembler). On this last one, especially I find very little information on what is actually supposed to do (other that setting up reset there is no clear info, and I'm talking conceptually not for an specific processor. Although something for this would also be useful).
Can any one tell me what is the relationship between a the c files for the code of program (bare metal development), the crt0.S (specially why it is needed) and it's relationship with a working linker script?
PD: Answers of the form "read this book" are welcome and I would love them.
PD: I realize this kind of question is usually vague and closed quickly but I don't know where else to turn, so I ask for a bit of leniency.

Spaghetti code visualisation software?

a smoking pile of spaghetti just landed on my desk, and my task is to understand it (so I can refactor / reimplement it).
The code is C, and a mess of global variables, structure types and function calls.
I would like to plot graphs of the code with the information:
- Call graph
- Which struct types are used in which functions
- Which global variable is used in what function
Hopefully this would make it easier to identify connected components, and extract them to separate modules.
I have tried the following software for similar purposes:
- ncc
- ctags
- codeviz / gengraph
- doxygen
- egypt
- cflow
EDIT2:
- frama-c
- snavigator
- Understand
The shortcomings of these are either
a) requires me to be able to compile the code. My code does not compile, since portions of the source code is missing.
b) issues with preprocessor macros (like cflow, who wants to execute both branches of #if statements). Running it through cpp would mess up the line numbers.
c) I for some reason do not manage to get the software to do what I want to do (like doxygen; the documentation for call graph generation is not easy to find, and since it does not seem to plot variables/data types anyway, it is probably not worth spending more time learning about doxygen's config options). EDIT: I did follow a these Doxygen instrcutions, but it did only plot header file dependencies.
I am on Linux, so it is a huge plus if the software is for linux, and free software. Not sure my boss understands the need to buy a visualizer :-(
For example: a command line tool that lists in which functions a symbol (=function,variable,type) is referenced in would be of great help (like addr2line, but for types/variable names/functions and source code).
//T
My vote goes to gnu global. It has all the features of ctags/cscope combined as well as the possibility to generate fully indexed html which allows you to browse the code in your favorite browser. Fire it up in apache and you have a web-service that anyone can access including full search capabilities.
It integrates nicely into emacs/vim/even the bash-shell, and you can use it directly from the shell-prompt.
To see it in action on the linux kernel, visit this
Combine that with a tool for cyclomatic complexity plugin for eclipse which calculates the complexity of your code. besides the cyclomatic complexity it can handle:
McCabe's Cyclomatic Complexity
Efferent Couplings
Lack of Cohesion in Methods
Lines Of Code in Method
Number Of Fields
Number Of Levels
Number Of Locals In Scope
Number Of Parameters
Number Of Statements
Weighted Methods Per Class
...and you should have everything you need.
If you like command line ;) maybe you could try cscope, it does static analysis of code and can tell you where are referenced some symbols/variables/functions... Not the Holy Graal, but it can be pretty usefull to browse unknown source code.
There are also some GUI that can handle csope results (Vi, Emacs, JEdit...).
On the other hand, Eclipse with the CDT plugin can also help you to navigate into the spaghetti code you have to maintain.
It's not free and afaik not linux but cppDepend might be worth evaluating - at least until someone comes up with a more suitable suggestion :)
http://www.cppdepend.com/ [Demo video here]
If you'd like to know in which functions a symbol is declared or referenced you can try LXR. It's not console based, but is quite usable.

splint whole program with a complex build process

I want to run splints whole program analysis on my system. However the system is quite large and different parts are compiled with different compiler defines and include paths. I can see how to convey this information to splint for a single file but I can't figure out how to do it for whole program. Does anyone know a way of doing this?
Assuming you have a Makefile you could create a new target; then you would go through the actual compilation steps to duplicate them using Splint instead of the compiler.
My advice, however, is against the full-program approach. If you can isolate your system into separate parts, I'd rather start by checking them, one by one. Since your program is "quite large", expect a gazillion warnings... for each one of your modules. You will start to get rid of them once you have sprinkled your source code with the appropriate semantic annotations. Good luck! :)

Optimized code on Unix?

What is the best and easiest method to debug optimized code on Unix which is written in C?
Sometimes we also don't have the code for building an unoptimized library.
This is a very good question. I had similar difficulties in the past where I had to integrate 3rd party tools inside my application. From my experience, you need to have at least meaningful callstacks in the associated symbol files. This is merely a list of addresses and associated function names. These are usually stripped away and from the binary alone you won't get them... If you have these symbol files you can load them while starting gdb or afterward by adding them. If not, you are stuck at the assembly level...
One weird behavior: even if you have the source code, it'll jump forth and back at places where you would not expect (statements may be re-ordered for better performance) or variables don't exist anymore (optimized away!), setting breakpoints in inlined functions is pointless (they are not there but part of the place where they are inlined). So even with source code, watch out these pitfalls.
I forgot to mention, the symbol files usually have the extension .gdb, but it can be different...
This question is not unlike "what is the best way to fix a passenger car?"
The best way to debug optimized code on UNIX depends on exactly which UNIX you have, what tools you have available, and what kind of problem you are trying to debug.
Debugging a crash in malloc is very different from debugging an unresolved symbol at runtime.
For general debugging techniques, I recommend this book.
Several things will make it easier to debug at the "assembly level":
You should know the calling
convention for your platform, so you
can tell what values are being passed
in and returned, where to find the
this pointer, which registers are "caller saved" and which are "callee saved", etc.
You should know your OS "calling convention" -- what a system call looks like, which register a syscall number goes into, the first parameter, etc.
You should
"master" the debugger: know how to
find threads, how to stop individual
threads, how to set a conditional
breakpoint on individual instruction, single-step, step into or skip over function calls,
etc.
It often helps to debug a working program and a broken program "in parallel". If version 1.1 works and version 1.2 doesn't, where do they diverge with respect to a particular API? Start both programs under debugger, set breakpoints on the same set of functions, run both programs and observe differences in which breakpoints are hit, and what parameters are passed.
Write small code samples by the same interfaces (something in its header), and call your samples instead of that optimized code, say simulation, to narrow down the code scope which you debug. Furthermore you are able to do error enjection in your samples.

Resources