How to methodically trace the location of source code - c

I often spend lots of time trying to find out where the exact implementation is located. It gets very frustrating when dealing with low-level code that might end up somewhere in the kernel.
I usually just google or try to guess the location and/or method names, but it is not always very effective.
Is there some methodical way to trace the flow up to the implementation? How do you guys usually do it?

Load the whole code base with its relevant dependencies into a graphical IDE (NetBeans can do it, for instance) that can show call graphs, jump between declarations and definitions, etc. Or use LibClang and its wrapper for the text editor of your choice; it is also very good at indexing. Finally, you can consider the classic, ctags, which can link definition and declaration points.

There used to be a ctrace program that did just that but I don't think it is actively maintained.
Ultimately, it depends what exactly you are trying to achieve. It actually looks like you want to look up a specific function rather than trace it. If that is the case indeed, consider using some kind of source browser: from etags to cscope to OpenGrok.

Related

What tools/IDEs/languages exist for generating C code

I wanted to know more about tools that assist in writing code-generators for C. Essentially how do we achieve functionality similar to c++ templates.
Even though it's not a perfect solution and it takes some time to master it, I've used the m4 macro processor in the past for generic C code generation (kinda like C++ templates). You may want to check that out.
You are looking for a way to generate code for very similar classes, where what differs is essentially their type.
You can use a template-based code generator, where "template" means "boilerplate code" with string substitution. This is the simplest scenario. A tool like StringTemplate or CodeSmith will do the job. But there are many others. Just search around.
If you want a more serious generation scenario, where different class structures might be needed according to a set of definitions, then you should go with a fully programmable generator like AtomWeaver. There are others (MPS, Xtext) but these do not rely on templates.
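For what it's worth, the "boilerplate plus string substitution" idea can be sketched in plain C with the preprocessor. This is a different tool from m4 or StringTemplate, but the principle is the same. The macro name DEFINE_ARRAY_MAX and the generated function names are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical "template": expands to a typed max-of-array function.
 * NAME and TYPE are substituted textually by the preprocessor.
 * The caller must pass count > 0. */
#define DEFINE_ARRAY_MAX(NAME, TYPE)                    \
    static TYPE NAME(const TYPE *items, size_t count)   \
    {                                                   \
        TYPE best = items[0];                           \
        for (size_t i = 1; i < count; i++) {            \
            if (items[i] > best)                        \
                best = items[i];                        \
        }                                               \
        return best;                                    \
    }

/* Two "instantiations", one per element type. */
DEFINE_ARRAY_MAX(max_int, int)
DEFINE_ARRAY_MAX(max_double, double)
```

After this, max_int(values, n) and max_double(values, n) are ordinary functions. The usual trade-off applies: errors inside the macro body are reported at the expansion site, which is one reason people reach for an external generator instead.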

Spaghetti code visualisation software?

A smoking pile of spaghetti just landed on my desk, and my task is to understand it (so I can refactor / reimplement it).
The code is C, and a mess of global variables, structure types and function calls.
I would like to plot graphs of the code with the information:
- Call graph
- Which struct types are used in which functions
- Which global variable is used in what function
Hopefully this would make it easier to identify connected components, and extract them to separate modules.
I have tried the following software for similar purposes:
- ncc
- ctags
- codeviz / gengraph
- doxygen
- egypt
- cflow
EDIT2:
- frama-c
- snavigator
- Understand
The shortcomings of these are either
a) requires me to be able to compile the code. My code does not compile, since portions of the source code are missing.
b) issues with preprocessor macros (like cflow, which wants to process both branches of #if statements). Running it through cpp would mess up the line numbers.
c) I for some reason do not manage to get the software to do what I want to do (like doxygen; the documentation for call graph generation is not easy to find, and since it does not seem to plot variables/data types anyway, it is probably not worth spending more time learning about doxygen's config options). EDIT: I did follow these Doxygen instructions, but it only plotted header file dependencies.
I am on Linux, so it is a huge plus if the software is for linux, and free software. Not sure my boss understands the need to buy a visualizer :-(
For example: a command line tool that lists in which functions a symbol (=function,variable,type) is referenced in would be of great help (like addr2line, but for types/variable names/functions and source code).
//T
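To make the "where is this symbol referenced" request concrete, here is a minimal sketch of the purely textual approach, assuming the source is already loaded into a string. The function name count_references is hypothetical, and this is nowhere near what cscope or GNU GLOBAL do: it does not understand comments, string literals or macros. On the other hand it does not need the code to compile.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* True for characters that may appear inside a C identifier. */
static int is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Count whole-word occurrences of `symbol` in `source`.
 * Purely textual: comments, strings and macros are not understood. */
size_t count_references(const char *source, const char *symbol)
{
    size_t length = strlen(symbol);
    size_t count = 0;

    if (length == 0)
        return 0;

    for (const char *hit = strstr(source, symbol); hit != NULL;
         hit = strstr(hit + 1, symbol)) {
        /* Reject matches that are part of a longer identifier. */
        int starts_word = (hit == source) || !is_ident_char(hit[-1]);
        int ends_word = !is_ident_char(hit[length]);
        if (starts_word && ends_word)
            count++;
    }
    return count;
}
```

Searching for "total" this way skips "subtotal" and "total_count", which is roughly the improvement over a plain substring search.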
My vote goes to gnu global. It has all the features of ctags/cscope combined as well as the possibility to generate fully indexed html which allows you to browse the code in your favorite browser. Fire it up in apache and you have a web-service that anyone can access including full search capabilities.
It integrates nicely into emacs/vim/even the bash-shell, and you can use it directly from the shell-prompt.
To see it in action on the linux kernel, visit this
Combine that with a cyclomatic complexity plugin for Eclipse, which calculates the complexity of your code. Besides the cyclomatic complexity it can handle:
McCabe's Cyclomatic Complexity
Efferent Couplings
Lack of Cohesion in Methods
Lines Of Code in Method
Number Of Fields
Number Of Levels
Number Of Locals In Scope
Number Of Parameters
Number Of Statements
Weighted Methods Per Class
...and you should have everything you need.
If you like the command line ;) maybe you could try cscope. It does static analysis of code and can tell you where symbols/variables/functions are referenced... Not the Holy Grail, but it can be pretty useful for browsing unknown source code.
There are also some GUIs that can handle cscope results (Vi, Emacs, JEdit...).
On the other hand, Eclipse with the CDT plugin can also help you to navigate into the spaghetti code you have to maintain.
It's not free and afaik not linux but cppDepend might be worth evaluating - at least until someone comes up with a more suitable suggestion :)
http://www.cppdepend.com/ [Demo video here]
If you'd like to know in which functions a symbol is declared or referenced you can try LXR. It's not console based, but is quite usable.

Is there an easy way to find which other functions can call a certain function from the source code?

I have a function which is called explicitly by 4 other functions in my code base. Then in turn each of these functions is called by at least 10 other functions throughout my code. I know that I could, by hand, trace one of these function calls to the main function of my program (which has 30 function calls) but it seems like this would be a better job for the computer. I just want to know which of the functions in main() is calling this buried function.
Does anyone know of any software that could help?
Also, using a debugger is out of the question. That would have been too easy. The software only runs on a hand held device.
doxygen, correctly configured, is able to output an HTML document with navigable caller list and called-by list for every function in your code. You can generate call graphs as well.
Comment it out (or better, comment out its prototype) and try to compile your program. The resulting compiler errors should show you where it is referenced.
If your platform has an API to capture backtraces, I would just instrument the function to use those and log them to a file for later analysis. There's no guarantee that this will find all callers (or callers-of-...-of-callers), but if you exercise all of the program's features while logging like this, you should find "most" of them. For relatively simple programs, it is possible to find all callers this way.
Alternatively, many sampling tools can get you this information.
However, I have a suspicion that you may be on a platform that doesn't have a lot of these features, so a static source-analysis tool (like mouviciel suggested) is likely your best option. Assuming that you can make it work for you, this has the added benefit that it should find all callers, not just most of them.
http://cscope.sourceforge.net/ I think this also can be useful.
I second mouviciel's suggestion of using doxygen for getting this info. The downside is that doxygen is working on the source code. You can only see what functions CAN POTENTIALLY call your function, not the ones that are ACTUALLY CALLING your function. If you are using Linux and you can change the source code of the function in question, you can obtain this info using the backtrace() and the backtrace_symbols() functions.
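A minimal sketch of the backtrace() approach, assuming glibc's <execinfo.h> is available on the target. The names log_callers and buried_function are placeholders, not anything from the question's code base.

```c
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_FRAMES 32

/* Hypothetical helper: write the current call stack to `out`.
 * Returns the number of frames captured, or 0 if symbol lookup failed. */
int log_callers(FILE *out)
{
    void *frames[MAX_FRAMES];
    int count = backtrace(frames, MAX_FRAMES);
    char **symbols = backtrace_symbols(frames, count);

    if (symbols == NULL)
        return 0;

    /* Frame 0 is log_callers itself, so start at 1. */
    for (int i = 1; i < count; i++)
        fprintf(out, "  called from: %s\n", symbols[i]);

    free(symbols); /* a single free() releases the whole array */
    return count;
}

/* Stand-in for the "buried" function from the question. */
int buried_function(FILE *log)
{
    return log_callers(log);
}
```

With GCC, linking with -rdynamic usually makes backtrace_symbols() print function names instead of bare addresses, and optimisation (inlining in particular) can hide frames. As noted above, this only records the call paths that were actually exercised.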

Is it possible to write code to write code?

I've heard that there are some things one cannot do as a computer programmer, but I don't know what they are. One thing that occurred to me recently was: wouldn't it be nice to have a class that could make a copy of the source of the program it runs, modify that program and add a method to the class that it is, and then run the copy of the program and terminate itself. Is it possible for code to write code?
If you want to learn about the limits of computability, read about the halting problem:
In computability theory, the halting problem is a decision problem which can be stated as follows: given a description of a program and a finite input, decide whether the program finishes running or will run forever, given that input.
Alan Turing proved in 1936 that a general algorithm to solve the halting problem for all possible program-input pairs cannot exist.
Start by looking at quines, then at Macro-Assemblers and then lex & yacc, and flex & bison. Then consider self-modifying code.
Here's a quine (formatted, use the output as the new input):
#include<stdio.h>
main()
{
char *a = "main(){char *a = %c%s%c; int b = '%c'; printf(a,b,a,b,b);}";
int b = '"';
printf(a,b,a,b,b);
}
Now if you're just looking for things programmers can't do, look for the opposite of NP-complete.
Sure it is. That's how a lot of viruses work!
Get your head around this: computability theory.
Yes, that's what most Lisp macros do (for just one example).
Yes, it certainly is, though maybe not in the context you are referring to. Check out this post on T4.
If you look at functional programming, it has many opportunities to write code that generates further code. The way that a language like Lisp doesn't differentiate between code and data is a significant part of its power.
Rails generates the various default model and controller classes from the database schema when it's creating a new application. It's quite standard to do this kind of thing with dynamic languages- I have a few bits of PHP around that generate php files, just because it was the simplest solution to the problem I was dealing with at the time.
So it is possible. As for the question you are asking, though- that is perhaps a little vague- what environment and language are you using? What do you expect the code to do and why does it need to be added to? A concrete example may bring more directly relevant responses.
Yes it is possible to create code generators.
Most of the time they take user input and produce valid code. But there are other possibilities.
Self-modifying programs are also possible, but they were more common in the DOS era.
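As a rough illustration of "take input, produce valid code", here is a small sketch in C that turns a list of fields into the source text of a struct definition. The struct field input format and the function names are invented for this example.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical input format: one field of the struct to generate. */
struct field {
    const char *type;
    const char *name;
};

/* Append formatted text at buf + *used.
 * Returns 0 on success, -1 if the text does not fit. */
static int append(char *buf, size_t size, size_t *used, const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    int n = vsnprintf(buf + *used, size - *used, fmt, args);
    va_end(args);

    if (n < 0 || (size_t)n >= size - *used)
        return -1;
    *used += (size_t)n;
    return 0;
}

/* Write the C source of a struct named `name` into `buf`.
 * Returns the number of characters written, or -1 if `buf` is too small. */
int generate_struct(char *buf, size_t size, const char *name,
                    const struct field *fields, size_t count)
{
    size_t used = 0;

    if (size == 0)
        return -1;
    if (append(buf, size, &used, "struct %s {\n", name) != 0)
        return -1;
    for (size_t i = 0; i < count; i++) {
        if (append(buf, size, &used, "    %s %s;\n",
                   fields[i].type, fields[i].name) != 0)
            return -1;
    }
    if (append(buf, size, &used, "};\n") != 0)
        return -1;
    return (int)used;
}
```

The generated text can then be written to a .c or .h file and handed to the compiler like any hand-written source.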
Of course you can! In fact, if you use a dynamic language, the class can change itself (or another class) while the program is still running. It can even create new classes that didn't exist before. This is called metaprogramming, and it lets your code become very flexible.
You are confusing/conflating two meanings of the word "write". One meaning is the physical writing of bytes to a medium, and the other is designing software. Of course you can have the program do the former, if it was designed to do so.
The only way for a program to do something that the programmer did not explicitly intend it to do, is to behave like a living creature: mutate (incorporate in itself bits of environment), and replicate different mutants at different rates (to avoid complete extinction, if a mutation is terminal).
Sure it is. I wrote an effect for Paint.NET* that gives you an editor and allows you to write a graphical effect "on the fly". When you pause typing it compiles it to a dll, loads it and executes it. Now, in the editor, you only need to write the actual render function, everything else necessary to create a dll is written by the editor and sent to the C# compiler.
You can download it free here: http://www.boltbait.com/pdn/codelab/
In fact, there is even an option to see all the code that was written for you before it is sent to the compiler. The help file (linked above) talks all about it.
The source code is available to download from that page as well.
*Paint.NET is a free image editor that you can download here: http://getpaint.net
In relation to artificial intelligence, take a look at Evolutionary algorithms.
make a copy of the source of the program it runs, modify that program and add a method to the class that it is, and then run the copy of the program and terminate itself
You can also generate code, build it into a library instead of an executable, and then dynamically load the library without even exiting the program that is currently running.
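A minimal sketch of that generate, build and load cycle on a POSIX system, assuming a compiler named cc is on the PATH and the current directory is writable. The file names and the generated function generated_twice are made up for the example.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

/* Generate a tiny C source file, compile it into a shared library,
 * load it into the running process and call the new function.
 * Returns the function's result, or -1 on any failure. */
int build_and_call(int argument)
{
    /* Hypothetical file names, chosen only for this sketch. */
    FILE *source = fopen("./generated_plugin.c", "w");
    if (source == NULL)
        return -1;
    fputs("int generated_twice(int x) { return 2 * x; }\n", source);
    fclose(source);

    /* Assumption: `cc` exists and accepts -shared and -fPIC. */
    if (system("cc -shared -fPIC -o ./generated_plugin.so "
               "./generated_plugin.c") != 0)
        return -1;

    void *library = dlopen("./generated_plugin.so", RTLD_NOW);
    if (library == NULL)
        return -1;

    /* Converting dlsym's void * to a function pointer is the POSIX idiom. */
    int (*twice)(int) = (int (*)(int))dlsym(library, "generated_twice");
    int result = (twice != NULL) ? twice(argument) : -1;

    dlclose(library);
    return result;
}
```

The calling program keeps running the whole time; only the new library is compiled. On older glibc versions the program itself has to be linked with -ldl.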
Dynamic languages usually don't work quite as you suggest, in that they don't have a completely separate compilation step. It isn't necessary for a program to modify its own source code, recompile, and start from scratch. Typically the new functionality is compiled and linked in on the fly.
Common Lisp is a very good language to practice this in, but there are others where you can create code and run it then and there. Typically, this will be through a function called "eval" or something similar. Perl has an "eval" function, and it's generally common for scripting languages to have the ability.
There are a lot of programs that write other programs, such as yacc or bison, but they don't have the same dynamic quality you seem to be looking for.
Take a look at Langton's loops. This is the simplest example of a self-reproducing "program".
There is a whole class of such things called "Code Generators". (Although, a compiler also fits the description as you set it). And those describe the two areas of these beasts.
Most code generators take some form of user input (most take a database schema) and produce source code, which is then compiled.
More advanced ones can output executable code. With .NET, there's a whole namespace (System.CodeDom) dedicated to the creation of executable code. With these objects, you can take C# (or another language) code, compile it, and link it into your currently running program.
I do this in PHP.
To persist settings for a class, I keep a local variable called $data. $data is just a dictionary/hashtable/assoc-array (depending on where you come from).
When you load the class, it includes a php file which basically defines data. When I save the class, it writes the PHP out for each value of data. It's a slow write process (and there are currently some concurrency issues) but it's faster than light to read. So much faster (and lighter) than using a database.
Something like this wouldn't work for all languages. It works for me in PHP because PHP is very much on-the-fly.
It has always been possible to write code generators. With XML technology, the use of code generators can be an essential tool. Suppose you work for a company that has to deal with XML files from other companies. It is relatively straightforward to write a program that uses the XML parser to parse the new XML file and write another program that has all the callback functions set up to read XML files of that format. You would still have to edit the new program to make it specific to your needs, but the development time when a new XML file (new structure, new names) arrives is cut down a lot by using this type of code generator. In my opinion, this is part of the strength of XML technology.
Lisp lisp lisp lisp :p
Joking aside, if you want code that generates code to run, and you've got time to lose learning it and breaking your mind with recursive stuff generating more code, try to learn Lisp :)
(eval '(or true false))
wouldn't it be nice to have a class that could make a copy of the source of the program it runs, modify that program and add a method to the class that it is, and then run the copy of the program and terminate itself
There are almost no cases where that would solve a problem that cannot be solved "better" using non-self-modifying code.
That said, there are some very common (useful) cases of code writing other code. The most obvious is any server-side web application, which generates HTML/JavaScript (well, HTML is markup, but it's identical in theory). Also, any script that alters a terminal's environment usually outputs a shell script that is eval'd by the parent shell. wxGlade generates code that creates bare-bones wx-based GUIs.
See our DMS Software Reengineering Toolkit. This is general purpose machinery to read and modify programs, or generate programs by assembling fragments.
This is one of the fundamental questions of Artificial Intelligence. Personally I hope it is not possible - otherwise soon I'll be out of a job!!! :)
It is called meta-programming and is both a nice way of writing useful programs, and an interesting research topic. Jacques Pitrat's Artificial Beings: the conscience of a conscious machine book should interest you a lot. It is mostly related to meta-knowledge based computer programs.
Another related term is multi-staged programming (because there are several stages of programs, each generating the next one).

Code Obfuscation?

So, I have a penchant for Easter Eggs... this dates back to me being part of the founding community of the Easter Egg Archive.
However, I also do a lot of open source programming.
What I want to know is, what do you think is the best way to SYSTEMATICALLY and METHODICALLY obfuscate code.
Examples in PHP/Python/C/C++ preferred, but in other languages is fine, if the methodology is explained properly.
Compile the code with full optimization. Completely strip the binary.
Use a decompiler on the code.
I can guarantee the result will be so utterly unreadable that you won't even be able to read it ;)
In that case, you should use/write an "obfuscator". A program that does the job for you.
The Salamander Obfuscator can be used to obfuscate .Net programs, but it is more to prevent decompilation, thus not exactly what you need.
A good place to learn about obfuscation in C is International Obfuscated C Code Contest
In the spirit of renaming symbols: overuse scope and visibility rules by naming different variables with the same name.
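A small sketch of what that can look like in C, assuming nothing beyond the standard scope rules. The function name confusing_sum is invented; every variable that matters is called x.

```c
int x = 1;                    /* file-scope x */

int confusing_sum(int x)      /* the parameter hides the file-scope x */
{
    int total = x;            /* reads the parameter */
    {
        int x = 10;           /* block-scope x hides the parameter */
        total += x;           /* adds 10 */
        {
            extern int x;     /* makes the file-scope x visible again */
            total += x;       /* adds 1 */
        }
    }
    return total;
}
```

A reader has to track three different objects behind one name to work out the result. Compilers will point most of this out when asked (GCC's -Wshadow, for instance), so it only slows down a casual reader.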
The question is how to create seemingly non-obfuscated code in plain sight (open source) without it appearing to perform another function.
Some obvious methods:
remove comments and as much whitespace as you can without breaking things
join lines
rename variables and functions to be meaningless (preferably 1 character)
For systematic and methodical obfuscation of code, you cannot beat Perl. If you want something that compiles to a binary, there is always APL.
If you are targeting the .NET framework, put your easter egg source code in a resource file as a binhex string. Then you can have one of your initialising routines fetch it, decode it and compile it into memory. You can invoke it using reflection.
If you need help with the technical aspects of compiling into memory and calling into the resultant assembly, I can give you a library I wrote and a sample program that uses it.
You can use this technology to load plug-ins, which is a legit thing to do and reasonable in an initialiser.

Resources