Getting Control Flow Graph from ANSI C code

I'm building a tool for testing ANSI C applications: simply load the code, view its control flow graph, run a test, and mark all vertices that were hit. I've been trying to build the CFG entirely by myself by parsing the code; unfortunately it gets messed up when the code is nested. GCC can emit a CFG from compiled code. I could write a parser for its output, but I need line numbers for setting breakpoints. Is there a way to get line numbers when outputting the control flow graph with -fdump-tree-cfg or -fdump-tree-vcg?

For the control flow graph of a C program you could look at existing Python parsers for C:
PyCParser
pycparser
pyclibrary (fork of pyclibrary)
joern
CoFlo C/C++ control flow graph generator and analyzer
Call graphs are a closely related construct to control flow graphs.
There are several approaches available to create call graphs (function dependencies) for C code.
This might prove helpful for progressing with control flow graph generation.
Ways to create dependency graphs in C:
Using cflow:
cflow +pycflow2dot +dot (GPL, BSD). cflow is robust, because it can handle code which cannot compile, e.g. due to missing includes. If preprocessor directives are heavily used, it may need the --cpp option to preprocess the code.
cflow + cflow2dot + dot (GPL v2, GPL v3, Eclipse Public License (EPL) v1) (note that cflow2dot needs some path fixing before it works)
cflow +cflow2dot.bash (GPL v2, ?)
cflow +cflow2vcg (GPL v2 , GPL v2)
enhanced cflow (GPL v2) with list to exclude symbols from graph
Using cscope:
cscope (BSD)
cscope +callgraphviz +dot +xdot
cscope +vim CCTree (C Call-Tree Explorer)
cscope +ccglue
cscope +CodeQuery for C, C++, Python & Java
cscope +Python html producer
cscope +calltree.sh
ncc (cflow like)
KCachegrind (KDE dependency viewer)
Calltree
The following tools unfortunately require that the code be compilable, because they depend on output from gcc:
CodeViz (GPL v2) (weak point: needs compilable source, because it uses gcc to dump cdepn files)
gcc +egypt +dot (GPL v*, Perl = GPL | Artistic license, EPL v1) (egypt uses gcc to produce RTL, so it fails for any buggy source code, or even if you just want to focus on a single file from a larger project; therefore it is not very useful compared to the more robust cflow-based toolchains. Note that egypt by default has good support for excluding library calls from the graph, to make it cleaner.)
Also, file dependency graphs for C/C++ can be created with crowfood.

So I've done some more research, and it is not hard to get line numbers for nodes: just append the lineno modifier to one of those options, i.e. use -fdump-tree-cfg-lineno or -fdump-tree-vcg-lineno. It took me some time to check whether those numbers are reliable. In the VCG-format graph, the label of each node contains two numbers; those are the line numbers for the start and end of the code portion represented by that node.
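For illustration, here is a small test file with nested control flow (my own example, not from the question) that can be fed to those dumps:
/* nested.c -- hypothetical example input */
#include <stdio.h>

int classify(int x)
{
    int i;
    if (x > 0) {
        for (i = 0; i < x; ++i) {      /* loop nested inside the if */
            if (i % 2 == 0)
                printf("%d\n", i);
        }
        return 1;
    }
    return 0;
}

/* Dump the CFG with line numbers, e.g.:
 *   gcc -c nested.c -fdump-tree-cfg-lineno
 * The annotated dump is written to a file named after the source
 * (something like nested.c.*.cfg) in the current directory. */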

Dynamic analysis methods
In this answer I describe a few dynamic analysis methods.
Dynamic methods actually run the program to determine the call graph.
The opposite of dynamic methods are static methods, which try to determine it from the source alone without running the program.
Advantages of dynamic methods:
catches function pointers and virtual C++ calls. These are present in large numbers in any non-trivial software.
Disadvantages of dynamic methods:
you have to run the program, which might be slow, or require a setup that you don't have, e.g. cross-compilation
only functions that were actually called will show. E.g., some functions could be called or not depending on the command line arguments.
KcacheGrind
https://kcachegrind.github.io/html/Home.html
Test program:
int f2(int i) { return i + 2; }
int f1(int i) { return f2(2) + i + 1; }
int f0(int i) { return f1(1) + f2(2); }
int pointed(int i) { return i; }
int not_called(int i) { return 0; }
int main(int argc, char **argv) {
    int (*f)(int);
    f0(1);
    f1(1);
    f = pointed;
    if (argc == 1)
        f(1);
    if (argc == 2)
        not_called(1);
    return 0;
}
Usage:
sudo apt-get install -y kcachegrind valgrind
# Compile the program as usual, no special flags.
gcc -ggdb3 -O0 -o main -std=c99 main.c
# Generate a callgrind.out.<PID> file.
valgrind --tool=callgrind ./main
# Open a GUI tool to visualize callgrind data.
kcachegrind callgrind.out.1234
You are now left inside an awesome GUI program that contains a lot of interesting performance data.
On the bottom right, select the "Call graph" tab. This shows an interactive call graph that correlates to performance metrics in other windows as you click the functions.
To export the graph, right click it and select "Export Graph". The exported PNG looks like this:
From that we can see that:
the root node is _start, which is the actual ELF entry point, and contains glibc initialization boilerplate
f0, f1 and f2 are called as expected from one another
pointed is also shown, even though we called it with a function pointer. It might not have been called if we had passed a command line argument.
not_called is not shown because it didn't get called in the run, because we didn't pass an extra command line argument.
The cool thing about valgrind is that it does not require any special compilation options.
Therefore, you could use it even if you don't have the source code, only the executable.
valgrind manages to do that by running your code through a lightweight "virtual machine".
Tested on Ubuntu 18.04.
gcc -finstrument-functions + etrace
https://github.com/elcritch/etrace
-finstrument-functions adds callbacks, etrace parses the ELF file and implements all callbacks.
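For context, here is a minimal sketch (my own illustration, not code from etrace) of the two hooks that -finstrument-functions generates calls to; etrace essentially implements these and maps the recorded addresses back to symbol names:
/* hooks.c -- compile the rest of the program with -finstrument-functions;
 * the hooks themselves are marked no_instrument_function so they are not
 * instrumented recursively. */
#include <stdio.h>

__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    fprintf(stderr, "enter %p (called from %p)\n", this_fn, call_site);
}

__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)call_site;   /* unused here */
    fprintf(stderr, "exit  %p\n", this_fn);
}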
Unfortunately, however, I couldn't get it working: Why doesn't `-finstrument-functions` work for me?
Claimed output is of format:
\-- main
| \-- Crumble_make_apple_crumble
| | \-- Crumble_buy_stuff
| | | \-- Crumble_buy
| | | \-- Crumble_buy
| | | \-- Crumble_buy
| | | \-- Crumble_buy
| | | \-- Crumble_buy
| | \-- Crumble_prepare_apples
| | | \-- Crumble_skin_and_dice
| | \-- Crumble_mix
| | \-- Crumble_finalize
| | | \-- Crumble_put
| | | \-- Crumble_put
| | \-- Crumble_cook
| | | \-- Crumble_put
| | | \-- Crumble_bake
Likely the most efficient method besides specific hardware tracing support, but has the downside that you have to recompile the code.

Related

How can I execute a decrypted file residing in memory? [duplicate]

Is it possible to compile a C++ (or similar) program without generating an executable file on disk, instead writing it and executing it directly from memory?
For example, with GCC and clang, something that has a similar effect to:
c++ hello.cpp -o hello.x && ./hello.x $# && rm -f hello.x
on the command line, but without the burden of writing an executable to disk only to immediately load and run it.
(If possible, the procedure should not use disk space, or at least not space in the current directory, which might be read-only.)
Possible? Not the way you seem to wish. The task has two parts:
1) How to get the binary into memory
When we specify /dev/stdout as the output file on Linux, we can pipe the result into our program x0, which reads an executable from stdin and executes it:
gcc -pipe YourFiles1.cpp YourFile2.cpp -o/dev/stdout -Wall | ./x0
In x0 we can just read from stdin until reaching the end of the file:
#include <stddef.h>   /* size_t */
#include <stdlib.h>   /* realloc */
#include <unistd.h>   /* read, STDIN_FILENO */

int memexec(void *exe, size_t exe_size, char *const argv[]);   /* defined below */

int main(int argc, char **argv)
{
    size_t ntotal = 0;
    char *buf = NULL;
    while (1)
    {
        /* grow the buffer dynamically since we do not know how many bytes to read */
        buf = (char*)realloc(buf, ntotal + 4096);
        ssize_t nread = read(STDIN_FILENO, buf + ntotal, 4096);
        if (nread <= 0) break;      /* 0 means end of input, negative means error */
        ntotal += nread;
    }
    return memexec(buf, ntotal, argv);
}
It would also be possible for x0 to execute the compiler directly and read its output. That question has been answered here: Redirecting exec output to a buffer or file
Caveat: I just figured out that for some strange reason this does not work when I use a pipe (|), but works when I use redirection (x0 < foo).
Note: If you are willing to modify your compiler, or if you do JIT like LLVM, clang and other frameworks, you could directly generate executable code. However, for the rest of this discussion I assume you want to use an existing compiler.
Note: Execution via temporary file
Other programs such as UPX achieve similar behavior by executing a temporary file; this is easier and more portable than the approach outlined below. On systems where /tmp is mapped to a RAM disk (for example, typical servers), the temporary file will be memory-based anyway.
#define _GNU_SOURCE   // for mkostemp and fexecve
#include <stddef.h>   // size_t
#include <fcntl.h>    // open, O_RDONLY, O_CLOEXEC
#include <stdio.h>    // perror
#include <stdlib.h>   // mkostemp
#include <sys/stat.h> // chmod, S_IRUSR, S_IXUSR
#include <unistd.h>   // write, close, unlink, fexecve
int memexec(void *exe, size_t exe_size, char *const argv[])
{
    /* random temporary file name in /tmp */
    char name[15] = "/tmp/fooXXXXXX";
    /* creates the temporary file, returns a writeable file descriptor */
    int fd_wr = mkostemp(name, O_CLOEXEC);
    /* makes the file executable and read-only */
    chmod(name, S_IRUSR | S_IXUSR);
    /* creates a read-only file descriptor before deleting the file */
    int fd_ro = open(name, O_RDONLY);
    /* removes the file from the file system; the kernel keeps the content until all fds are closed */
    unlink(name);
    /* writes the executable to the file */
    write(fd_wr, exe, exe_size);
    /* fexecve will not work as long as there is an open writeable file descriptor */
    close(fd_wr);
    char *const newenviron[] = { NULL };
    fexecve(fd_ro, argv, newenviron);
    /* fexecve only returns on error */
    perror("fexecve failed");
    return -1;
}
Caveat: Error handling is left out for clarity's sake.
Note: By combining main() and memexec() into a single function and using splice(2) to copy directly between stdin and fd_wr, the program could be significantly optimized.
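For illustration, a minimal sketch of that splice(2) idea (hypothetical helper name; it assumes stdin is a pipe, which it is when the compiler output is piped into x0):
#define _GNU_SOURCE
#include <fcntl.h>    /* splice, SPLICE_F_MORE */
#include <unistd.h>   /* STDIN_FILENO, ssize_t */

/* Copy everything from stdin (a pipe) into fd_wr without a user-space buffer. */
static void copy_stdin_to_fd(int fd_wr)
{
    ssize_t n;
    while ((n = splice(STDIN_FILENO, NULL, fd_wr, NULL, 1 << 16, SPLICE_F_MORE)) > 0)
        ;   /* splice moves the data inside the kernel; 0 means end of input */
}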
2) Execution directly from memory
One does not simply load and execute an ELF binary from memory. Some preparation, mostly related to dynamic linking, has to happen. There is a lot of material explaining the various steps of the ELF linking process, and studying it makes me believe that it is theoretically possible. See for example this closely related question on SO; however, there seems to be no working solution.
Update: UserModeExec seems to come very close.
Writing a working implementation would be very time consuming, and would surely raise some interesting questions in its own right. I like to believe this is by design: for most applications it is strongly undesirable to (accidentally) execute their input data, because it allows code injection.
What happens exactly when an ELF is executed? Normally the kernel receives a file name and then creates a process, loads and maps the different sections of the executable into memory, performs a lot of sanity checks and marks it as executable before passing control and a file name back to the run-time linker ld-linux.so (part of libc). That linker takes care of relocating functions, handling additional libraries, setting up global objects and jumping to the executable's entry point. As I understand it, this heavy lifting is done by dl_main() (implemented in libc/elf/rtld.c).
Even fexecve is implemented using a file in /proc, and it is this need for a file name that leads us to reimplement parts of this linking process.
Libraries
UserModeExec
libelf -- read, modify, create ELF files
eresi -- play with ELFs
OSKit (seems like a dead project though)
Reading
http://www.linuxjournal.com/article/1060?page=0,0 -- introduction
http://wiki.osdev.org/ELF -- good overview
http://s.eresi-project.org/inc/articles/elf-rtld.txt -- more detailed Linux-specific explanation
http://www.codeproject.com/Articles/33340/Code-Injection-into-Running-Linux-Application -- how to get to hello world
http://www.acsu.buffalo.edu/~charngda/elf.html -- nice reference of ELF structure
Loaders and Linkers by John Levine -- deeper explanation of linking
Related Questions at SO
Linux user-space ELF loader
ELF Dynamic loader symbol lookup ordering
load-time ELF relocation
How do global variables get initialized by the elf loader
So it seems possible; you decide whether it is also practical.
Yes, though doing it properly requires designing significant parts of the compiler with this in mind. The LLVM guys have done this, first with a kinda-separate JIT, and later with the MC subproject. I don't think there's a ready-made tool doing it, but in principle it's just a matter of linking to clang and llvm, passing the source to clang, and passing the IR it creates to MCJIT. Maybe a demo exists that does this (I vaguely recall a basic C interpreter that worked like this, though I think it was based on the legacy JIT).
Edit: Found the demo I recalled. Also, there's cling, which seems to do basically what I described, but better.
Linux can create virtual file systems in RAM using tmpfs. For example, I have my tmp directory set up in my file system table like so:
tmpfs /tmp tmpfs nodev,nosuid 0 0
Using this, any files I put in /tmp are stored in my RAM.
Windows doesn't seem to have any "official" way of doing this, but has many third-party options.
Without this "RAM disk" concept, you would likely have to heavily modify a compiler and linker to operate completely in memory.
If you are not specifically tied to C++, you may also consider other JIT based solutions:
in Common Lisp SBCL is able to generate machine code on the fly
you could use TinyCC and its libtcc.a, which quickly emits poor (i.e. unoptimized) machine code from C code in memory.
consider also any JITing library, e.g. libjit, GNU Lightning, LLVM, GCCJIT, asmjit
of course emitting C++ code on some tmpfs and compiling it...
But if you want good machine code, you'll need it to be optimized, and that is not fast (so the time to write to a filesystem is negligible).
If you are tied to C++ generated code, you need a good optimizing C++ compiler (e.g. g++ or clang++); they take significant time to compile C++ code to an optimized binary, so you should generate some file foo.cc (perhaps in a RAM file system like tmpfs, but that gives only a minor gain, since most of the time is spent inside the g++ or clang++ optimization passes, not reading from disk), then compile that foo.cc to foo.so (using perhaps make, or at least forking g++ -Wall -shared -O2 foo.cc -o foo.so, perhaps with additional libraries). Finally, have your main program dlopen that generated foo.so, as sketched below. FWIW, MELT did exactly that, and on a Linux workstation the manydl.c program shows that a process can generate and then dlopen(3) many hundreds of thousands of temporary plugins, each one obtained by generating a temporary C file and compiling it. For C++, read the C++ dlopen mini HOWTO.
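As a rough sketch of that generate/compile/dlopen cycle in plain C (file names and the generated function are my own; error handling trimmed):
#include <dlfcn.h>    /* dlopen, dlsym -- link with -ldl */
#include <stdio.h>
#include <stdlib.h>   /* system */

int main(void)
{
    /* 1. generate some code (here a trivial function) */
    FILE *f = fopen("/tmp/foo.c", "w");
    fprintf(f, "int foo(int x) { return x + 1; }\n");
    fclose(f);

    /* 2. compile it into a shared object */
    system("gcc -shared -fPIC -O2 /tmp/foo.c -o /tmp/foo.so");

    /* 3. load the plugin and call the generated function */
    void *handle = dlopen("/tmp/foo.so", RTLD_NOW);
    int (*foo)(int) = (int (*)(int))dlsym(handle, "foo");
    printf("%d\n", foo(41));    /* prints 42 */

    dlclose(handle);
    return 0;
}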
Alternatively, generate a self-contained source program foobar.cc, compile it to an executable foobarbin, e.g. with g++ -O2 foobar.cc -o foobarbin, and execute that foobarbin binary with execve.
When generating C++ code, you may want to avoid generating tiny C++ source files (e.g. only a dozen lines; if possible, generate C++ files of at least a few hundred lines, unless lots of template expansion happens through extensive use of existing C++ containers, in which case generating a small C++ function combining them makes sense). For instance, try if possible to put several generated C++ functions in the same generated C++ file (but avoid very big generated C++ functions, e.g. 10 KLOC in a single function; they take a lot of time to compile with GCC). You could consider, if relevant, having only one single #include in that generated C++ file, and pre-compiling that commonly included header.
Jacques Pitrat's book Artificial Beings, the Conscience of a Conscious Machine (ISBN 9781848211018) explains in detail why generating code at runtime is useful (in symbolic artificial intelligence systems like his CAIA system). The RefPerSys project is trying to follow that idea and generate some C++ code (and, hopefully, more and more of it) at runtime. Partial evaluation is a relevant concept here.
Your software is likely to spend more CPU time in generating C++ code than GCC in compiling it.
The tcc compiler's "-run" option allows for exactly this: compile into memory, run there, and finally discard the compiled result. No filesystem space is needed. "tcc -run" can be used in a shebang line to allow for C scripts; from the tcc man page:
#!/usr/local/bin/tcc -run
#include <stdio.h>
int main()
{
printf("Hello World\n");
return 0;
}
C scripts allow for mixed bash/C scripts, with "tcc -run" not needing any temporary space:
#!/bin/bash
echo "foo"
sed -n "/^\/\*\*$/,\$p" $0 | tcc -run -
exit
/**
*/
#include <stdio.h>
int main()
{
printf("bar\n");
return 0;
}
Execution output:
$ ./shtcc2
foo
bar
$
C scripts with gcc are possible as well, but, as others mentioned, they need temporary space to store the executable. This script produces the same output as the previous one:
#!/bin/bash
exc=/tmp/`basename $0`
if [ $0 -nt $exc ]; then sed -n "/^\/\*\*$/,\$p" $0 | gcc -x c - -o $exc; fi
echo "foo"
$exc
exit
/**
*/
#include <stdio.h>
int main()
{
printf("bar\n");
return 0;
}
C scripts with the suffix ".c" are nice; headtail.c was my first ".c" file that needed to be executable:
$ echo -e "1\n2\n3\n4\n5\n6\n7" | ./headtail.c
1
2
3
6
7
$
I like C scripts, because you have just one file, you can easily move it around, and changes in the bash or C part require no further action; they just work on the next execution.
P.S:
The "tcc -run" C script shown above has a problem: the script's stdin is not available to the executed C code, because I passed the extracted C code to "tcc -run" via a pipe. The new gist run_from_memory_stdin.c does it correctly:
...
echo "foo"
tcc -run <(sed -n "/^\/\*\*$/,\$p" $0) 42
...
"foo" is printed by bash part, "bar 42" from C part (42 is passed argv[⁠1]), and piped script input gets printed from C code then:
$ route -n | ./run_from_memory_stdin.c
foo
bar 42
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 172.29.58.98 0.0.0.0 UG 306 0 0 wlan1
10.0.0.0 0.0.0.0 255.255.255.0 U 0 0 0 wlan0
169.254.0.0 0.0.0.0 255.255.0.0 U 303 0 0 wlan0
172.29.58.96 0.0.0.0 255.255.255.252 U 306 0 0 wlan1
$
One can also modify the compiler itself. It sounds hard at first, but thinking about it, it seems feasible: modifying the compiler sources to expose its functionality and build it as a shared library should not take that much effort (depending on the actual implementation).
Just replace every file access with a memory-mapped-file solution.
This is something I am about to do myself: compile code transparently in the background to opcodes and execute those from within Java.
But thinking about your original question, it seems you want to speed up compilation and your edit-and-run cycle. First of all, get an SSD: you get almost memory speed (use a PCIe version). Let's say it is C we are talking about: the linking step results in very complex operations that are likely to take more time than reading from and writing to disk. So just put everything on the SSD and live with the lag.
Finally, the answer to the OP's question is yes!
I found the memrun repo from guitmz, which demos running an (x86_64) ELF from memory, with Go and assembler. I forked it and provided a C version of memrun that runs ELF binaries (verified on x86_64 and armv7l), either from standard input or via process substitution as the first argument. The repo contains demos and documentation (memrun.c is only 47 lines of code):
https://github.com/Hermann-SW/memrun/tree/master/C#memrun
Here is the simplest example: with "-o /dev/fd/1" the gcc-compiled ELF gets sent to stdout and piped into memrun, which executes it:
pi@raspberrypi400:~/memrun/C $ gcc info.c -o /dev/fd/1 | ./memrun
My process ID : 20043
argv[0] : ./memrun
no argv[1]
evecve --> /usr/bin/ls -l /proc/20043/fd
total 0
lr-x------ 1 pi pi 64 Sep 18 22:27 0 -> 'pipe:[1601148]'
lrwx------ 1 pi pi 64 Sep 18 22:27 1 -> /dev/pts/4
lrwx------ 1 pi pi 64 Sep 18 22:27 2 -> /dev/pts/4
lr-x------ 1 pi pi 64 Sep 18 22:27 3 -> /proc/20043/fd
pi@raspberrypi400:~/memrun/C $
The reason I was interested in this topic was its use in "C scripts". run_from_memory_stdin.c demonstrates it all together:
pi@raspberrypi400:~/memrun/C $ wc memrun.c | ./run_from_memory_stdin.c
foo
bar 42
47 141 1005 memrun.c
pi@raspberrypi400:~/memrun/C $
The C script producing the shown output is this small ...
#!/bin/bash
echo "foo"
./memrun <(gcc -o /dev/fd/1 -x c <(sed -n "/^\/\*\*$/,\$p" $0)) 42
exit
/**
*/
#include <stdio.h>
int main(int argc, char *argv[])
{
printf("bar %s\n", argc>1 ? argv[1] : "(undef)");
for(int c=getchar(); EOF!=c; c=getchar()) { putchar(c); }
return 0;
}
P.S:
I added tcc's "-run" option to gcc and g++, for details see:
https://github.com/Hermann-SW/memrun/tree/master/C#adding-tcc--run-option-to-gcc-and-g
Just nice, and nothing gets stored in the filesystem:
pi@raspberrypi400:~/memrun/C $ uname -a | g++ -O3 -Wall -run demo.cpp 42
bar 42
Linux raspberrypi400 5.10.60-v7l+ #1449 SMP Wed Aug 25 15:00:44 BST 2021 armv7l GNU/Linux
pi@raspberrypi400:~/memrun/C $

Getting link errors with CMake

I'm getting multiple definition link errors after conditionally compiling platform-specific code.
My project is laid out like this:
/
|__+ include/
| |__+ native/
| | |__ impl.h
| |
| |__ general.h
|
|__+ src/
  |__+ native/
  | |__ impl.linux.c
  | |__ impl.win32.c
  |
  |__ general.c
At the top of the general.c file:
#if defined(LIBRARY_PLATFORM_LINUX)
    #include "native/impl.linux.c"
#elif defined(LIBRARY_PLATFORM_WIN32)
    #include "native/impl.win32.c"
#endif
I set up introspection in CMake in order to detect the operating system and define the corresponding constants. The thing is, I didn't want to maintain one CMakeLists.txt file in every directory, so I simply globbed all the .c files as suggested in this answer:
file(GLOB_RECURSE LIBRARY_SOURCE_FILES "${PROJECT_SOURCE_DIR}/src/*.c")
Apparently, this is what is causing the problem. It seems to be compiling the code #included in general.c as well as the individual src/native/impl.*.c files.
CMakeFiles/lib.dir/src/native/impl.linux.c.o: In function `declared_in_impl_h':
impl.linux.c:(.text+0x0): multiple definition of `declared_in_impl_h'
CMakeFiles/lib.dir/src/general.c.o:general.c:(.text+0x0): first defined here
How can I untangle this situation?
The best practice for this sort of cross-platform situation is to create two libraries, one for Linux and one for Windows, and to stop doing conditional includes. Each platform then compiles and links only the relevant library.
The recommended way to do that with CMake is to stop globbing and list each file explicitly. Globbing has situations where CMake can get confused and not realize that it needs to recompile. You can argue that non-changing legacy code won't have that problem.
If you really want to avoid doing either of these things, I would put the included code in a header instead of a .c file. Deliberately leave out the include guards, so that people don't confuse it with a header that should be used in the regular way, and put a bunch of comments in the file to warn them off such use as well.
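For illustration, a hedged sketch of that last approach (the file name and contents here are my own, not from the question):
/* src/native/impl_linux.inc.h -- #included only from general.c.
 * Deliberately no include guard, and a non-.c suffix, so the *.c glob does
 * not compile it as a separate translation unit and nobody mistakes it for
 * a header that other files should include. */
int declared_in_impl_h(void)
{
    /* Linux-specific implementation goes here */
    return 0;
}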

Is there a C project Default Directory Layout?

I've always wanted to know if there is a default directory layout for C projects. You know, which files should go in which folders and such.
So I've downloaded the source code of lots of projects from SourceForge, and they were all different from each other.
Generally, I found more or less this structure:
/project (root project folder, has project name)
|
|____/bin (the final executable file)
|
|
|____/doc (project documentation)
| |
| |____/html (documentation on html)
| |
| |____/latex (documentation on latex)
|
|
|____/src (every source file, .c and .h)
| |
| |____/test (unit testing files)
|
|
|____/obj (where the generated .o files will be)
|
|
|____/lib (any library dependences)
|
|
|____BUGS (known bugs)
|
|____ChangeLog (list of changes and such)
|
|____COPYING (project license and warranty info)
|
|____Doxyfile (Doxygen instructions file)
|
|____INSTALL (install instructions)
|
|____Makefile (make instructions file)
|
|____README (general readme of the project)
|
|____TODO (todo list)
Is there a default standard somewhere?
Edit: Sorry, really. I realised there are numerous similar questions about recommended C project directory layouts, but in those I've only seen people say what they think is best. I'm looking for a standard, something that people usually follow.
Related Questions:
C - Starting a big project. File/Directory structure and names. Good example required
Folder structure for a C project
File and Folder structure of a App/Project based in C
Project Organization in C Best Practices
I would say "no", and your empirical evidence seems to support that.
I usually get confused right around when I need to decide between doc/ and docs/ ...
Well, there is “libabc”, which showcases common practice.

How to store a version number in a static library?

How can I store a version number in a static library (file.a) and later check for its version in Linux?
P.S. I need the ability to check the version of the file at any time, without any special executable, using only shell utilities.
In addition to providing a static string as mentioned by Puppe, it is common practice to provide a macro to check the version for compatibility. For example, you could have the following macros (declared in a header file to be used with your library):
#define MYLIB_MAJOR_VERSION 1
#define MYLIB_MINOR_VERSION 2
#define MYLIB_REVISION 3
#define MYLIB_VERSION "1.2.3"
#define MYLIB_VERSION_CHECK(maj, min) ((maj==MYLIB_MAJOR_VERSION) && (min<=MYLIB_MINOR_VERSION))
Notice that with the MYLIB_VERSION_CHECK macro, I'm assuming you want a specific major rev and a minor rev greater than or equal to your desired version. Change as required for your application.
Then use it from a calling application, something like:
if (! MYLIB_VERSION_CHECK(1, 2)) {
    fprintf(stderr, "ERROR: incompatible library version\n");
    exit(-1);
}
This approach will cause the version information to come from the included header file. Additionally, it will be optimized at compile time for the calling application. With a little more work, you can extract it from the library itself. Read on...
You can also use this information to create a static string stored inside your library, as mentioned by Puppe. Place something like this inside your library:
struct {
    const char* string;
    const unsigned major;
    const unsigned minor;
    const unsigned revision;
} mylib_version = {
    MYLIB_VERSION, MYLIB_MAJOR_VERSION, MYLIB_MINOR_VERSION, MYLIB_REVISION
};
This will create a struct called mylib_version in your library. You can use this to do further verifications by creating functions inside your library and accessing those from a calling application, etc.
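For example, a hedged sketch of such a pair of functions (the names are my own, not an established convention):
/* Inside the library, next to the mylib_version struct above: */
const char *mylib_get_version_string(void)
{
    return mylib_version.string;
}

int mylib_runtime_version_check(unsigned maj, unsigned min)
{
    return maj == mylib_version.major && min <= mylib_version.minor;
}
A calling application can then compare the header macros it was compiled against with what the linked library actually reports at run time.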
Maybe you could create a string with the version like this:
char* library_version = "Version: 1.3.6";
and to be able to check it from the shell just use:
strings library.a | grep Version | cut -d " " -f 2
Creating a new answer based on your edit... Just to avoid confusion :)
If you are looking for a non-code way to solve the problem, you could try this. It's (yet again) an alternative to the strings approach described by Puppe.
Maybe you could just touch a file called version_1.2.3 and add it to the archive. Then, you could determine the version by looking for the version file using the ar command:
ar t libmylib.a | grep 'version_' | sed -e 's/^version_//'
I'm not sure if that will get you what you need, but there is no standard method for embedding metadata like this in an archive. Maybe you'll find other information you want to store in this "metafile" for the archive.
If you are using gcc, you can use the #ident directive
#ident "Foo Version 1.2.3.4"
void foo(void){ /* foo code here */ }
To get the version just use one of the following:
strings -a foo.o | grep "Foo Version"
strings -a foo.a | grep "Foo Version"
strings -a foo.so | grep "Foo Version"
This will allow you to compile the version into the library, with the capability of later stripping it out using strip -R .comment your_file, or of omitting it entirely by passing -fno-ident (which will also omit the compiler version comments from the compiled objects).
Several times man 1 ident has been mentioned, so here are details about using that method.
ident is a command that comes with the RCS (Revision Control System), but might also be available if you are using CVS (Concurrent Versions System), or Subversion.
You would use it like this (cloned from the man page):
#include <stdio.h>
static char const rcsid[] =
"$Id: f.c,v 5.4 1993/11/09 17:40:15 eggert Exp $";
int main() { return printf("%s\n", rcsid) == EOF; }
If this f.c is compiled into f.o, then the command
ident f.c f.o
will output
f.c:
$Id: f.c,v 5.4 1993/11/09 17:40:15 eggert Exp $
f.o:
$Id: f.c,v 5.4 1993/11/09 17:40:15 eggert Exp $
If your f.o were added to a static library f.a then ident f.a should show a similar output. If you have several similarly built [a-z].o in your az.a you should find all their strings in the az.a file.
CAVEAT: Just because they are in the .a file doesn't mean they will be included in your program file. Unless the program references them the linker sees no need to include them. So you usually have to have a method in each module to return the string, and the app needs to call that method. There are ways to convince most linkers that it is a required symbol without actually referencing it, but it depends on the linker, and is beyond the scope of this answer.
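For instance, a hedged sketch of that accessor idea (the function name f_rcsid is my own):
static char const rcsid[] =
    "$Id: f.c,v 5.4 1993/11/09 17:40:15 eggert Exp $";

/* Something in the application calls this, which gives the linker a reason
 * to pull this object out of the archive, so the string ends up in the binary. */
char const *f_rcsid(void)
{
    return rcsid;
}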
If instead you are familiar with the SCCS (Source Code Control System) then you would use man 1 what instead, and it would look like this (done with macros to show the flexibility available):
#include <stdio.h>
#define VERSION_STR "5.4"
#define CONFIG "EXP"
#define AUTHOR "eggert"
static char const sccsid[] =
"#(#) " CONFIG " v " VERSION_STR " " __DATE__ " " __TIME__ " " AUTHOR;
int main() { return printf("%s\n", sccsid) == EOF; }
If this f.c is compiled into f.o, then the command
what f.c f.o
will output
f.c:
@(#) EXP v 5.4 1993/11/09 17:40:15 eggert
f.o:
@(#) EXP v 5.4 1993/11/09 17:40:15 eggert
PS: both ident and what are commands that come with specific centralized source control systems. If you are using a distributed source control system (like git) the entire concept may not make sense. For some ideas using git see this thread: Moving from CVS to git: $Id:$ equivalent? though the hash isn't the same as a version number. :)

Tools to get a pictorial function call graph of code [closed]

I have a large workspace with many C source files. Although I can see the functions called from a given function using the Object Browser in MS VS2005, and in MSVC 6.0 as well, this only shows the functions called from one particular function, in a non-graphical display. Additionally, it does not show the call chain starting from, say, main(), then the functions called from it, and so on, down to the leaf-level functions.
I need a tool that will give me a function call graph pictorially, with callee and caller functions connected by arrows or something similar, starting from main() down to the last level of functions, or at least showing a call graph of all functions in one C source file pictorially. It would be great if I could also print this graph.
Any good tools to do that (need not be free tools)?
Egypt (free software)
ncc
KcacheGrind (GPL)
Graphviz (CPL)
CodeViz (GPL)
Dynamic analysis methods
Here I describe a few dynamic analysis methods.
Dynamic methods actually run the program to determine the call graph.
The opposite of dynamic methods are static methods, which try to determine it from the source alone without running the program.
Advantages of dynamic methods:
catches function pointers and virtual C++ calls. These are present in large numbers in any non-trivial software.
Disadvantages of dynamic methods:
you have to run the program, which might be slow, or require a setup that you don't have, e.g. cross-compilation
only functions that were actually called will show. E.g., some functions could be called or not depending on the command line arguments.
KcacheGrind
https://kcachegrind.github.io/html/Home.html
Test program:
int f2(int i) { return i + 2; }
int f1(int i) { return f2(2) + i + 1; }
int f0(int i) { return f1(1) + f2(2); }
int pointed(int i) { return i; }
int not_called(int i) { return 0; }
int main(int argc, char **argv) {
    int (*f)(int);
    f0(1);
    f1(1);
    f = pointed;
    if (argc == 1)
        f(1);
    if (argc == 2)
        not_called(1);
    return 0;
}
Usage:
sudo apt-get install -y kcachegrind valgrind
# Compile the program as usual, no special flags.
gcc -ggdb3 -O0 -o main -std=c99 main.c
# Generate a callgrind.out.<PID> file.
valgrind --tool=callgrind ./main
# Open a GUI tool to visualize callgrind data.
kcachegrind callgrind.out.1234
You are now left inside an awesome GUI program that contains a lot of interesting performance data.
On the bottom right, select the "Call graph" tab. This shows an interactive call graph that correlates to performance metrics in other windows as you click the functions.
To export the graph, right click it and select "Export Graph". The exported PNG looks like this:
From that we can see that:
the root node is _start, which is the actual ELF entry point, and contains glibc initialization boilerplate
f0, f1 and f2 are called as expected from one another
pointed is also shown, even though we called it with a function pointer. It might not have been called if we had passed a command line argument.
not_called is not shown because it didn't get called in the run, because we didn't pass an extra command line argument.
The cool thing about valgrind is that it does not require any special compilation options.
Therefore, you could use it even if you don't have the source code, only the executable.
valgrind manages to do that by running your code through a lightweight "virtual machine". This also makes execution extremely slow compared to native execution.
As can be seen on the graph, timing information about each function call is also obtained, and this can be used to profile the program, which is likely the original use case of this setup, not just to see call graphs: How can I profile C++ code running on Linux?
Tested on Ubuntu 18.04.
gcc -finstrument-functions + etrace
https://github.com/elcritch/etrace
-finstrument-functions adds callbacks, etrace parses the ELF file and implements all callbacks.
Unfortunately, however, I couldn't get it working: Why doesn't `-finstrument-functions` work for me?
Claimed output is of format:
\-- main
| \-- Crumble_make_apple_crumble
| | \-- Crumble_buy_stuff
| | | \-- Crumble_buy
| | | \-- Crumble_buy
| | | \-- Crumble_buy
| | | \-- Crumble_buy
| | | \-- Crumble_buy
| | \-- Crumble_prepare_apples
| | | \-- Crumble_skin_and_dice
| | \-- Crumble_mix
| | \-- Crumble_finalize
| | | \-- Crumble_put
| | | \-- Crumble_put
| | \-- Crumble_cook
| | | \-- Crumble_put
| | | \-- Crumble_bake
Likely the most efficient method besides specific hardware tracing support, but has the downside that you have to recompile the code.
Understand does a very good job of creating call graphs.
Our DMS Software Reengineering Toolkit has static control/dataflow/points-to/call graph analysis that has been applied to huge systems (~25 million lines) of C code, and produced such call graphs, including functions called via function pointers.
You may try CScope + tceetree + Graphviz.
You can check out my bash-based C call tree generator here. It lets you specify one or more C functions for which you want caller and/or called information, or you can specify a set of functions and determine the reachability graph of function calls that connects them, i.e. tell me all the ways main(), foo(), and bar() are connected. It uses graphviz/dot as the graphing engine.
Astrée is the most robust and sophisticated tool out there, IMHO.

Resources