GDB: dump arguments to all calls of a specific function - c

I need to profile the values passed as arguments to the standard C library function sqrt() in my program.
The trivial way is to insert code to dump these values to a file before the actual call to sqrt() (e.g. a simple fprintf()). However, if sqrt() is called from inside a library, or if it is called from multiple locations, the task can become hard.
Is there a way to automatically do this in GDB or in some other debugging tool?
Thanks in advance for your help.
Best Regards.

Sure, it can be done. There is an easy way and a hard way.
The easy way is if you have debuginfo for sqrt. Most distros make this available; e.g., for Fedora you can use debuginfo-install to install it.
In this case, find the function in question, set a breakpoint on it, and have the breakpoint commands print the arguments:
break sqrt
commands
silent
info args
cont
end
If you have a new enough gdb, and you know the names of the arguments, you can use the dprintf command instead, e.g. dprintf sqrt,"sqrt(%g)\n",x (assuming the argument is named x). This gives you nicer formatting and does not interact badly with other debugging commands like next.
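The hard way is if you don't have debug info. In this case you need to know the platform ABI. Then you can still set the breakpoint, and then print the appropriate registers or dump the appropriate memory, depending on how the arguments are passed. For example, on x86-64 Linux the first double argument is passed in register xmm0, so print $xmm0.v2_double[0] at the breakpoint shows the value given to sqrt.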
Yet another way is to use SystemTap. This is a pretty good tool for this kind of tracing.

Related

how to catch calls with LD_PRELOAD when unknown programs may be calling execve without passing environment

I know how to use LD_PRELOAD to intercept system calls made by compiled programs I may not have source for. For example, if I want to know about the calls to int fsync(int) of some unknown program foobar, I compile a wrapper int fsync(int) for (int (*) (int))dlsym(RTLD_NEXT,"fsync"); into a shared library, set the environment variable LD_PRELOAD to that library, and run foobar. Assuming that foobar is dynamically linked, which most programs are, I will know about the calls to fsync.
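The wrapper described above might look roughly like the following. This is only a minimal sketch: the counter fsync_calls and the log message are illustrative additions, not part of the original post, and the code assumes glibc on Linux (RTLD_NEXT is a GNU extension).

```c
#define _GNU_SOURCE             /* needed for RTLD_NEXT */
#include <dlfcn.h>
#include <stdio.h>
#include <unistd.h>

/* Illustrative counter of intercepted calls (an assumption, not in the post). */
int fsync_calls = 0;

/* Same name and signature as libc's fsync, so the dynamic linker binds
 * callers to this definition when the library is preloaded. */
int fsync(int fd)
{
    /* Find the next definition of fsync in search order (normally libc's). */
    int (*real_fsync)(int) = (int (*)(int))dlsym(RTLD_NEXT, "fsync");

    fsync_calls++;
    fprintf(stderr, "fsync(%d) intercepted\n", fd);

    if (real_fsync == NULL)
        return -1;              /* no underlying implementation found */
    return real_fsync(fd);      /* forward to the real function */
}
```

Assuming the file is saved as wrapper.c (a hypothetical name), it could be built with gcc -shared -fPIC wrapper.c -o libwrap.so -ldl and used as LD_PRELOAD=$PWD/libwrap.so ./foobar.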
But now suppose there is another unknown program foobar1 and in the source of that program was a statement like this:
execve("foobar", NULL, NULL)
that is, the environment was not passed. Now the whole LD_PRELOAD scheme breaks down?
I checked by compiling the statement above into foobar1; when that is run, the calls from foobar are not reported.
While one can safely assume most modern programs are dynamically linked, one cannot assume anything about how they may or may not be using execve.
So the whole LD_PRELOAD scheme, which everybody says is such a great thing, does not really work unless you have the source to the programs concerned, in which case you can check the calls to execve and edit them if necessary. But if you have sources to everything, there is no need for LD_PRELOAD. LD_PRELOAD is supposed to be useful specifically when you don't have sources to the programs you are inspecting.
Where am I wrong here? How can people say that LD_PRELOAD is useful for inspecting what unknown programs are doing?
I guess I could also write a wrapper for execve. In the wrapper, I add one more string to the original envp argument: "LD_PRELOAD=my library". This "seems" to work; I checked on simple examples.
I am not sure if I should be posting an "answer" which may very easily exceed my level of C experience.
Can somebody more experienced than me comment if this is really going to work in the long run?
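The execve wrapper idea from the question could be sketched as below. This is an illustration under stated assumptions, not a vetted implementation: the helper name env_with_preload is hypothetical, the library path is taken from the wrapper's own LD_PRELOAD variable, and an LD_PRELOAD entry already present in envp is left untouched. It also allocates memory, which may be unsafe if the caller is between vfork and exec.

```c
#define _GNU_SOURCE             /* needed for RTLD_NEXT */
#include <dlfcn.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: return a NULL-terminated copy of envp (which may be
 * NULL) with preload_entry appended, unless envp already sets LD_PRELOAD.
 * Only the array is allocated; the strings themselves are shared. */
char **env_with_preload(char *const envp[], const char *preload_entry)
{
    size_t n = 0, j = 0;
    int found = 0;

    if (envp != NULL)
        while (envp[n] != NULL)
            n++;

    char **out = malloc((n + 2) * sizeof *out);
    if (out == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++) {
        if (strncmp(envp[i], "LD_PRELOAD=", 11) == 0)
            found = 1;
        out[j++] = envp[i];
    }
    if (!found)
        out[j++] = (char *)preload_entry;
    out[j] = NULL;
    return out;
}

int execve(const char *path, char *const argv[], char *const envp[])
{
    typedef int (*execve_fn)(const char *, char *const[], char *const[]);
    execve_fn real_execve = (execve_fn)dlsym(RTLD_NEXT, "execve");
    if (real_execve == NULL) {
        errno = ENOSYS;
        return -1;
    }

    /* Nothing to propagate if we were not preloaded ourselves. */
    const char *preload = getenv("LD_PRELOAD");
    if (preload == NULL)
        return real_execve(path, argv, envp);

    size_t len = strlen("LD_PRELOAD=") + strlen(preload) + 1;
    char *entry = malloc(len);
    char **new_envp = NULL;
    if (entry != NULL) {
        snprintf(entry, len, "LD_PRELOAD=%s", preload);
        new_envp = env_with_preload(envp, entry);
    }
    if (new_envp == NULL) {
        free(entry);
        errno = ENOMEM;
        return -1;
    }

    int rc = real_execve(path, argv, new_envp);

    /* execve only returns on failure; clean up and preserve errno. */
    int saved = errno;
    free(new_envp);
    free(entry);
    errno = saved;
    return rc;
}
```

Other ways of starting a program, such as execle, execvpe, fexecve, posix_spawn or a direct system call, are not covered by this sketch, and neither are statically linked or setuid programs, so it narrows the gap rather than closing it.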

Instrumentation Tools for C?

Suggestions needed for the best instrumentation tools for a C project. I would like to know when control is transferred from one function to another, so I want to do something like inserting printf calls at the start and end of each function.
The valgrind tool has all sorts of hooks you can program that would let you watch this happen. In particular, the callgrind tool might be appropriate here.
If you use GCC as a compiler, it has a -finstrument-functions option that automatically generates calls on entering and leaving functions.
This allows customization of what you try to instrument.
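As a rough illustration of the -finstrument-functions approach: GCC emits calls to __cyg_profile_func_enter and __cyg_profile_func_exit, which you can define yourself. The sketch below is one possible pair of hooks; the trace_depth variable and the output format are assumptions made for the example.

```c
#include <stdio.h>

/* The hooks must not be instrumented themselves, or they would recurse. */
void __cyg_profile_func_enter(void *this_fn, void *call_site)
    __attribute__((no_instrument_function));
void __cyg_profile_func_exit(void *this_fn, void *call_site)
    __attribute__((no_instrument_function));

/* Current call nesting depth, used only to indent the trace (illustrative). */
int trace_depth = 0;

/* Called on entry to every instrumented function. */
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    fprintf(stderr, "%*senter %p (from %p)\n",
            trace_depth * 2, "", this_fn, call_site);
    trace_depth++;
}

/* Called just before every instrumented function returns. */
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    trace_depth--;
    fprintf(stderr, "%*sleave %p (from %p)\n",
            trace_depth * 2, "", this_fn, call_site);
}
```

Compile the project's own sources with -finstrument-functions and link this file in. The trace contains raw addresses; a tool such as addr2line can usually map them back to function names.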

Is there an easy way to find which other functions can call a certain function from the source code?

I have a function which is called explicitly by 4 other functions in my code base. Then in turn each of these functions is called by at least 10 other functions throughout my code. I know that I could, by hand, trace one of these function calls to the main function of my program (which has 30 function calls) but it seems like this would be a better job for the computer. I just want to know which of the functions in main() is calling this buried function.
Does anyone know of any software that could help?
Also, using a debugger is out of the question. That would have been too easy. The software only runs on a hand held device.
doxygen, correctly configured, is able to output an HTML document with navigable caller list and called-by list for every function in your code. You can generate call graphs as well.
Comment it out (or better, comment out its prototype) and try to compile your program. The compiler errors will show you where it is referenced.
If your platform has an API to capture backtraces, I would just instrument the function to use it and log the backtraces to a file for later analysis. There's no guarantee that this will find all callers (or callers-of-...-of-callers), but if you exercise all of the program's features while logging like this, you should find "most" of them. For relatively simple programs, it is possible to find all callers this way.
Alternatively, many sampling tools can get you this information.
However, I have a suspicion that you may be on a platform that doesn't have a lot of these features, so a static source-analysis tool (like mouviciel suggested) is likely your best option. Assuming that you can make it work for you, this has the added benefit that it should find all callers, not just most of them.
I think cscope (http://cscope.sourceforge.net/) can also be useful.
I second mouviciel's suggestion of using doxygen for getting this info. The downside is that doxygen is working on the source code. You can only see what functions CAN POTENTIALLY call your function, not the ones that are ACTUALLY CALLING your function. If you are using Linux and you can change the source code of the function in question, you can obtain this info using the backtrace() and the backtrace_symbols() functions.
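A minimal sketch of the backtrace() approach mentioned above, assuming glibc (execinfo.h is not available on every platform, and may well be missing on a handheld device). The helper name log_callers and the frame limit are hypothetical choices for the example.

```c
#include <execinfo.h>   /* backtrace(), backtrace_symbols(): glibc-specific */
#include <stdio.h>
#include <stdlib.h>

#define MAX_FRAMES 32   /* arbitrary limit chosen for this sketch */

/* Hypothetical helper: write the current call stack to out and return the
 * number of frames captured. Call it at the top of the buried function. */
int log_callers(FILE *out)
{
    void *frames[MAX_FRAMES];
    int count = backtrace(frames, MAX_FRAMES);
    char **symbols = backtrace_symbols(frames, count);

    if (symbols == NULL)
        return 0;                   /* allocation failed */

    for (int i = 0; i < count; i++)
        fprintf(out, "  #%d %s\n", i, symbols[i]);
    fputc('\n', out);

    free(symbols);                  /* a single free() releases the array */
    return count;
}
```

Linking with -rdynamic generally makes function names appear in the output instead of bare addresses. Static functions and inlined calls will still not show up by name.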

Help with using LD_PRELOAD

I want to create a library with a modified version of printf and then use LD_PRELOAD so that when my program calls printf it uses my version. Can someone explain to me how to use LD_PRELOAD, and whether there is anything special I need to do in my code or my library?
You just set the environment variable LD_PRELOAD to the full path to the replacement library. Since all programs you launch after that point will attempt to use this library, you may want to make a wrapper script that sets LD_PRELOAD then calls the program you want to run.
As far as I know, first of all the program cannot have a changed effective uid or gid (so-called setuid or setgid programs).
It should be used only for specific purposes such as debugging. As far as I recall, you may also shadow functions in C (in ELF?). However, both techniques, LD_PRELOAD and shadowing, should be handled with extreme care. I remember discovering a bug caused by shadowing g_malloc in the gpgme code (or other code related to gpg) when the GLib internals changed.
The simple answer is: don't do it. The more complicated answer: do it if and only if you have to, and usually you don't (unless you are writing some sort of debugging software).
That seems like a bad idea. Why not name your version of printf something else?

Optimized code on Unix?

What is the best and easiest method to debug optimized code on Unix which is written in C?
Sometimes we also don't have the code for building an unoptimized library.
This is a very good question. I had similar difficulties in the past where I had to integrate 3rd party tools inside my application. From my experience, you need to have at least meaningful callstacks in the associated symbol files. This is merely a list of addresses and associated function names. These are usually stripped away and from the binary alone you won't get them... If you have these symbol files you can load them while starting gdb or afterward by adding them. If not, you are stuck at the assembly level...
One weird behavior: even if you have the source code, execution will jump back and forth at places where you would not expect (statements may be re-ordered for better performance), and variables may not exist anymore (optimized away!). Setting breakpoints in inlined functions is pointless (they are not there as separate functions but are part of the place where they were inlined). So even with source code, watch out for these pitfalls.
I forgot to mention, the symbol files usually have the extension .gdb, but it can be different...
This question is not unlike "what is the best way to fix a passenger car?"
The best way to debug optimized code on UNIX depends on exactly which UNIX you have, what tools you have available, and what kind of problem you are trying to debug.
Debugging a crash in malloc is very different from debugging an unresolved symbol at runtime.
For general debugging techniques, I recommend this book.
Several things will make it easier to debug at the "assembly level":
You should know the calling convention for your platform, so you can tell what values are being passed in and returned, where to find the this pointer, which registers are "caller saved" and which are "callee saved", etc.
You should know your OS "calling convention" -- what a system call looks like, which register a syscall number goes into, the first parameter, etc.
You should "master" the debugger: know how to find threads, how to stop individual threads, how to set a conditional breakpoint on an individual instruction, single-step, step into or skip over function calls, etc.
It often helps to debug a working program and a broken program "in parallel". If version 1.1 works and version 1.2 doesn't, where do they diverge with respect to a particular API? Start both programs under debugger, set breakpoints on the same set of functions, run both programs and observe differences in which breakpoints are hit, and what parameters are passed.
Write small stand-in implementations of the same interfaces (as declared in the library's header) and call them in place of the optimized code, as a kind of simulation, to narrow down the scope of the code you have to debug. You can also do error injection in these stand-ins.
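To make the stand-in idea concrete, here is a small sketch. Everything in it is hypothetical: lib_compute stands for whatever function the optimized library's header declares, and stub_inject_error is a made-up switch for error injection.

```c
#include <stdio.h>

/* Hypothetical prototype, standing in for one taken from the library's header. */
int lib_compute(int input);

/* Made-up test knob: when nonzero, the stand-in reports failure. */
int stub_inject_error = 0;

/* Stand-in implementation, linked in place of the optimized library. */
int lib_compute(int input)
{
    fprintf(stderr, "stub lib_compute(%d)\n", input);   /* trace each call */

    if (stub_inject_error)
        return -1;          /* injected failure path */

    return input * 2;       /* simple, predictable result */
}
```

If the application still misbehaves with the stand-in linked in, the problem is probably on the caller's side. If it only fails with the real library, the logged arguments show which calls to look at first.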

Resources