C Linux Tracing all function calls including function inside library

I have a program like this:
#include <stdlib.h>
int main(void)
{
    char *ptr = malloc(2);
    free(ptr);
    return 0;
}
I want to trace all the function calls that happen inside the program, down to the system calls, like this:
malloc
|____ libc (some_fn)
|
|__some_system_call
Could you please suggest a way to get this?

As you know, the calls you are asking about come in two flavors:
System calls made directly to the operating system kernel ("open", "close", "fork", "exec", "exit", etc.)
Standard C runtime functions for the platform ("printf()", "malloc()", "free()", etc.), which may or may not end up making system calls
You can view the former with "strace".
You can view (at least calls into) the latter with gdb, or with "ltrace", which logs calls into shared libraries.
You can look at the complete implementation, and all internals, directly from the source code:
Gnu C Library
Linux kernel
Finally, if you're having issues with "malloc()", "valgrind" is one of several very useful tools to consider.

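If you're using gcc, compile with -pg, run the program, and then use the gprof command on the gmon.out file it writes.
Or, if you're on Linux, you can use oprofile to do something similar without recompiling.
Both tools should give you call graphs, which is what you're looking for. Note that gprof only covers code that was itself compiled with -pg, so functions inside libc will not show up in its call graph unless libc was built that way too.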
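Another gcc facility, not mentioned in the answers above, is the -finstrument-functions option: gcc then calls two hook functions on entry to and exit from every function it compiles, which gives a live trace rather than a profile. Like -pg, it only sees code you compile yourself, so calls inside libc are not traced. A minimal sketch of the hooks follows; the variable name trace_depth is just an illustration.

```c
#include <stdio.h>

/* Current nesting depth of instrumented calls (illustrative name). */
static int trace_depth = 0;

/* gcc calls this hook on entry to every function compiled with
 * -finstrument-functions. The attribute keeps the hook itself from
 * being instrumented, which would otherwise recurse forever. */
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    fprintf(stderr, "%*senter %p (called from %p)\n",
            2 * trace_depth, "", this_fn, call_site);
    trace_depth++;
}

/* gcc calls this hook just before an instrumented function returns. */
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)call_site;
    trace_depth--;
    fprintf(stderr, "%*sexit  %p\n", 2 * trace_depth, "", this_fn);
}
```

Compile this file together with the program using gcc -finstrument-functions. The addresses printed can be turned into function names with addr2line, provided the binary has symbols and is not position independent.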

Related

Getting printf in assembly with only system calls?

I am looking to understand the printf() statement at the assembly level. However, most assembly programs do something like call an external print function whose dependency is met by some other object file that the linker adds. I would like to know what is inside that print function in terms of system calls and very basic assembly code. I want a piece of assembly code for printf where the only external calls are the system calls. I'm thinking of something like a disassembled object file. Where can I get something like that?

I would suggest instead to stay first at the C level, and study the source code of some existing C standard library free software implementation on Linux. Look into the source code of musl-libc or of GNU libc (a.k.a. glibc). You'll understand that several intermediate (usually internal) functions are useful between printf and the basic system calls (listed in syscalls(2) ...). Use also strace(1) on a sample C program doing printf (e.g. the usual hello-world example).
In particular, musl-libc has a very readable stdio/printf.c implementation, but you'll need to follow several other C functions there before reaching the write(2) syscall. Notice that some buffering is involved. See also setvbuf(3) & fflush(3). Several answers (e.g. this and that one) explain the chain between functions like printf and system calls (up to kernel code).
I want a piece of assembly code where the only external calls are the system calls, for printf
If you want exactly that, you might start from musl-libc's stdio/printf.c, add further source files from musl-libc until you have no more undefined external symbols, and compile all of them with gcc -flto -O2 and perhaps also -S. You will probably end up with a significant part of musl-libc in object (or assembly) form, because printf may call malloc and many other functions. I'm not sure it is worth the pain.
You could also statically link your libc (e.g. libc.a). Then the linker will link only the static library members needed by printf (and any other function you are calling).
To be picky, system calls are not actually external calls (your libc write function is actually a tiny wrapper around the raw system call). You could make them directly with the SYSCALL machine instruction on x86-64 (32-bit x86 has SYSENTER, but going through vdso(7) is preferable there: more portable, and perhaps quicker), and you don't even need a valid stack pointer (on x86_64) to make a system call.
You can write Linux user-level programs without even using the libc; the bones implementation of Scheme is such a program (and you'll find others).

The function printf() is in the standard C library, so with the usual dynamic linking it is linked to your program at load time rather than copied into it. Dynamically linked libraries save memory because you don't have the exact same code copied into resident memory for every program that uses it.
Think about what printf() does. Interpreting the formatted string and generating the correct output is fairly complex. The series of functions that printf() belongs to also buffers the output. You probably don't really want to re-implement all of this in assembly. The standard C library is omnipresent, and probably available for you.
Maybe you're looking for write(2), which is the system call for unbuffered writes of just bytes to a file descriptor. You'd have to generate the string to print beforehand and format it yourself. (See also open(2) for opening files.)
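To make "format it yourself" concrete, here is a rough sketch that converts an unsigned number to decimal by hand and sends it out with write(2) only, with no stdio involved. The function names format_unsigned and print_value are invented for this example.

```c
#include <stddef.h>
#include <unistd.h>

/* Convert v to decimal digits in buf (no terminating NUL).
 * buf must have room for at least 20 characters. Returns the length. */
size_t format_unsigned(unsigned long v, char *buf)
{
    char tmp[20];
    size_t n = 0;

    do {                        /* digits come out least significant first */
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);

    for (size_t i = 0; i < n; i++)   /* reverse into the caller's buffer */
        buf[i] = tmp[n - 1 - i];
    return n;
}

/* Write "<label><value>\n" to fd using only write(2).
 * Returns 0 on success, -1 on a failed or short write. */
int print_value(int fd, const char *label, size_t label_len, unsigned long v)
{
    char digits[21];
    size_t n = format_unsigned(v, digits);

    digits[n++] = '\n';
    if (write(fd, label, label_len) != (ssize_t)label_len)
        return -1;
    if (write(fd, digits, n) != (ssize_t)n)
        return -1;
    return 0;
}
```

A call such as print_value(1, "n=", 2, 42) issues two unbuffered writes to standard output. Everything printf would normally do (parsing the format string, padding, buffering) is left to the caller here.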
To disassemble a binary, you can use objdump:
objdump -d binary
where binary is some compiled binary. This gives opcodes and human readable instructions. You probably want to redirect to a file and read elsewhere.
You can disassemble the standard C binary on your system and try to interpret it if you want (strongly not recommended). The problem is that it will be far too complex to understand. Things like printf() were written in C, then compiled and assembled. You can't (within a reasonable number of decades) restore the high level structure from the assembly of a compiled (non-trivial) program. If you really want to try this, good luck.
An easier thing to do is to look at the C source code for printf() itself. The real work is actually done in vfprintf() which is in stdio-common/vfprintf.c of the GNU C library source code.
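The layering is easy to imitate: a printf-style function typically just collects its variable arguments and hands them to the corresponding v-function. This is only a sketch of that pattern with made-up names (my_fprintf, my_snprintf), not a copy of the glibc source.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* printf-style front end: package the arguments and let vfprintf() do
 * the formatting and buffering. */
int my_fprintf(FILE *stream, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vfprintf(stream, fmt, ap);
    va_end(ap);
    return n;
}

/* Same idea for formatting into a caller-supplied buffer. */
int my_snprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return n;
}
```

Reading the real vfprintf.c with this shape in mind makes it clearer where the front end stops and the formatting engine starts.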

how to catch calls with LD_PRELOAD when unknown programs may be calling execve without passing environment

I know how to use LD_PRELOAD to intercept system calls (more precisely, the libc wrappers for them) that occur in compiled programs I may not have source for. For example, if I want to know about the calls to int fsync(int) made by some unknown program foobar, I compile a wrapper
int fsync(int)
around
(int (*) (int))dlsym(RTLD_NEXT,"fsync");
into a shared library, set the environment variable LD_PRELOAD to that library, and run foobar. Assuming that foobar is dynamically linked, which most programs are, I will know about the calls to fsync.
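For readers who have not seen such a wrapper, a minimal sketch of the fsync case described above might look like this. The counter fsync_call_count and the log message are illustrative additions; the RTLD_NEXT fallback is only there in case another header was included before _GNU_SOURCE was defined.

```c
#define _GNU_SOURCE            /* asks <dlfcn.h> for RTLD_NEXT */
#include <dlfcn.h>
#include <stdio.h>
#include <unistd.h>

#ifndef RTLD_NEXT              /* same value glibc and musl use */
#define RTLD_NEXT ((void *)-1L)
#endif

/* Number of intercepted calls (illustrative; not part of any real API). */
int fsync_call_count = 0;

/* Same name and signature as libc's fsync(). When this library is listed
 * in LD_PRELOAD, the dynamic linker resolves fsync to this definition. */
int fsync(int fd)
{
    static int (*real_fsync)(int) = NULL;

    if (real_fsync == NULL)    /* look up the next definition, i.e. libc's */
        real_fsync = (int (*)(int))dlsym(RTLD_NEXT, "fsync");

    fsync_call_count++;
    fprintf(stderr, "fsync(%d) intercepted\n", fd);

    if (real_fsync == NULL)
        return -1;             /* could not locate the real function */
    return real_fsync(fd);
}
```

Built with something like gcc -shared -fPIC (plus -ldl on older glibc versions), the resulting library can be named in LD_PRELOAD when starting foobar.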
But now suppose there is another unknown program foobar1 and in the source of that program was a statement like this:
execve("foobar", NULL, NULL)
that is, the environment was not passed. Now the whole LD_PRELOAD scheme breaks down?
I checked by compiling the statement above into foobar1; when that is run, the calls from foobar are not reported.
While one can safely assume most modern programs are dynamically linked, one cannot at all assume how they may or may not be using execve?
So then, the whole LD_PRELOAD scheme, which everybody says is such a great thing, is not really working unless you have the source to the programs concerned, in which case you can check the calls to execve and edit them if necessary. But in that case, there is no need for LD_PRELOAD, if you have sources to everything. LD_PRELOAD is specifically, supposed to be, useful when you don't have sources to the programs you are inspecting.
Where am I wrong here - how can people say, that LD_PRELOAD is useful for inspecting what unknown programs are doing??

I guess I could also write a wrapper for execve. In the wrapper, I add to the original envp argument, one more string: "LD_PRELOAD=my library" . This "seems" to work, I checked on simple examples.
I am not sure if I should be posting an "answer" which may very easily exceed my level of C experience.
Can somebody more experienced than me comment if this is really going to work in the long run?
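As a rough illustration of the execve wrapper idea, the sketch below rebuilds the environment with an LD_PRELOAD entry before handing over to the kernel. Several things here are assumptions: the library path in PRELOAD_ENTRY is a placeholder, the helper name env_with_preload is invented, and the real call is reached through syscall(2) rather than dlsym to keep the example short. It also replaces any existing LD_PRELOAD entry instead of merging with it.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical location of the preload library; adjust to your setup. */
#define PRELOAD_ENTRY "LD_PRELOAD=/path/to/libwrapper.so"

/* Build a NULL-terminated copy of envp that contains `entry`, replacing
 * any existing LD_PRELOAD= entry. envp may be NULL. Returns a malloc'd
 * array (the strings are shared, not copied) or NULL on failure. */
char **env_with_preload(char *const envp[], const char *entry)
{
    size_t n = 0;
    while (envp != NULL && envp[n] != NULL)
        n++;

    char **out = malloc((n + 2) * sizeof *out);   /* + entry + NULL */
    if (out == NULL)
        return NULL;

    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (strncmp(envp[i], "LD_PRELOAD=", 11) != 0)
            out[k++] = envp[i];
    out[k++] = (char *)entry;
    out[k] = NULL;
    return out;
}

/* Interposed execve(): forwards to the kernel with the extended environment. */
int execve(const char *path, char *const argv[], char *const envp[])
{
    char **env = env_with_preload(envp, PRELOAD_ENTRY);
    if (env == NULL)
        return -1;

    int rc = (int)syscall(SYS_execve, path, argv, env);
    free(env);                 /* only reached if the exec failed */
    return rc;
}
```

One caveat that bears on the "long run" question: a program can start other programs through entry points that may not pass through an interposed execve (other exec variants, posix_spawn, or a raw system call), and statically linked programs ignore LD_PRELOAD entirely, so this should be treated as best effort.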

Taking input without using libc

How would I take input or print output without using libc? I can use system calls, but didn't know if that would help.

There is no platform-independent way to do this. In fact, the whole point of having libc is to have a common interface to a set of functionality that most systems provide, but do so in fundamentally different ways.
Your best option would probably be to consult the documentation for whatever system you are currently using. You could look up your OS's set of interrupts and then try using the asm keyword to write assembly instructions that tell the OS to read input or display output. You could look into libraries provided by the OS for doing input and output on file descriptors, then use those functions instead. Or, you could look at process creation libraries, then spawn off a process to read or write data from the console, where the second program uses libc. None of these are guaranteed to be at all portable, though.
Hope this helps!

I stepped through write(2) with gdb to get an idea of how the system call ABI works.
Anyway, no libc at all. Note that without special tricks, the cc(1) compiler/linker front-end will still link you with libc, but you won't be using it for anything. The C runtime start-up code will make some libc calls, but this program won't.
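/* x86-64 Linux only; write(2) is system call number 1 */
long mywrite(int fd, const void *b, unsigned long c) {
    long ret;
    asm volatile("syscall" : "=a"(ret)
                 : "a"(1L), "D"((long)fd), "S"(b), "d"(c)
                 : "rcx", "r11", "memory");
    return ret;
}
int main(void) { const char *s = "Hello world.\n"; return mywrite(1, s, 13), 0; }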
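Since the question also asks about taking input, the same approach extends to read(2). The sketch below assumes x86-64 Linux, where read is system call 0 and write is system call 1; the names raw_syscall3, raw_read, raw_write and echo_once are made up for the example. Errors come back as negative errno values because no libc wrapper is there to set errno.

```c
#if !defined(__x86_64__) || !defined(__linux__)
#error "this sketch assumes x86-64 Linux"
#endif

/* Issue a three-argument system call. The kernel takes the call number in
 * rax and the arguments in rdi, rsi and rdx; it clobbers rcx and r11 and
 * returns the result (or a negative errno value) in rax. */
static long raw_syscall3(long nr, long a, long b, long c)
{
    long ret;
    __asm__ volatile("syscall"
                     : "=a"(ret)
                     : "a"(nr), "D"(a), "S"(b), "d"(c)
                     : "rcx", "r11", "memory");
    return ret;
}

/* read(2) without libc. */
long raw_read(int fd, void *buf, unsigned long len)
{
    return raw_syscall3(0, fd, (long)buf, (long)len);
}

/* write(2) without libc. */
long raw_write(int fd, const void *buf, unsigned long len)
{
    return raw_syscall3(1, fd, (long)buf, (long)len);
}

/* Copy one chunk of standard input to standard output.
 * Returns the number of bytes written, 0 at end of input, or a
 * negative errno value. */
long echo_once(void)
{
    char buf[256];
    long n = raw_read(0, buf, sizeof buf);
    if (n <= 0)
        return n;
    return raw_write(1, buf, (unsigned long)n);
}
```

Calling echo_once() in a loop until it returns 0 or a negative value gives a tiny cat-like program whose only contact with the outside world is the syscall instruction.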

What is the need for C startup routine?

Quoting from one of the unix programming books,
When a C program is executed by the kernel, by one of the exec functions, a special start-up routine is called. This function is called before the main function is called. The executable program file specifies this routine as the starting address for the program; this is set up by the link editor when it is invoked by the C compiler. This start-up routine takes values from the kernel (the command-line arguments and the environment) and sets things up so that the main function is called as shown earlier.
Why do we need a middleman start-up routine? The exec function could have called the main function straight away, and the kernel could have passed the command-line arguments and environment directly to the main function. Why do we need the start-up routine in between?

Because C has no concept of "plug in". So if you want to use, say, malloc() someone has to initialize the necessary data structures. The C programmers were lazy and didn't want to have to write code like this all the time:
main() {
initialize_malloc();
initialize_stdio();
initialize_...();
initialize_...();
initialize_...();
initialize_...();
initialize_...();
... oh wow, can we start already? ...
}
So the C toolchain takes care of this for you: the linker adds start-up code from the C runtime that does the necessary initialization and then calls main(), so you can start with your code right away.

The start-up routine initializes the CRT (i.e. creates the CRT heap so that malloc/free work, initializes standard I/O streams, etc.); in case of C++ it also calls the globals' constructors. There may be other system-specific setup, you should check the sources of your run-time library for more details.
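You can watch this happen from plain C with a compiler extension. In the sketch below, which assumes GCC or Clang, a function marked with the constructor attribute is run before main() is entered, much like a C++ global constructor. The names early_init and initialized_before_main are illustrative.

```c
/* Set by the constructor below (illustrative name). */
static int ready_before_main = 0;

/* GCC/Clang extension: functions with this attribute are registered in
 * the executable and run before main() is entered. */
__attribute__((constructor))
static void early_init(void)
{
    ready_before_main = 1;
}

/* Reports whether the initializer has already run. */
int initialized_before_main(void)
{
    return ready_before_main;
}
```

If main() calls initialized_before_main() as its very first statement, it should already see 1, which shows that something ran between the kernel handing over control and main() starting.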

Calling main() is a C thing, while calling _start() is a kernel thing, indicated by the entry point in the binary format header. (for clarity: the kernel doesn't want or need to know that we call it _start)
If you would have a non-C binary, you might not have a main() function, you might not even have the concept of a "function" at all.
So the actual question would be: why doesn't a compiler give the address of main() as a starting point? That's because typical libc implementations want to do some initializations before really starting the program, see the other answers for that.
Edit: as an example, you can change the entry point like this:
$ cat entrypoint.c
#include <stdio.h>
#include <stdlib.h>
int blabla() { printf("Yes it works!\n"); exit(0); }
int main() { printf("not called\n"); }
$ gcc entrypoint.c -e blabla
$ ./a.out
Yes it works!

It is also important to know that an application program executes in user mode, and any system call switches the CPU into privileged (kernel) mode for its duration. This improves OS security by preventing user code from touching kernel data or hardware directly; it can only request services through the system call interface. So when a call to printf eventually issues a write system call, the CPU traps into the kernel, executes the kernel code, then returns to user mode and to your application.
The CRT is required to let you use the languages you want on Windows and Linux. It provides some very fundamental bootstrapping into the OS to provide you with feature sets for development.

Is busybox available in shared library form?

Is busybox available in shared library form? I would like to use individual apps programmatically instead of using system(). I've heard about libbusybox and libbb but could not find any documentation.

BusyBox does exist as a shared library called libbusybox(.so); you just have to enable it in make menuconfig. Once you have compiled it, it will be available in the 0_lib folder. In this library you have a nice little function called int lbb_main(char **argv).
What you need to do in your code is something like this:
extern int lbb_main(char **argv);
int main()
{
    char* strarray[] = {"ifconfig",0};
    return lbb_main(strarray);
}
You could include libbb.h, but that didn't work for me, because I got many errors.
After that you just have to compile using something like gcc -o code code.c -Lpath_to_0_lib_folder -lbusybox and that's it!
To intercept output you will have to redefine printf and similar calls, but that's clearly doable with macros, something like #define printf(...) code in libbb.h.
You could even spawn busybox's shell that doesn't use fork or system, but that doesn't work well yet.

If you are on a tiny embedded system where it matters, you can link your own app into the busybox binary, then you can call its functions without any dynamic linker at all.
If you are not, just use system(), or some fork/exec combo.
It is unlikely you'll want to call the utilities so often that performance matters.
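For completeness, a fork/exec combo that runs a utility and collects its exit status can be sketched as follows. The function name run_utility is invented for this example, and the program is looked up in PATH via execvp.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] (looked up in PATH) with the given arguments and wait for
 * it. Returns the exit status, or -1 if the child could not be created
 * or did not exit normally. */
int run_utility(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {            /* child: become the utility */
        execvp(argv[0], argv);
        _exit(127);            /* only reached if the exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With an argument vector such as {"busybox", "ifconfig", NULL} this runs the applet without going through a shell, which avoids the quoting problems of system().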
