Tracing code from user to hardware - c

I'm curious if someone can point me in the right direction here. I'm learning about computer systems programming (the basics) and I'm trying to trace code through different levels to see how each interacts with the other. An example would be calling the fgets() function in C or getline() in C++ or similar. Both of those would make calls to the system right? Is there an easy way to look at the code that is called?
I'm working on Unix (Ubuntu). Is this something that is proprietary with Windows and Apple? Any good resources out there for this kind of thing? As always, thanks guys!

At least in the UNIX world, the answer is fairly easy: "Use the Source, Luke".
In your example, you would look at the sources for, say, fgets(). That's in the C standard library, and the easiest way to find the source is to search for something like "C library fgets() source".
When you get that source, you'll see a bunch of code handling buffers etc., and a system call, probably to read(2). The "2" there tells you it is documented in section 2 of the manual (e.g., you can find it with man 2 read).
The system call is implemented in the kernel, so then you need to read the kernel source. Proceed from there.
Now, to find all this without reading randomly through the sources (that's how a lot of people have learned it, but it's not very efficient), get hold of a book on Linux like Kerrisk's The Linux Programming Interface, which explains these things at a somewhat higher level than the source alone.

Something like fgets is located within libc. That is, it's a userland library linked with most C binaries. Check out glibc, which is currently the most common implementation.
Eventually, libc will start making system calls to the kernel. You can get the source at kernel.org. Check out KGDB for kernel debugging. The simplest way to do kernel debugging is to use a second machine connected via a null modem cable.

On Windows, you could get some insight with a few things. First you'd need something called symbol files that correspond to the binaries you'd like to investigate. Symbol files associate textual names with the global/stack/heap variables floating around a program. So to map the address in memory to the function fgets, and see fgets in certain programs, you'd need to have the symbols for the version of Microsoft's implementation of the C standard library. Lucky for you, MS makes their symbols freely available.
Second, you would need to capture a callstack that dives deeper than fgets. The most obvious way to do this would be to be a Microsoft developer and introduce a crash into a deep MS dll, then analyse the crash dump with a debugger and symbols, but unfortunately we can't do that. What you can do is use what's called a sampling profiler, such as the one freely available from Microsoft. A sampling profiler profiles your code by taking periodic snapshots of the callstack of your program. Using symbol files from Microsoft, we can digest that callstack into something meaningful.
Given those 2 pieces of info, it wouldn't be hard to construct a program and get some insight into what fgets calls. You can then use the sampling profiler with Microsoft's symbols to get an idea of what's going on during your program.
Along these lines I constructed the following program to try this out:
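#include <cstdio>
#include <iostream>
#include <tchar.h>

int FgetSTest()
{
    // Open for reading: fgets() cannot read from a stream opened with "w".
    FILE* fp = fopen("C:/test.txt", "r");
    if (fp == NULL)
        return 0;
    char data[100] = {0};
    int sum = 0;
    for (int i = 0; i < 100; ++i)
    {
        fgets(data, 100, fp);
        sum += data[0];
    }
    fclose(fp);
    return sum;
}
int _tmain(int argc, _TCHAR* argv[])
{
    int sum = 0;
    for (int i = 0; i < 100; ++i)
    {
        sum += FgetSTest();
    }
    std::cout << "sum is:" << sum;
    return 0;
}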
Assuming you've compiled this into a program (I've compiled it into one called perfPlay.exe) you can run MS's sampling profiler on the exe as follows:
C:\path\to\exe>vsperfcmd /start:sample /output:perfPlay.vsp
Microsoft (R) VSPerf Command Version 9.0.30729 x86
Copyright (C) Microsoft Corp. All rights reserved.
C:\path\to\exe\>vsperfcmd /launch:perfPlay.exe
Microsoft (R) VSPerf Command Version 9.0.30729 x86
Copyright (C) Microsoft Corp. All rights reserved.
Successfully launched process ID:3700 perfPlay.exe
sum is:40000
C:\path\to\exe>vsperfcmd /shutdown
Microsoft (R) VSPerf Command Version 9.0.30729 x86
Copyright (C) Microsoft Corp. All rights reserved.
Shutting down the Profile Monitor
------------------------------------------------------------
Get profiler output, notice the "symbolpath" switch to point the command to Microsoft's symbol server:
C:\path\to\exe>vsperfreport perfplay.vsp /summary:all /symbolpath:srv*c:\symbols*http://msdl.microsoft.com/download/symbols
You can examine the CSV of the caller-callee report directly, or find a good viewer, like the one I've been working on, to get an idea of where fgets spends most of its time.
Sadly, not terribly insightful. Unfortunately, one of the problems you'll run into with this approach is that many of the functions fgets calls in release mode could very well be inlined -- that is, they are pretty much removed as functions from the final program and their contents directly "pasted" in to where they're used.
You could try repeating the above in debug mode to see what you get, as there's less chance of inlining.

First things first: this task will require good tools. I find etags, cscope, and gid (from GNU idutils) indispensable when navigating source. Figure out how to integrate one or more of these into your favorite editor or IDE. Switch editor or IDE if you have to in order to get these features; there's no excuse for poor tools. If you're looking for advice on one, I love vim, a great many people argue for emacs, and there are some folks who love their Eclipse.
You'll want the sources locally; lxr is an amazing tool, but the latency involved in repeated web requests gets tiring for any serious work. On Debian-derived systems, this is pretty easy; change directory to wherever you wish to store the source and run apt-get source eglibc to download the glibc sources. I suggest getting the kernel sources via a tarball from http://www.kernel.org or cloning the master git repository (a better choice if you want to read changelogs or easily get updates -- though it does expand to 2.7 gigabytes as of June 2012, so it obviously isn't for everyone).
Once you've built tags files for the C library, you can just run: vim -t fgets and it will open libio/bits/stdio2.h directly to the source for the fgets() routine. (It is much less readable than you may hope.) Follow these around until you eventually get to a read() system call. (It may take a while.)
Now switch to the kernel sources. Look in fs/read_write.c for this:
SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)
One downside to the way the kernel uses macros to define system calls is that it complicates searching for functions. vim -t can't find this directly. The easiest thing to do when looking for system calls is to run gid -s SYSCALL_DEFINE | grep read. (If you find a better tool, let me know.) Once you've found the system call entry point, it'll be far easier to read the rest of the kernel source. (I generally find it more legible than glibc sources, too -- though the days of being five or six function calls away from the block-level bread() call are long gone.)

Related

Is it possible to modify a C program which is running?

I was wondering if it is possible to modify a piece of a C program (or other binary) while it is running?
I wrote this small C program :
#include <stdio.h>
#include <stdint.h>
static uint32_t gcui32_val_A = 0xAABBCCDD;
int main(int argc, char *argv[]) {
    uint32_t ui32_val_B = 0;
    uint32_t ui32_cpt = 0;
    printf("\n\n Program SHOW\n\n");
    while(1) {
        if(gcui32_val_A != ui32_val_B) {
            /* %u rather than %d: the counter is unsigned */
            printf("Value[%u] of A : %x\n",ui32_cpt,gcui32_val_A);
            ui32_val_B = gcui32_val_A;
            ui32_cpt++;
        }
    }
    return 0;
}
With a hex editor I'm able to find "0xAABBCCDD" and modify it when the program is stopped. The modification works when I relaunch the program. Cool!
I would like to do this while the program is running. Is it possible?
Here is a simple example to understand the phenomena and play a little with it but my true project is bigger.
I have an old DOS game called Dangerous Dave.
I'm able to modify the tiles by simply editing the binary (thanks to http://www.shikadi.net/moddingwiki/Dangerous_Dave)
I developed a small editor that does this pretty well and had fun with it.
I launch the DOS game by using DOSBOX, it works !
I would like to do this dynamically when the game is running. Is it possible ?
PS : I work under Debian 64bit
regards
I was wondering if it is possible to modify a piece of C program (or other binary) while it is running ?
Not in standard (and portable) C11. Read the n1570 specification to check. Notice that most of the time in practice, it is not the C source program (made of several translation units) which is running, but an executable result of some compiler & linker.
However, on Linux (e.g. Debian/Sid/x86-64) you could use some of the following tricks (often with function pointers):
use plugins, so design your program to accept them and define conventions about your plugins. A plugin is a shared object ELF file (some *.so) containing position-independent code (so it should be compiled with specific options). You'll use dlopen(3) & dlsym(3) to do the dynamic loading of the plugin.
use some JIT-compiling library, like GCCJIT or LLVM or libjit or asmjit.
alter your virtual address space (not recommended) manually, using mprotect(2) and mmap(2); then you could overwrite something in a code segment (you really should not do that). This might be tricky (e.g. because of ASLR) and brittle.
perhaps use debug related facilities, either with ptrace(2) or by scripting or extending the gdb debugger.
I suggest to play a bit with /proc/ (see proc(5)) and try at least to run in some terminal the following commands
cat /proc/self/maps
cat /proc/$$/maps
ls /proc/$$/fd/
(and read enough things to understand their outputs) to understand a bit more what a process "is".
So overwriting your text segment (if you really need to do that) is possible, but perhaps more tricky than what you believe !
(do you mind working for several weeks or months simply to improve some old gaming experience?)
Read also about homoiconic programming languages (try Common Lisp with SBCL), about dynamic software updating, about persistence, about application checkpointing, and about operating systems (I recommend: Operating Systems: Three Easy Pieces & OsDev wiki)
I work under Debian 64bit
I suppose you have programming skills and do know C. Then you should read ALP or some newer Linux programming book (and of course look into intro(2) & syscalls(2) & intro(3) and other man pages etc...)
BTW, in your particular case, perhaps the "OS" is DOSBOX (acting as some virtual machine). You might use strace(1) on DOSBOX (or on other commands or processes), or study its source code.
You mention games in your question. If you want to code some, consider libraries like SDL, SFML, Qt, GTK+, ....
Yes, you can modify a piece of code while it is running in C. You need a pointer into your program's memory area, and compiled pieces of code to replace what you want to change. Naturally this is considered a dangerous practice, with a lot of restrictions and many possibilities for error. However, it was common practice in the old days when memory was precious.

Using parse_datetime from gnu c

I am developing a program for analyzing time series under gnu/linux. To analyze a time window, I want to be able to specify start/end times on the command line. Parsing dates using strptime is simple enough, however I would like to use the flexible 'natural language' format as it is used by the unix ''date'' command. There, this is done using the parse_datetime function.
I have the source of the coreutils, but would like to avoid copying over the code and all attached header files.
My question is: is there a standard library under Unix/Linux which gives access to the full power of parse_datetime().
The function you refer to is not part of any standard, nor any stock utility library. However, it is available as a semi-standalone component as part of gnulib, namely the parse-datetime module. You will need to take it and incorporate it into your program; the gnulib distribution has tools for that. Be aware that if you do this you have to GPL your entire program (this is not a big deal if the program is only for your personal use -- the GPL's requirements only kick in when you start giving the compiled program to other people).
A possible alternative is g_date_set_parse from GLib, but I can't speak to how clever it is.

Determine OS during runtime

Neither ISO C nor POSIX offer functionality to determine the underlying OS during runtime. From a theoretical point of view, it doesn't matter since C offers wrappers for the most common system calls, and from a nit-picking point of view, there doesn't even have to be an underlying OS.
However, in many real-world scenarios, it has proven helpful to know more about the host environment than C is willing to share, e.g. in order to find out where to store config files or how to call select(), so:
Is there an idiomatic way for an application written in C to determine the underlying OS during runtime?
At least, can I easily decide between Linux, Windows, BSD and MacOS?
My current guess is to check for the existence of certain files/directories, such as C:\ or /, but this approach seems unreliable. Maybe querying a series of such sources may help to establish the notion of "OS fingerprints", thus increasing reliability. Anyway, I'm looking forward to your suggestions.
Actually, most systems have a uname command which shows the current kernel in use. On Mac OS, this is usually "Darwin", on Linux it's just plain "Linux", on Windows it's "ERROR" and FreeBSD will return "FreeBSD".
More complete list of uname outputs
I'm pretty sure that there's a C equivalent for uname, so you won't need system()
IF you are on a POSIX system, you can call uname() from <sys/utsname.h>.
This obviously isn't 100% portable, but I don't think there will be any method that can grant that at runtime.
see the man page for details
Runtime isn't the time to determine this, being that without epic kludges binaries for one platform won't run on another, you should just use #ifdefs around the platform sensitive code.
The accepted answer states uname, but doesn't provide a minimal working example, so here it is for anyone interested-hope it will save you the time it took for me:
#include <stdio.h>
#include <stdlib.h>
#include <sys/utsname.h>
int main(void) {
    struct utsname buffer;
    if (uname(&buffer) != 0) {
        perror("uname");
        exit(EXIT_FAILURE);   /* report failure, not success */
    }
    printf("OS: %s\n", buffer.sysname);
    return 0;
}
(Possible) Output:
OS: Linux
PS: Unfortunately, this uses a POSIX header, so it most probably won't work on Windows: compilation fails there because sys/utsname.h is missing.
const char *path = getenv("PATH");   /* may be NULL if PATH is unset */
if (path != NULL && strchr(path, '\\'))
    puts("You may be on windows...");
Even though I agree that "Runtime isn't the time to determine this..."

How do i compile a c program without all the bloat?

I'm trying to learn x86. I thought this would be quite easy to start with: I'll just compile a very small program basically containing nothing and see what the compiler gives me. The problem is that it gives me a ton of bloat ("This program cannot be run in DOS mode" and so on): a 25KB file for an empty main() calling one empty function.
How do I compile my code without all this bloat? (and why is it there in the first place?)
Executable formats contain a bit more than just the raw machine code for the CPU to execute. If you want only that, then the only option is (I think) a DOS .com file, which essentially is just a bunch of code loaded into memory and then jumped into. Some software (e.g. Volkov Commander) made clever use of that format to deliver quite a lot in very little executable code.
Anyway, the PE format which Windows uses contains a few things that are specially laid out:
A DOS stub saying "This program cannot be run in DOS mode" which is what you stumbled over
several sections containing things like program code, global variables, etc. that are each handled differently by the executable loader in the operating system
some other things, like import tables
You may not need some of those, but a compiler usually doesn't know you're trying to create a tiny executable. Usually nowadays the overhead is negligible.
There is an article out there that strives to create the tiniest possible PE file, though.
You might get better results by digging up older compilers. If you want bare-bones binaries, COM files are really that, so if you get hold of an old compiler that supports generating COM binaries instead of EXE you should be set. There is a long list of free compilers at http://www.thefreecountry.com/compilers/cpp.shtml; I assume that Borland's Turbo C would be a good starting point.
The bloated module could be the loader (operating system required interface) attached by linker. Try adding a module with only something like:
void foo(){}
and see the disassembly (I assume that's the format the compiler 'gives you'). Of course the details vary much from operating systems and compilers. There are so many!

Using readline's rl_insert_text on OS X 10.5

So, I'm trying to stuff some default text into a user input using readline, and having trouble getting it to work on OSX 10.5:
// rl_insert_text_ex.c
// gcc -o rl_insert_text_ex rl_insert_text_ex.c -lreadline
#include <stdio.h>
#include <readline/readline.h>
int my_startup_hook(void) {
    return rl_insert_text("ponycorns");
}
int main(int argc, char *argv[]) {
    char *line;
    rl_startup_hook = (Function*) my_startup_hook;
    line = readline("What's your favorite mythical animal? ");
    if (NULL == line || '\0' == *line) {
        printf("Nothing given... :(\n");
    }
    else {
        printf("That's funny, I love %s too!\n", line);
    }
    return 0;
}
This code doesn't even compile on 10.4 (no definition for _rl_insert_text on 10.4, which is a bit of a bummer), but does compile on 10.5. However, the rl_insert_text()'d text is never shown to screen, nor returned as user input. The callback is being used and rl_insert_text() returns the proper value, (thank you, printf), so I'm not sure what's going on here.
I checked /usr/include/readline/readline.h, and rl_insert_text() is under:
/* supported functions */
which is confusingly under:
/*
* The following is not implemented
*/
So am I SOL, or am I just doing it wrong?
Unfortunately, you may be out of luck, at least with the readline library included in OS X. Due to license compatibility issues, Apple uses libedit, which (apparently) provides incomplete readline emulation. (This library is documented with the name "editline" in the readline.h included with OS X.)
GNU Readline Library (the "one true" readline library) is under GPL, which (being a copyleft license) does not play well with code that is not entirely open-source. If it comes down to (A) open-sourcing all of Xcode, OS X, etc. or (B) using a knock-off of what you'd really like to use, Apple (like most companies) is always going to choose B. It's a bummer, but that's life.
Personally, I think this is one reason that GPL'd code is somewhat of a blight on the land, since in the act of "sticking it to the man", it often also withholds the code from the masses who purchase software. The {BSD,MIT,Apache}-style licenses are much more conducive to use in closed-source systems, and still allow commercial entities to contribute back patches, etc. My guess is that libedit hasn't received enough attention to be fixed properly. Community patches would certainly be welcome, although it's so much nicer if we can use code without having to hack on it ourselves... ;-)
BTW, the same thing applies to other GPL projects — as long as {git,mercurial,bazaar} remains under GPL, don't hold your breath for Apple to ship integration for them in Xcode. :-(
UPDATE: The new Xcode 4 offers git support. Huzzah! My understanding is that this is due to the new plugin architecture which isolates GPL'd code from the main Xcode codebase. However, I emphasize that copyleft licenses are still the wrong solution for code that should benefit everyone. Obviously some people don't agree (you're a pal, anonymous downvoter) but the fact is that GPL can restrict freedoms too; usually it's different ones than closed-source/proprietary software generally does, but GPL is also quite effective at preventing illegal use of source code... The difference is a feeling of moral superiority.

Resources