How does debugger know function names? - c

When I debug any program with debugger (for example OllyDbg), in disassembled assembly code, I can see function names, for example:
push 0
call msvcrt.exit
How does the debugger know the function names? Where do they come from? In machine code, the instruction is just a call to an address. So how does the debugger know the name?

Compilers generate symbol information that lets a debugger show the name corresponding to a particular address or offset. This is highly system-dependent: for example, the Visual Studio toolchain on Windows places these symbols in separate .pdb files, while on some UNIX flavors the debug symbols are embedded in the executable. EDIT: According to the comments, OllyDbg pulls names such as msvcrt.exit from the import data embedded in the executable (the Import Address Table and the name entries that go with it).
When symbols are embedded into the executable, compiler vendors provide a tool to remove these symbols. For example, GNU provides the strip utility to work with their toolchain.
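To make the address-to-name step concrete, here is a minimal sketch in C of the lookup a debugger can perform once it has loaded a symbol table, whatever the source (.pdb, embedded symbols, or import data). The addresses, the table contents, and the names struct symbol, demo_symbols and find_symbol are made up for illustration; real symbol formats are far richer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of a toy symbol table: where a function starts and its name. */
struct symbol {
    uint32_t    address;  /* start address of the function */
    const char *name;     /* name recorded by the toolchain */
};

/* Hypothetical table, sorted by address. It stands in for whatever the
 * debugger loaded from a .pdb file, embedded symbols, or import data. */
const struct symbol demo_symbols[] = {
    { 0x00401000u, "main"   },
    { 0x00401040u, "helper" },
    { 0x00401100u, "exit"   },
};

/* Return the symbol with the greatest start address <= addr, or NULL if
 * addr lies before every symbol. On success, *offset receives the distance
 * from the start of that symbol. */
const struct symbol *find_symbol(const struct symbol *table, size_t count,
                                 uint32_t addr, uint32_t *offset)
{
    assert(table != NULL && offset != NULL);

    size_t lo = 0, hi = count;  /* binary search over the window [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].address <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return NULL;            /* address precedes the first symbol */

    *offset = addr - table[lo - 1].address;
    return &table[lo - 1];
}
```

With this table, looking up 0x00401044 yields helper with an offset of 4, which a debugger would typically display as helper+4.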

Related

Dump function bytes from a native .DLL on command line

I would like to list all exported functions in a DLL and dump their bytes. It's pretty trivial to list all the exports using either dumpbin or rabin2 from the radare2 package. I also found a way to disassemble the whole DLL using dumpbin but there's no way to see function boundaries in the dump.
I'm looking for a way to disassemble (with bytes), or ideally just dump the bytes, for a specific function or for all functions inside a DLL. I don't mind parsing the output if it's got some other information in it. I've tried all kinds of tools, and so far I have not been able to achieve what I need.
One of the possible directions would be to script radare2 to do that.
In order to dump a function's bytes, you will have to know where that function ends.
You could do some static analysis which might work or you could do one of the following:
For 64-bit executables, you can parse the .pdata section which contains a list of RUNTIME_FUNCTIONs. DUMPBIN can do that using either the /unwindinfo or /pdata option.
Note that this may not include every exported function, see reference.
The second option, which works for both 32 and 64-bit executables, is to make use of the DIA SDK (see IDiaSymbol::get_length). This should cover all exported and non-exported functions but requires you to have access to the executable's .pdb file.
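For the .pdata route, a rough sketch of the size calculation in C follows. Each x64 RUNTIME_FUNCTION entry holds three 32-bit RVAs: the function's begin address, its end address, and the location of its unwind data. The struct and function names below (pdata_entry, find_function, function_length) are local to this sketch, and it assumes you have already read the table out of the DLL into memory, sorted by begin address. Functions split across several entries (chained unwind info) are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the layout of one x64 RUNTIME_FUNCTION entry in .pdata.
 * All three fields are RVAs (offsets from the image base). */
struct pdata_entry {
    uint32_t begin_address;  /* RVA of the first byte of the function */
    uint32_t end_address;    /* RVA one past the last byte */
    uint32_t unwind_info;    /* RVA of the unwind data (unused here) */
};

/* Find the entry whose [begin, end) range covers rva. The table must be
 * sorted by begin_address. Returns NULL when nothing covers rva, which
 * happens for functions that have no .pdata entry at all. */
const struct pdata_entry *find_function(const struct pdata_entry *table,
                                        size_t count, uint32_t rva)
{
    assert(table != NULL || count == 0);

    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (rva < table[mid].begin_address)
            hi = mid;
        else if (rva >= table[mid].end_address)
            lo = mid + 1;
        else
            return &table[mid];
    }
    return NULL;
}

/* Number of bytes to dump for the function described by e. */
uint32_t function_length(const struct pdata_entry *e)
{
    assert(e != NULL);
    return e->end_address - e->begin_address;
}
```

Given the RVA of an export, find_function locates its entry and function_length tells you how many bytes to dump starting at begin_address. A NULL result corresponds to the caveat above that not every exported function is listed.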

Is it possible to produce working binary without linker?

As far as I know, the compiler converts source code to machine code. But this code does not have any OS-related sections, and the linker adds them to the file.
But is it possible to make an executable without a linker?
Answering your question very literally: yes, it is possible to make an executable file without a linker. You don't need a compiler or linker to generate machine code. Binaries are a series of opcodes and related information (offsets, addresses, etc.). You can open a binary editor, type out some opcodes to make a program, save it, and run it.
Of course the binary will be processor-specific, just as if you had compiled a native executable. Here's a reference to the Intel x86 opcodes:
http://ref.x86asm.net/coder32.html.
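As a small illustration of "typing out opcodes", the sketch below writes the six bytes of the 32-bit x86 sequence mov eax, imm32 followed by ret into a buffer, which is the same thing you would enter in a binary editor. The function name emit_return_const is made up for this example. Note that these bytes alone are only the code: to be loaded by an OS they would still need to be wrapped in whatever container the OS expects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 5 bytes for MOV EAX, imm32 plus 1 byte for RET. */
enum { RETURN_CONST_SIZE = 6 };

/* Write the x86 machine code for "mov eax, value; ret" into buf.
 * Returns the number of bytes written, or 0 if buf is too small. */
size_t emit_return_const(uint8_t *buf, size_t capacity, uint32_t value)
{
    assert(buf != NULL);
    if (capacity < RETURN_CONST_SIZE)
        return 0;

    buf[0] = 0xB8;                            /* opcode: MOV EAX, imm32 */
    buf[1] = (uint8_t)(value & 0xFF);         /* immediate, little-endian */
    buf[2] = (uint8_t)((value >> 8) & 0xFF);
    buf[3] = (uint8_t)((value >> 16) & 0xFF);
    buf[4] = (uint8_t)((value >> 24) & 0xFF);
    buf[5] = 0xC3;                            /* opcode: RET (near) */
    return RETURN_CONST_SIZE;
}
```

For value 42 the buffer ends up holding B8 2A 00 00 00 C3.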
If, however, you're asking "Can I compile a source file directly into an executable file without a linker?", then strictly speaking: no, unless the compiler has aspects of a linker integrated within it. The compiler generates intermediate objects that are passed on to the linker to "link" them into a binary such as a library or executable. Without the link step the pipeline is not complete.
Let's start from a statement that we will take as true: compilers do not generate machine code that can be immediately executed (JITs do, but let's ignore that).
Instead they generate files (object, static, dynamic, executable) which describe what they contain as well as groups of symbols. Symbols can be global variables or functions.
But symbols, just like the file itself, carry metadata, and this metadata is very important. The machine code stored in a symbol is the raw instructions for the target architecture, but it does not know where things will be located in memory.
While modern CPUs give each process its own address space, a symbol may not, and probably won't, land at the same address twice. In recent times this is a security measure, but in the past it was so that dynamic linking works correctly.
So when the OS loads an executable or shared library, it can place it wherever it wants, and in doing so make the placement not repeatable. Otherwise we'd all have to start caring and saying "this file contains 100% of the code I intend to execute". Usually on load the raw binary referenced by the symbol table gets transformed by patching it with the symbol locations in RAM, making everything just work.
In summary, the compiler emits files that allow for dynamic patching of assembly prior to execution. If it didn't, we would be living in a very restrictive and problematic world.
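The patching idea can be sketched with a toy model in C. This is not the relocation format of ELF or PE; struct reloc, apply_relocations and the field names are invented for this example. It only shows the principle: the file records "at this offset, store the final address of that symbol", and the patching happens once the addresses are known.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A toy relocation record: "at this offset in the code, store the final
 * address of that symbol, plus a constant addend". */
struct reloc {
    size_t   offset;        /* where in the code image to patch */
    size_t   symbol_index;  /* which symbol's address to store */
    uint32_t addend;        /* constant added to the symbol address */
};

/* Patch every 32-bit slot named by relocs with the address its symbol was
 * finally given. Returns 0 on success, -1 if a record points outside the
 * image or names a symbol that does not exist. */
int apply_relocations(uint8_t *image, size_t image_size,
                      const struct reloc *relocs, size_t reloc_count,
                      const uint32_t *symbol_addresses, size_t symbol_count)
{
    assert(image != NULL && relocs != NULL && symbol_addresses != NULL);

    for (size_t i = 0; i < reloc_count; i++) {
        const struct reloc *r = &relocs[i];
        if (r->symbol_index >= symbol_count)
            return -1;
        if (image_size < sizeof(uint32_t) ||
            r->offset > image_size - sizeof(uint32_t))
            return -1;

        uint32_t value = symbol_addresses[r->symbol_index] + r->addend;

        /* Little-endian store, byte by byte, independent of the host. */
        image[r->offset + 0] = (uint8_t)(value & 0xFF);
        image[r->offset + 1] = (uint8_t)((value >> 8) & 0xFF);
        image[r->offset + 2] = (uint8_t)((value >> 16) & 0xFF);
        image[r->offset + 3] = (uint8_t)((value >> 24) & 0xFF);
    }
    return 0;
}
```

The same code image can be patched for any load address simply by passing a different symbol_addresses array, which is what makes it possible to place it anywhere in memory.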
Linkers even have scripts to change how they operate. They are a very complex and delicate piece of software required to make our programs work.
Have a read of the PE-COFF and ELF standards if you want to get an idea of just how complex those formats really are.

How to compile a library for a fixed address in microblaze

I want to build a library which is relocatable (i.e. nothing other than local variables). I also want to force the location of the library to be at a fixed location in memory. I think this has to be done in the makefile, but I am confused as to what I have to do to force the library to be loaded at a fixed location. This is using mb-gcc.
The reason I need this is that I want to write a loader, and I don't want to clobber the code that is actually doing the copy of the other program. So I want the program that is doing the copying to be located somewhere else, at a location that is not being used (i.e. DDR).
If I have all the functions that do the copying compiled into a library, what special makefile arguments do I need to force this to be loaded at location 0x80000000, for example?
Any help would be greatly appreciated. Thanks in advance.
You write a linker script, and tell the compiler/linker to use it by using the -T script.ld option (to gcc and/or ld, depending on how you build your firmware files).
In your library C source files, you can use the __attribute__((section ("name"))) syntax to put your functions and variables into a specific section. The linker script can then decide where to put each section -- often at a fixed address for these kinds of devices. (You'll often see macro declarations like #define FIRMWARE __attribute__((section(".text.firmware"))) or similar, to make the code easier to read and understand.)
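On the C side, the attribute usage might look like the sketch below. The macro name FIRMWARE and the section name .text.firmware come from the paragraph above; the function loader_copy is a hypothetical stand-in for the loader's copy routine. This uses GCC's attribute syntax and only tags the function: the actual address (for example 0x80000000) still has to be assigned to that section in the linker script.

```c
#include <assert.h>
#include <stdint.h>

/* Tag for code that the linker script should place at the fixed address.
 * GCC-specific syntax; the section name is only an example. */
#define FIRMWARE __attribute__((section(".text.firmware")))

/* Hypothetical copy routine of the loader. Because of the attribute, the
 * compiler emits it into .text.firmware instead of the default .text. */
FIRMWARE void loader_copy(uint8_t *dst, const uint8_t *src, uint32_t len)
{
    assert(dst != 0 && src != 0);
    for (uint32_t i = 0; i < len; i++)
        dst[i] = src[i];
}
```

Built without a custom linker script, the function is still callable; it is simply placed wherever the default script puts sections of that name.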
If you create a separate firmware file just for your library, then you don't need to add the attributes to your code, just write the linker script to put the .text (executable code), .rodata (read-only constants), and .bss (uninitialized variables) sections at suitable addresses.
A web search for microblaze "linker script" finds some useful examples, and even more guides. Some of them must be suitable for your tools.

Getting stack offsets of variables from debugging symbols

When I build a program with debugging information (gcc -g), gdb is able to tell me addresses of local variables inside a function. Thus, the debugging symbols must contain enough information to calculate this (i.e. an offset from ebp), and since gdb uses libbfd to read debugging symbols, I should be able to as well.
However, libbfd's documentation seems to have nothing on this. Can libbfd give me this information?
libbfd provides access to the ELF file: it opens the file and gives you the contents of each section. Interpreting those contents is not something libbfd does; that is up to the application.
Usually, debugging information is encoded using DWARF.
There are libraries for interpreting DWARF; however, gdb includes its own code for parsing DWARF.
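To give an idea of what that interpretation involves, here is a sketch in C of the simplest case. In DWARF, a local variable's location is often the expression DW_OP_fbreg (opcode 0x91) followed by a signed LEB128 offset from the function's frame base. The helper names decode_sleb128 and fbreg_offset are made up for this example, and it assumes you have already extracted the location expression bytes from the debug information. Location lists and other opcodes are not handled, and the frame base itself is defined separately by the function's DW_AT_frame_base attribute, so it is not necessarily ebp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DW_OP_fbreg 0x91  /* operand: signed LEB128 offset from frame base */

/* Decode one signed LEB128 value from buf. Returns the number of bytes
 * consumed, or 0 if the input is truncated or too long for 64 bits. */
size_t decode_sleb128(const uint8_t *buf, size_t len, int64_t *out)
{
    assert(buf != NULL && out != NULL);

    uint64_t result = 0;
    unsigned shift = 0;
    for (size_t i = 0; i < len && shift < 64; i++) {
        uint8_t byte = buf[i];
        result |= (uint64_t)(byte & 0x7F) << shift;  /* low 7 bits = data */
        shift += 7;
        if ((byte & 0x80) == 0) {                    /* last byte of value */
            if (shift < 64 && (byte & 0x40))         /* sign bit set */
                result |= ~(uint64_t)0 << shift;     /* sign-extend */
            *out = (int64_t)result;
            return i + 1;
        }
    }
    return 0;
}

/* If expr is exactly "DW_OP_fbreg <offset>", store the offset and return 1.
 * Any other expression returns 0. */
int fbreg_offset(const uint8_t *expr, size_t len, int64_t *offset)
{
    if (expr == NULL || len < 2 || expr[0] != DW_OP_fbreg)
        return 0;
    return decode_sleb128(expr + 1, len - 1, offset) == len - 1;
}
```

For the two-byte expression 91 6c this yields an offset of -20 from the frame base.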

Usage differences between a.out, .ELF, .EXE, and .COFF

Don't get me wrong by looking at the question title - I know what they are (format for portable executable files). But my interest scope is slightly different
MY CONFUSION
I am involved in re-hosting/retargeting applications that are originally from third parties. The problem is that sometimes the object code is also in .elf or .COFF format, and the format still says "Executable and Linkable".
I am primarily a Windows user and know that when you compile and assemble your C/C++ code, you get something similar to .o or .obj. that are not executable (well, I never tried to execute them). But when you complete linking static and dynamic libraries and finish building, the executable appears. My understanding is that you can then go about and link that executable or "bash" test it with some form of script if necessary.
However, in Linux (or UNIX-like systems) there are .o files after you compile and assemble the C/C++ code. And once the linking is done, the executable is in a.out format (at least in Ubuntu distribution of Linux). It may very well be .elf in some other distrib. In my quick web search none of the sources mentioned anything about .o files as executables.
QUESTIONS
Therefore my questions are the following:
What are the true definitions of portable executables and object code?
How is it that Windows and UNIX platforms cover both executables and object code under the same file format (.COFF, .elf)?
Am I misinterpreting "Linkable"? My interpretation of "Linkable" is something that is compiled object code and can then be "linked" to other static/dynamic link libraries. Is this a stupid thought?
Based on question 1. (and perhaps 2) do I need to use symbol tables (e.g. .LUM or .MAP files) with object code then? Symbols as in debug symbols and using them when re-hosting the executables/object files on a different machine.
Thanks in advance for the right nudges. Meanwhile, I will keep digging and update the question if necessary.
UPDATE
I have managed to dig this out from somewhere :( Seems like a lot to swallow to me.
I am primarily a Windows user and know that when you compile your C/C++ code, you get something similar to .o or .obj. that are not executable
Well, last time I compiled stuff on Windows, the result of the compilation was an .obj file, which is exactly what its name suggests: it's an object file. You're right in that it's not an executable in itself. It contains machine code which doesn't (yet) contain enough information to be directly run on the CPU.
However, in Linux (or UNIX-like systems) there are .o files after you compile the C/C++ code. And once the linking is done, the executable is in a.out format (at least in Ubuntu distribution of Linux). It may very well be .elf in some other distrib.
Living in the 90's, that is :P No modern compilers I am aware of target the a.out format as their default output format for object code. Maybe it's a misleading default of GCC to put the object code into a file called a.out when no explicit output file name is specified, but if you run the file command on a.out, you'll find out that it's an ELF file. The a.out format is ancient and it's kind of "de facto obsolete".
What are the true definitions of portable executables and object code?
You've already got the Wikipedia link to object files, here's the one to "Portable Executable".
How is it that Windows and UNIX platforms cover both executables and object code under the same file format (.COFF, .elf)?
Because the ELF format (and apparently COFF too) has been designed like so. And why not? It's just the very same machine code after all, it seems quite logical to use one file format during all the compilation steps. Just like we don't like when dynamic libraries and stand-alone executables have a different format. (That's why ELF is called ELF - it's an "Executable and Linkable Format".)
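You can see the shared format directly in the file header. The sketch below, in C, checks the ELF magic bytes and then reads the e_type field, which is what distinguishes a relocatable object file from an executable or shared object inside the same format. The enum elf_kind and the function classify_elf are names invented for this example; the offsets and values (magic 7F 45 4C 46, byte order flag at offset 5, e_type at offset 16, ET_REL = 1, ET_EXEC = 2, ET_DYN = 3) are from the ELF specification. It works on a buffer holding the first bytes of the file, which you would read yourself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Result of the classification; the positive values match e_type. */
enum elf_kind {
    ELF_NOT_ELF = -1,  /* magic bytes missing or header too short */
    ELF_OTHER   = 0,   /* valid magic, but a type not handled here */
    ELF_REL     = 1,   /* ET_REL: relocatable object file (.o) */
    ELF_EXEC    = 2,   /* ET_EXEC: executable */
    ELF_DYN     = 3    /* ET_DYN: shared object or position-independent exe */
};

/* Classify a file from the first len bytes of its contents. */
enum elf_kind classify_elf(const uint8_t *header, size_t len)
{
    assert(header != NULL);

    /* e_ident starts with 0x7F 'E' 'L' 'F'; e_type needs bytes 16 and 17. */
    if (len < 18 || header[0] != 0x7F || header[1] != 'E' ||
        header[2] != 'L' || header[3] != 'F')
        return ELF_NOT_ELF;

    /* e_ident[5] gives the byte order: 1 = little-endian, 2 = big-endian. */
    unsigned type;
    if (header[5] == 1)
        type = (unsigned)header[16] | ((unsigned)header[17] << 8);
    else if (header[5] == 2)
        type = ((unsigned)header[16] << 8) | (unsigned)header[17];
    else
        return ELF_OTHER;

    switch (type) {
    case 1:  return ELF_REL;
    case 2:  return ELF_EXEC;
    case 3:  return ELF_DYN;
    default: return ELF_OTHER;
    }
}
```

On a typical Linux system a .o file would be expected to classify as ELF_REL, and a linked a.out as ELF_EXEC or ELF_DYN depending on how it was built.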
Am I misinterpreting "Linkable"?
I don't know. From your question it's not clear to me what you think "linkable" is. In general, it means that it's a file that can be linked against, i.e. a library.
Based on question 1. (and perhaps 2) do I need to use symbol tables (e.g. .LUM or .MAP files) with object code then? Symbols as in debug symbols and using them when re-hosting the object files on a different machine.
I think this one is not related to the executable format used. If you want to debug, you have to generate debugging information no matter what. But if you don't need to debug, then you're free to omit them of course.

Resources