Disassembling c function in Ida pro - c

I am an intermediate "C" learner. I have written a simple function in c,compiled(Released) sucessfully.
I learned that adding extern "C" to function prohibits compiler from mangling it's name. so, i added extern "C" to my function but after dropping it to ida pro why i could not locate my function name ?
It contains few function but with some sub prefix other are compiler specific,i could not find the function i compiled.
If my way of disassembling function is wrong then please suggest it, I know it could be done in Visual editor while building but i want it other way.

It sounds like you're building an executable. Names get preserved for exported items only, when you are building a library. When you build an executable, the names (mangled or not) can be discarded (by the compiler and linker) because they are no longer needed - references to functions and variables within the executable are all done numerically.
If you build your code as a library (.dll/.so/.dylib depending on platform) and mark your functions as exported, you will see the names.
You can also see the names if you use IDA to look at the intermediate object files (.o/.obj). Those still contain names because the linker needs them to find functions by name during the linking process.
Finally, there is one way you can get function names to appear in IDA - build your executable with symbols enabled (i.e. produce a .pdb file if you're using MS tools). IDA will notice and offer to load the symbols, which will associate names with functions.

Related

VS2012 Identifer not found when part of static lib

Using VS2012 C/C++:
I created and linked a static lib called "libtools" to my project.
Calls to functions in the libtools lib worked as expected.
I created and linked a second static lib called "shunt" to my project.
But when I incorporate a call to a function in shunt, I am getting a c3861 "Identifier not found"
I added both libs to my project in the same way. I added a ref to each one in the Framework and References, and added the full path in the C/C++ Additional directories.
How can I fix this?
C++ uses something called name mangling when it creates symbol names. It's needed because the symbol names must contain the complete function signature.
When you use extern "C" the names will not be mangled, and can be used from other programming languages, like C.
You clearly make the shunt library in C++, which means that the shuntfunc function isn't actually named that way once it passed the compiler. As the actual application using the library is made in C (guessing based on the tag and information) it can't find the shuntfunc symbol, not without telling the C++ compiler to not mangle the symbol name.
That it works for the other library is probably because it's made in C as well.

What is the difference between include and link when linking to a library?

What does include and link REALLY do? What are the differences? And why do I need to specify both of them?
When I write #include math.h and then write -lm to compile it, what does #include math.h and -lm do respectively?
In my understanding, when linking a library, you need its .h file and its .o file. Does this suggest #include math.h means take in the .h file while -lm take in the .o file?
The reason that you need both a header (the interface description) and the library (the implementation) is that C separates the two clearer than languages like C# or Java do. One can compile a C function (e.g. by invoking gcc -c <sourcefile>) which calls library code even when the called library is not present; the header, which contains the interface description, suffices. (This is not possible with C# or Java; the assemblies resp. class files/jars must be present.) During the link stage though the library must be there, even when it's dynamic, afaik.
With C#, Java, or script languages, by contrast, the implementation contains all information necessary to define the interface. The compiler (which is not as clearly separated from the linker) looks in the jar file or the C# assembly which contain called implementations and obtains information about function signatures and types from there.
Theoretically, that information could probably be present in a library written in C as well — it's basically the debug information. But the classic C compiler (as opposed to the linker) is oblivious to libraries or object files and cannot parse them. (One should remember that the "compiler" executable you usually use to compile a C program , e.g. gcc, is a "compiler driver" which interprets the command line arguments and calls the programs which actually do stuff, e.g. the preprocessor, actual compiler and actual linker, to create the desired output.)
So in theory, if you have a properly annotated library in a known location, you could probably write a compiler which compiles a C function against it without having function declarations and type definitions; the compiler would have to produce the proper declarations. The compiler would have to know which library to parse (which corresponds to setting a C# project "Reference" in VS or having a class path and name/class correspondence in Java).
It would probably be easiest to use a well-known debugging format like stabs or dwarf and extract the interface definitions from it with a little helper program which uses the API for the debug format, extracts the information and produces a C header which is prepended to every source file. That would be the job of the compiler driver, and the actual compiler would still be oblivious to that.
It's because headers files contain only declaration and .o files (or .obj, .dll or .lib) contain definitions of methods.
If you open an .h file, you will not see the code of methods, because that is in the libraries.
One reason is commercial, because you need to publish your code and have the source code in your company. Libraries are compiled, so you could publish it.
Header files only tell compiler, what classes and methods it can find in the library.
The header files are kind of a table-of-contents plus a kind of dictionary for the compiler. It tells the compiler what the library offers and gives special values readable names.
The library file itself contains the contents.
What you are asking are entirely two different things.
Don't worry , i will explain them to you.
You use # symbol to instruct the preprocessor to include the math.h header files which internally contain the function prototypes of fabs(),ceil() etc..
And you use -lm to instruct the linker, to include the pre-compiled function definitions of fabs(),ceil() etc. functions in the exe file .
Now, you may ask why we have to explicitly link library file of math functions unlike for other functions and the answer is ,it is due to some undefined historical reasons.

Linking two object files with a common symbol

If i have two object files both defining a symbol (function) "foobar".
Is it possible to tell the linker to obey the obj file order i give in the command line call and always take the symbol from first file and never the later one?
AFAIK the "weak" pragma works only on shared libraries but not on object files.
Please answer for all the C/C++ compiler/linker/operating system combinations you know cause i'm flexibel and use a lot of compiles (sun studio, intel, msvc, gcc, acc).
I believe that you will need to create a static library from the second object file, and then link the first object file and then the library. If a symbol is resolved by an object file, the linker will not search the libraries for it.
Alternatively place both object files in separate static libraries, and then the link order will be determined by their occurrence in the command line.
Creating a static library from an object file will vary depending on the tool chain. In GCC use the ar utility, and for MSVC lib.exe (or use the static library project wizard).
There is a danger here, the keyword here is called Interpositioning dependant code.
Let me give you an example here:
Supposing you have written a custom routine called malloc. And you link in the standard libraries, what will happen is this, functions that require the usage of malloc (the standard function) will use your custom version and the end result is the code may become unstable as the unintended side effect and something will appear 'broken'.
This is just something to bear in mind.
As in your case, you could 'overwrite' (I use quotes to emphasize) the other function but then how would you know which foobar is being used? This could lead to debugging grief when trying to figure out which foobar is called.
Hope this helps,
Best regards,
Tom.
You can make it as a .a file... Then the compiler gets the symbol and doesn't crib later

Distribute binary library on OSX

I'm planning to release some compiled code that shall be linked by client applications on MacOSX.
The distribution is some kind of code library and a set of header files defining the public interface for the library.The code is internally C++ but its public interface (i.e what's being shown in the headers) is completely C.
These are my requirements or atleast what I hope I can accomplish:
I want my library to be as agnostic
as possible for what version of OSX
and GCC the user is running. Having
separate libraries for 64 bit and 32
bit is okay though.
I want my library
to be loadable from languages that
supports loading C libraries such as
python or similar.
I want my
libraries internal symbols to be
isolated from the code it's being
linked into. I don't want to have
duplicate symbol errors because we
happen to name an internal function
in the same way. My C++ code is properly namespaced so this may not be as big of an issue though, but some of the libraries I depend on is C and can be an issue (see next point).
I want my library
dependencies to be safe. My library
depends on some libraries such as
libpng, boost and stl and I don't
want issues because some users don't
necessarily have all of them installed
or get problems because they have
been compiled with other flags or
have different versions than I have.
On Windows I use a DLL with an export library and link all my dependencies statically into the dll. It fulfills all the criteria above and if I can get the same result on OSX it would be great, however I've heard that dynamic libraries tend not to isolate symbols on mac in the same way.
Is there some kind of best practice for this on OSX?
A normal OS X .dylib pretty much satisfies your requirements, with the note that you will want to have an exports file that the linker uses to determine exactly which symbols are exported (to prevent leaking your internal symbols).
In order to make your own library dependencies safe, you will probably need to either include those libraries with yours or link them statically into your library.
edit: To answer your follow-up question of how to apply an exports file to a link command, the man page for ld has the following to say:
-exported_symbols_list filename
The specified filename contains a list of global symbol names
that will remain as global symbols in the output file. All
other global symbols will be treated as if they were marked
as __private_extern__ (aka visibility=hidden) and will not be
global in the output file. The symbol names listed in file-
name must be one per line. Leading and trailing white space
are not part of the symbol name. Lines starting with # are
ignored, as are lines with only white space. Some wildcards
(similar to shell file matching) are supported. The *
matches zero or more characters. The ? matches one charac-
ter. [abc] matches one character which must be an 'a', 'b',
or 'c'. [a-z] matches any single lower case letter from 'a'
to 'z'.
So, if your library had only two functions that you wanted to be public, lets call them foo and bar, and they were C functions (so the symbol names aren't mangled), your exports file (let's call it myLibrary.exports) would contain these two lines:
_foo
_bar
and maybe some comments, etc. When you do the final link step to build the library, you would pass the -exported_symbols_list myLibrary.exports flag to the linker. This has the additional benefit that the link will fail if you don't provide one of the exported symbols; this can catch a lot of "oops, I forgot to include that file in the build" mistakes.
You don't need to use the command-line tools to do all this, of course. In the build settings for a dynamic library in XCode, you will find Exported Symbols File (undefined by default); set it to the path to your exports file there and it will be passed to the linker.
The key term you need is 'framework'. You need to create a 'universal' framework that is self-contained. ('Universal' is Apple-ease for 'compile several times and package into one library.) It's not as straightforward as on Windows in terms of encapsulation, but the necessary linker options are there.

How do linkers decide what parts of libraries to include?

Assume library A has a() and b(). If I link my program B with A and call a(), does b() get included in the binary? Does the compiler see if any function in the program call b() (perhaps a() calls b() or another lib calls b())? If so, how does the compiler get this information? If not, isn't this a big waste of final compile size if I'm linking to a big library but only using a minor feature?
Take a look at link-time optimization. This is necessarily vendor dependent. It will also depend how you build your binaries. MS compilers (2005 onwards at least) provide something called Function Level Linking -- which is another way of stripping symbols you don't need. This post explains how the same can be achieved with GCC (this is old, GCC must've moved on but the content is relevant to your question).
Also take a look at the LLVM implementation (and the examples section).
I suggest you also take a look at Linkers and Loaders by John Levine -- an excellent read.
It depends.
If the library is a shared object or DLL, then everything in the library is loaded, but at run time. The cost in extra memory is (hopefully) offset by sharing the library (really, the code pages) between all the processes in memory that use that library. This is a big win for something like libc.so, less so for myreallyobscurelibrary.so. But you probably aren't asking about shared objects, really.
Static libraries are a simply a collection of individual object files, each the result of a separate compilation (or assembly), and possibly not even written in the same source language. Each object file has a number of exported symbols, and almost always a number of imported symbols.
The linker's job is to create a finished executable that has no remaining undefined imported symbols. (I'm lying, of course, if dynamic linking is allowed, but bear with me.) To do that, it starts with the modules named explicitly on the link command line (and possibly implicitly in its configuration) and assumes that any module named explicitly must be part of the finished executable. It then attempts to find definitions for all of the undefined symbols.
Usually, the named object modules expect to get symbols from some library such as libc.a.
In your example, you have a single module that calls the function a(), which will result in the linker looking for module that exports a().
You say that the library named A (on unix, probably libA.a) offers a() and b(), but you don't specify how. You implied that a() and b() do not call each other, which I will assume.
If libA.a was built from a.o and b.o where each defines the corresponding single function, then the linker will include a.o and ignore b.o.
However, if libA.a included ab.o that defined both a() and b() then it will include ab.o in the link, satisfying the need for a(), and including the unused function b().
As others have mentioned, there are linkers that are capable of splitting individual functions out of modules, and including only those that are actually used. In many cases, that is a safe thing to do. But it is usually safest to assume that your linker does not do that unless you have specific documentation.
Something else to be aware of is that most linkers make as few passes as they can through the files and libraries that are named on the command line, and build up their symbol table as they go. As a practical matter, this means that it is good practice to always specify libraries after all of the object modules on the link command line.
It depends on the linker.
eg. Microsoft Visual C++ has an option "Enable function level linking" so you can enable it manually.
(I assume they have a reason for not just enabling it all the time...maybe linking is slower or something)
Usually (static) libraries are composed of objects created from source files. What linkers usually do is include the object if a function that is provided by that object is referenced. if your source file only contains one function than only that function will be brought in by the linker. There are more sophisticated linkers out there but most C based linkers still work like outlined. There are tools available that split C source that contain multiple functions into artificially smaller source files to make static linking more fine granular.
If you are using shared libraries then you don't impact you compiled size by using more or less of them. However your runtime size will include them.
This lecture at Academic Earth gives a pretty good overview, linking is talked about near the later half of the talk, IIRC.
Without any optimization, yes, it'll be included. The linker, however, might be able to optimize out by statically analyzing the code and trying to remove unreachable code.
It depends on the linker, but in general only functions that are actually called get included in the final executable. The linker works by looking up the function name in the library and then using the code associated with the name.
There are very few books on linkers, which is strange when you think how important they are. The text for a good one can be found here.
It depends on the options passed to the linker, but typically the linker will leave out the object files in a library that are not referenced anywhere.
$ cat foo.c
int main(){}
$ gcc -static foo.c
$ size
text data bss dec hex filename
452659 1928 6880 461467 70a9b a.out
# force linking of libz.a even though it isn't used
$ gcc -static foo.c -Wl,-whole-archive -lz -Wl,-no-whole-archive
$ size
text data bss dec hex filename
517951 2180 6844 526975 80a7f a.out
It depends on the linker and how the library was built. Usually libraries are a combination of object files (import libraries are a major exception to this). Older linkers would pull things into the output file image at a granularity of the object files that were put into the library. So if function a() and function b() were both in the same object file, they would both be in the output file - even if only one of the 2 functions were actually referenced.
This is a reason why you'll often see library-oriented projects with a policy of a single C function per source file. That way each function is packaged in its own object file and linkers have no problem pulling in only what is referenced.
Note however that newer linkers (certainly newer Microsoft linkers) have the ability to pull in only parts of object files that are referenced, so there's less of a need today to enforce a one-function-per-source-file policy - though there are reasonable arguments that that should be done anyway for maintainability.

Resources