GNU Binutils' Binary File Descriptor library - format example - c

As in title. I tried reading the BFD's ELF's code, but it's rather not a light reading. I also tried to get something from the documentation, but I would need an example to see how it works. Could anyone point me some easier example for me, to know how to define an executable format?
Edit: Looks like I didn't formulate the question properly. I don't ask "how to create own executable format specification?", nor "where is good ELF documentation?", but "how can I implement my own executable format using GNU BFD?".

You did look here http://sourceware.org/binutils/docs-2.21/bfd/index.html and here http://sourceware.org/binutils/binutils-porting-guide.txt?
Also studying the MMO implementation of a BFD backend as mentioned here http://sourceware.org/binutils/docs-2.21/bfd/mmo.html#mmo (source: http://sourceware.org/cgi-bin/cvsweb.cgi/src/bfd/mmo.c?cvsroot=src) might be less complex than starting with ELF ... ;-)

I agree that BFD documentation is somewhat lacking. Here are some better sources:
ELF Format
System V ABI (section 4)
Here are a couple of readable introductions:
Linux Journal
Dr. Dobbs
And some examples that don't use libbfd:
ELF IO
ELF Toolchain
LibELF

The DOS COM file is the simplest possible format.
Load up to 64k less 256 bytes at seg:0100h, set DS,ES,SS=seg, SP=FFFFh and jump to seg:0100h

Related

why are there so many versions of header files in my system?

I learned to program with Pascal in high school, and more recently I decided to get out of the sandbox and try to figure out how my computer actually works. So I installed ubuntu on my iMac (i686) and started learning C, which seemed like a good way to get "under the hood."
One of the basic things I'm trying to figure out is where the kernel ends and the standard libraries begin. A book told me that the linux system calls (which I understand to be the interface between the kernel and the libraries) could be found in the header file unistd.h, so this seemed like a good place to start. But when I tried to find the header on my system (using locate unistd.h), I got this result:
/usr/include/unistd.h
/usr/include/asm-generic/unistd.h
/usr/include/i386-linux-gnu/asm/unistd.h
/usr/include/i386-linux-gnu/bits/unistd.h
/usr/include/i386-linux-gnu/sys/unistd.h
/usr/include/linux/unistd.h
/usr/lib/syslinux/com32/include/unistd.h
/usr/src/linux-headers-3.5.0-27/arch/alpha/include/asm/unistd.h
/usr/src/linux-headers-3.5.0-27/arch/arm/include/asm/unistd.h
/usr/src/linux-headers-3.5.0-27/arch/avr32/include/asm/unistd.h
/usr/src/linux-headers-3.5.0-27/arch/blackfin/include/asm/unistd.h
/usr/src/linux-headers-3.5.0-27/arch/c6x/include/asm/unistd.h
/usr/src/linux-headers-3.5.0-27/arch/cris/include/arch-v10/arch/unistd.h
/usr/src/linux-headers-3.5.0-27/arch/cris/include/arch-v32/arch/unistd.h
/usr/src/linux-headers-3.5.0-27/arch/cris/include/asm/unistd.h
/usr/src/linux-headers-3.5.0-27/arch/frv/include/asm/unistd.h
/usr/src/linux-headers-3.5.0-27/arch/h8300/include/asm/unistd.h
/usr/src/linux-headers-3.5.0-27/arch/hexagon/include/asm/unistd.h
/usr/src/linux-headers-3.5.0-27/arch/ia64/include/asm/unistd.h
/usr/src/linux-headers-3.5.0-27/arch/m32r/include/asm/unistd.h
/usr/src/linux-headers-3.5.0-27/arch/m68k/include/asm/unistd.h
/usr/src/linux-headers-3.5.0-27/arch/microblaze/include/asm/unistd.h
/usr/src/linux-headers-3.5.0-27/arch/mips/include/asm/unistd.h
/usr/src/linux-headers-3.5.0-27/arch/mn10300/include/asm/unistd.h
/usr/src/linux-headers-3.5.0-27/arch/openrisc/include/asm/unistd.h
/usr/src/linux-headers-3.5.0-27/arch/parisc/include/asm/unistd.h
/usr/src/linux-headers-3.5.0-27/arch/powerpc/include/asm/unistd.h
/usr/src/linux-headers-3.5.0-27/arch/s390/include/asm/unistd.h
/usr/src/linux-headers-3.5.0-27/arch/score/include/asm/unistd.h
/usr/src/linux-headers-3.5.0-27/arch/sh/include/asm/unistd.h
/usr/src/linux-headers-3.5.0-27/arch/sparc/include/asm/unistd.h
/usr/src/linux-headers-3.5.0-27/arch/tile/include/asm/unistd.h
/usr/src/linux-headers-3.5.0-27/arch/unicore32/include/asm/unistd.h
/usr/src/linux-headers-3.5.0-27/arch/x86/include/asm/ia32_unistd.h
/usr/src/linux-headers-3.5.0-27/arch/x86/include/asm/unistd.h
/usr/src/linux-headers-3.5.0-27/arch/xtensa/include/asm/unistd.h
/usr/src/linux-headers-3.5.0-27/include/asm-generic/unistd.h
/usr/src/linux-headers-3.5.0-27/include/linux/unistd.h
/usr/src/linux-headers-3.5.0-27-generic/include/linux/unistd.h
Why the heck are there so many versions of this file--and other header files--in my system? Some of them seem to be for other CPUs (like sparc), so why did ubuntu bother to install them on my computer? And how does the all of this fit with what Eric Raymond calls the SPOT rule: "every piece of knowledge must have a single, unambiguous, authoritative representation within a system." (The Art of Unix Programming, p. 91.)
Thanks in advance for any help. I'm happy to read big books if necessary.
I think these header files are directly from linux-3.5.0-27 source code. Ubuntu developers didn't know what kind of target they are dealing with. Maybe Intel x86/powerPC/ or even a mobile hand set(ARM), so they just copy all the head files and make a simple link.

ASN.1 compilers

This is my first question so please bear with me, I'm trying to find a good compiler to parse the following standard, I've tried asn1c and I wasn't able compile it successfully, the problem is that I didn't get any error, I tried with "-P" and there was no output, this is what I did
sn1c MAP-ShortMessageServiceOperations.EXP -P
The files in the link have been generated by SIEMENS and for some reason they are not the standard ASN.1
The question is how can I compile all these ASN files with asn1c or any other asn compiler?
and also I've tried snacc and didn't get anything useful from it either.
I'm trying to write a C application that will be running on Red-hat Linux
A great place to start is http://www.itu.int/ITU-T/asn1/links/index.htm which list several ASN.1 compilers (both free and commercial).
You can also try your specification in the free online compiler and encoder/decoder at http://asn1-playground.oss.com.
For 3gpp specifications, you are likely to be better off using a commercial tool rather than one of the free tools. OSS Nokalva offers free trials of its ASN.1 Tools at http://www.oss.com/asn1/products/asn1-download.html.
http://www.itu.int/en/ITU-T/asn1/Pages/Tools.aspx mentions two free ASN.1 compilers, one of which is
http://lionet.info/asn1c/compiler.html which is FOSS and seems to be well received (http://lionet.info/asn1c/quotes.html)
The second “free” compiler listed seems to have gone commercial (http://www.oss.com/asn1/products/asn1-c/asn1-c.html), BUT the same company offer a free online encode/decode service at http://asn1-playground.oss.com/
I can’t figure out if this one is free or requires purchase - http://www.obj-sys.com/products/asn1c/index.php
This one is FOSS and might be worth examining - https://github.com/ttsiodras/asn1scc
Another tool for 3GPP specifications is online encoder/decoder http://3gpp-message-analyser.com It supports UTRAN RANAP SABP RNSAP NBAP PCAP RUA HNBAP, E-UTRAN S1AP X2AP M2AP M3AP
You can also find various free online ASN.1 decoder here:
https://www.marben-products.com/decoder-asn1/
I hope it helps.
Anto
I came across this compiler that works well for generating C code from ASN.1
https://github.com/ttsiodras/asn1scc
There is also a nice introduction that will get you started, here:
https://www.thanassis.space/asn1.html
This is the only compiler that worked for me so far.

How to extract C source code from .so file?

I am working on previously developed software and source code is compiled as linux shared libraries (.so) and source code is not present. Is there any tool which can extract source code from the linux shared libraries?
Thanks,
Ravi
There isn't. Once you compile your code there is no trace of it left in the binary, only machine code.
Some may mention decompilers but those don't extract the source, they analyze the executable and produce some source that should have the same effect as the original one did.
You can try disassembling the object code and get the machine code mnemonics.
objdump -D --disassembler-options intel sjt.o to get Intel syntax assembly
objdump -D --disassembler-options att sjt.o or objdump -D sjt.o to get AT&T syntax assembly
But the original source code could never be found. You might try to reverse the process by studying and reconstruct the sections. It would be hell pain.
Disclaimer: I work for Hex-Rays SA.
The Hex-Rays decompiler is the only commercially available decompiler I know of that works well with real-life x86 and ARM code. It's true that you don't get the original source, but you get something which is equivalent to it. If you didn't strip your binary, you might even get the function names, or, with some luck, even types and local variables. However, even if you don't have symbol info, you don't have to stick to the first round of decompilation. The Hex-Rays decompiler is interactive - you can rename any variable or function, change variable types, create structure types to represent the structures in the original code, add comments and so on. With a little work you can recover a lot. And quite often what you need is not the whole original file, but some critical algorithm or function - and this Hex-Rays can usually provide to you.
Have a look at the demo videos and the comparison pages. Still think "staring at the assembly" is the same thing?
No. In general, this is impossible. Source is not packaged in compiled objects or libraries.
You cannot. But you can open it as an archive in 7-Zip. You can see the file type and size of each file separately in that. You can replace the files in it with your custom files.

How to write a linker

I have written a compiler for C that outputs byte code. The reason for this was to be able to write applications for an embedded platform that runs on multiple platforms.
I have the compiler and the assembler.
I need to write a linker, and am stuck.
The object format is a custom one, designed around the byte code interpreter, so I cant really use any existing linkers.
My biggest hurdle is how to organize the object code to output the linked binary.
Dynamic linking is not necessary, at this time.
I need to get static linking working first.
Ian Lance Taylor, one of the main developers on the gold linker(now part of binutils), posted a series of blogs on how linkers work. You can find it here.
http://linker.iecc.com is the only book I know about this subject.
I second the Linkers and Loaders book. You state that your object format is a custom one. If the format is under your control, you could consider using the ELF format with your bytecode as a new machine architecture, a la x86, SPARC, ARM, etc. The GNU binutils sources are sufficiently malleable to allow you to incorporate your "architecture".

what is the easiest way to lookup function names of a c binary in a cross-platform manner?

I want to write a small utility to call arbitrary functions from a C shared library. User should be able to list all the exported functions similar to what objdump or nm does. I checked these utilities' source but they are intimidating. Couldn't find enough information on google, if dl library has this functionality either.
(Clarification edit: I don't want to just call a function which is known beforehand. I will appreciate an example fragment along your answer.)
This might be near to what you're looking for:
http://python.net/crew/theller/ctypes/
Well, I'll speak a little bit about Windows. The C functions exported from DLLs do not contain information about the types, names, or number of arguments -- nor do I believe you can determine what the calling convention is for a given function.
For comparison, take a look at National Instrument's LabVIEW programming environment. You can import functions from DLLs, but you have to manually type in the type and names of the arguments before you use a given function. If this limitation is OK, please edit your question to reflect that.
I don't know what is possible with *nix environments.
EDIT: Regarding your clarification. If you don't know what the function is ahead of time, you're pretty screwed on Windows because in general you won't be able to determine what the number and types of arguments the functions take.
You could try ParaDyn's SymtabAPI. It lets you grab all the symbols in a shared library (or executable) and look at their types, offset, etc. It's all wrapped up in a reasonably nice C++ interface and runs on a lot of platforms. It also provides support for binary rewriting, which you could potentially use to do what you're talking about at runtime.
Webpage is here:
http://www.paradyn.org/html/symtab2.1-features.html
Documentation is here:
http://ftp.cs.wisc.edu/paradyn/releases/release5.2/doc/symtabProgGuide.21.pdf
A standard-ish API is the dlopen/dlsym API; AFAIK it's implemented by GNU libc on Linux and Mac OS X's standard C library (libSystem), and it might be implemented on Windows by MinGW or other compatibility packages.
Only sensible solution (without reinventing the wheel) seems to use libbfd. Downsides are its documentation is scarce and it is a bit bloated for my purposes.
The source code for nm and objdump are available. If you want to start from specification then ELF is what you want to look into.
/Allan
I've written something like this in Perl. On Win32 it runs dumpbin /exports, on POSIX it runs nm -gP. Then, since it's Perl, the results are interpreted using regular expressions: / _(\S+)#\d+/ for Win32 (stdcall functions) and /^(\S+) T/ for POSIX.
Eek! You've touched on one of the very platform-dependent topics of programming. On windows, you have DLLs, on linux, you have ld.so, ld-linux.so, and mac os x's dyld.

Resources