Creating ELF binaries without using libelf or other libraries - c

Recently I tried to write a simple compiler on the Linux platform by myself.
When it came to the backend of the compiler, I decided to generate ELF-formatted binaries without using a third-party library such as libelf.
Instead I want to write machine code directly into a file conforming to the ELF ABI, using just the write() function and controlling every detail of the ELF file myself.
The advantage of this approach is that I control everything in my compiler.
But I am hesitating. Is this approach feasible, considering how detailed the ELF ABI is?
I would appreciate any suggestions and pointers to good available resources.

How easy/feasible this is depends on what features you want to support. If you want to use dynamic linking, you have to deal with the symbol table, relocations, etc. And of course if you want to be able to link with existing libraries, even static ones, you'll have to support whatever they need. But if your goal is just to make standalone static ELF binaries, it's really very easy. All you need is a main ELF header (100% boilerplate) and 2 PT_LOAD program headers: one to load your program's code segment, the other to load its data segment. In theory they could be combined, but security-hardened kernels do not allow a given page to be both writable and executable, so it would be smart to separate them.
Some suggested reading:
http://www.linuxjournal.com/article/1059
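To make the "ELF header plus two PT_LOAD program headers" layout concrete, here is a minimal sketch in C. It is not taken from the answer above; it assumes Linux on x86-64, a little-endian host (the header structs are copied byte for byte), and the Linux <elf.h> header. The load addresses, the message, the hand-assembled machine code, and the names build_minimal_elf and write_elf_file are all illustrative choices.

```c
#include <elf.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define TEXT_VADDR 0x400000u  /* assumed load address for the code segment */
#define DATA_VADDR 0x401000u  /* next page, so data can have its own permissions */

static const char message[] = "hi\n";
#define MSG_LEN (sizeof message - 1)

/* Hand-assembled x86-64 code: write(1, message, MSG_LEN); exit(0). */
static const unsigned char code_template[] = {
    0xb8, 0x01, 0x00, 0x00, 0x00,  /* mov eax, 1   (SYS_write)          */
    0xbf, 0x01, 0x00, 0x00, 0x00,  /* mov edi, 1   (stdout)             */
    0xbe, 0x00, 0x00, 0x00, 0x00,  /* mov esi, imm (address, patched)   */
    0xba, 0x00, 0x00, 0x00, 0x00,  /* mov edx, imm (length, patched)    */
    0x0f, 0x05,                    /* syscall                           */
    0xb8, 0x3c, 0x00, 0x00, 0x00,  /* mov eax, 60  (SYS_exit)           */
    0x31, 0xff,                    /* xor edi, edi                      */
    0x0f, 0x05                     /* syscall                           */
};

/* Store a 32-bit value in little-endian byte order. */
static void put_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Fill `out` with a complete ELF image; return its size, or 0 if `cap` is too small. */
size_t build_minimal_elf(unsigned char *out, size_t cap)
{
    const size_t code_off = sizeof(Elf64_Ehdr) + 2 * sizeof(Elf64_Phdr);
    const size_t data_off = code_off + sizeof code_template;
    const size_t total = data_off + MSG_LEN;
    if (cap < total)
        return 0;

    Elf64_Ehdr eh;
    memset(&eh, 0, sizeof eh);
    memcpy(eh.e_ident, ELFMAG, SELFMAG);
    eh.e_ident[EI_CLASS] = ELFCLASS64;
    eh.e_ident[EI_DATA] = ELFDATA2LSB;
    eh.e_ident[EI_VERSION] = EV_CURRENT;
    eh.e_type = ET_EXEC;
    eh.e_machine = EM_X86_64;
    eh.e_version = EV_CURRENT;
    eh.e_entry = TEXT_VADDR + code_off;   /* first instruction follows the headers */
    eh.e_phoff = sizeof(Elf64_Ehdr);
    eh.e_ehsize = sizeof(Elf64_Ehdr);
    eh.e_phentsize = sizeof(Elf64_Phdr);
    eh.e_phnum = 2;

    Elf64_Phdr ph[2];
    memset(ph, 0, sizeof ph);
    ph[0].p_type = PT_LOAD;               /* headers + code: read and execute */
    ph[0].p_flags = PF_R | PF_X;
    ph[0].p_vaddr = ph[0].p_paddr = TEXT_VADDR;
    ph[0].p_filesz = ph[0].p_memsz = data_off;
    ph[0].p_align = 0x1000;
    ph[1].p_type = PT_LOAD;               /* data: read and write, never execute */
    ph[1].p_flags = PF_R | PF_W;
    ph[1].p_offset = data_off;
    ph[1].p_vaddr = ph[1].p_paddr = DATA_VADDR + data_off;  /* same as p_offset modulo p_align */
    ph[1].p_filesz = ph[1].p_memsz = MSG_LEN;
    ph[1].p_align = 0x1000;

    memcpy(out, &eh, sizeof eh);
    memcpy(out + sizeof eh, ph, sizeof ph);
    memcpy(out + code_off, code_template, sizeof code_template);
    put_le32(out + code_off + 11, (uint32_t)(DATA_VADDR + data_off)); /* esi immediate */
    put_le32(out + code_off + 16, (uint32_t)MSG_LEN);                 /* edx immediate */
    memcpy(out + data_off, message, MSG_LEN);
    return total;
}

/* Write the image with plain open()/write(); returns 0 on success, -1 on error. */
int write_elf_file(const char *path)
{
    unsigned char image[512];
    size_t n = build_minimal_elf(image, sizeof image);
    if (n == 0)
        return -1;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0755);
    if (fd < 0)
        return -1;
    ssize_t written = write(fd, image, n);
    int rc = close(fd);
    return (written == (ssize_t)n && rc == 0) ? 0 : -1;
}
```

Calling write_elf_file("a.out") from a small main() should leave an executable file that prints the message and exits. A real compiler backend would emit its own code and data in place of the fixed byte arrays, but the header bookkeeping stays about this small.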

Related

Getting known library paths from ldconfig for use with dlopen

I have a program written in C that uses dlopen to load plug-in modules. When a library is dynamically loaded, it runs constructor code that registers a pointer to a structure of function implementations with the main application, through an exported function. I want to use an absolute path to specify the file to dlopen.
Another part of the program takes a file, determines whether it is ELF, then looks in the ELF header for a specific ELF section, reads that section, and extracts the pertinent information from it. This way it keeps only the shared libraries that I have previously tagged as plug-in modules.
However, I still need to solve how to discover them on the fly from the main program, in a portable Linux way (i.e. it should run on Debian, on Fedora, and so on). I have been thinking about using ldconfig for this, since the modules will be installed through the distro packaging system (APT, for example). Is there any way to programmatically get the list of known libraries as strings from a C program, other than directly reading the /etc/ld.so.cache file? I was thinking that maybe there is some header or library which will give me a char** when I ask.
Or is there a better solution to my problem?
(I am a proponent of using standard system components rather than programming one-off solutions which will need support in the future.)
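For reference, the "determine whether it is ELF" filtering step described above could look roughly like the sketch below. It is only an illustration, not the asker's code: it assumes the Linux <elf.h> header and checks just the magic bytes and the e_type field. The function name is made up, and looking up the custom plug-in section would need additional section-header parsing that is not shown.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if `buf` begins with an ELF header of type ET_DYN, else 0.
 * Note that ET_DYN also matches position-independent executables, so a
 * real filter still needs the plug-in section check on top of this. */
int looks_like_shared_object(const unsigned char *buf, size_t len)
{
    /* e_type is the 16-bit field right after the 16-byte e_ident array. */
    if (len < EI_NIDENT + 2 || memcmp(buf, ELFMAG, SELFMAG) != 0)
        return 0;

    uint16_t e_type;
    if (buf[EI_DATA] == ELFDATA2LSB)        /* little-endian file */
        e_type = (uint16_t)(buf[EI_NIDENT] | (buf[EI_NIDENT + 1] << 8));
    else if (buf[EI_DATA] == ELFDATA2MSB)   /* big-endian file */
        e_type = (uint16_t)((buf[EI_NIDENT] << 8) | buf[EI_NIDENT + 1]);
    else
        return 0;                           /* unknown data encoding */

    return e_type == ET_DYN;
}
```

The caller would read the first bytes of each candidate file into a buffer and pass them in; reading byte by byte keeps the check independent of the host's endianness.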

Building firmware Patch for embedded applications

I have a library stack that is not going to change, and a firmware that is going to use only this stack. The firmware will change a lot along the way. I don't want to release the whole image (including the library stack) every time, because of limited memory and resources (this is an embedded application, not a desktop or server).
I just want to release the application image and have it automatically use the library image. I am not sure how to do it. I know that in Windows, for example, this is handled by DLLs. But this is an embedded application with no OS. The binary image is loaded into memory and the processor executes it.
Any experience/suggestions?
Toolchain: IAR 8051
This depends quite a bit on your tool-chain. Here's a possible high-level approach.
Compile your library into an executable image, setting your linker to use a particular portion of your flash memory space. You'll probably need a fake/stub entry function for the linker to be happy.
Once that is done, find all of the addresses of the symbols used by the library and instruct your linker as to those symbol locations when building your normal program, and do not instruct the link process to use the intermediary library objects when linking. Also instruct the linker to place the code into the section of flash that is update-able.
What you will then have is an image for the library, and the ability to build new versions of the main program image using that library.
This could probably be scripted if your linker output format is an unstripped elf (prior to converting to a binary for burning on the flash), and if your linker can accept a plain text file for instructions (both are true if you are using the gnu toolchains). I'd recommend scripting it for your sanity unless the library has very few externally visible functions and variables in it.
I do have to agree with some of the commenters; unless transferring the library is very hard, you should just build a single simple image that includes the library and push the whole thing. You might say the library will never change now, but inevitably something will come up that requires a change to the library code, and if you change the library and cannot keep the symbols in exactly the same spot, none of your existing application images will work with the new library. This is a recipe for a nightmare when dealing with compatible software (firmware) updates.

linking, loading, and virtual memory

I know these questions have been asked before - but I still can't reconcile everything together into an overall picture.
static vs dynamic library
static libraries have their code copied and linked into the resulting executable
static libraries only copy and link the required modules into the executable, not the entire library implementation
static libraries don't need to be compiled as PIC as they are a part of the resulting executable
dynamic libraries copy and link in stubs that describe how to load/link (?) the function implementation at runtime
dynamic libraries can be PIC or relocatable
why are there separate static and dynamic libraries? All of the above seems to be the job of the static or dynamic linker. Why do I need 2 libraries that implement scanf?
(bonus #1) what does a shared library refer to? I've heard it being used as (1) the overall umbrella term, synonymous to library, (2) directly to a dynamic library, (3) using virtual memory to map the same physical memory of a library to multiple address spaces. Can you do this only with dynamic libraries? (4) having different versions of the same dynamic library in memory.
(bonus #2) are the standard libraries (libc, libc++, libstdc++, ...) linked dynamically or statically by default? I never need to dlopen() them.
static vs dynamic linking
how is this any different than static vs dynamic libraries? I don't understand why there isn't just 1 library, and we use either a static or dynamic linker (other than the PIC issue). Instead of talking about static vs dynamic libraries, should we instead be discussing the more general static vs dynamic linking?
is symbol resolution still performed at compile-time for both?
static vs dynamic loading
Static loading means copying the full executable into MM before executing it
Dynamic loading means that only the executable header is copied into MM before executing; additional functionality is loaded into MM when requested. How is this any different from paging?
If the executable is dynamically linked, why would it not be dynamically loaded?
both static loading and dynamic loading may or may not perform relocation
I know there are a lot of things I'm confused about here, and I'm not necessarily looking for someone to address each issue. I'm hoping that by listing everything that confuses me, someone who understands this will see where the lapse in my understanding is at a broad level, and be able to paint a larger picture of how these things cooperate.
why 2 types of lib loading
dynamic saves space (you don't have hundreds of copies of the same code in all binaries using foo.lib)
dynamic allows the foo.lib vendor to ship a new version of the library, and existing code takes advantage of it
static makes dependency management easier - in theory a binary can be one file
What is 'shared library'
unix name for dynamic library. Windows calls it DLL
Are standard libraries static or dynamic
Depends on the platform. On some you can choose; on others it's chosen for you. For example, on Windows there are compiler switches to say whether you want static or dynamic runtimes. Note: don't confuse dynamic library usage with dlopen - see later
'why we talk about 2 different types of library'
Typically a static library is in a different format from a dynamic one. Typically a static library is input to the linker just like any other compile unit. A dynamic library is typically output by the linker. They are used differently even though they both deliver the same chunk of code to your app
Symbol resolution is finalized at load time for a DLL
Full dynamic loading. This is the realm of dlopen. This is where you want to call entry points in a library that might not even have existed when you compiled. Use cases:
plugins that conform to a well known interface but there can be many implementations (PAM and NSS are good examples). The app chooses to load one or more implementations from specified files at run time
an app needs to load a library and call an arbitrary function. Imagine, for example, how a scripting language can load and call an arbitrary method
To use a .so on Unix you don't need to use dlopen. You can have it loaded for you (same on Windows). To really dynamically load a shared lib / DLL you need dlopen or LoadLibrary
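As a rough illustration of the "full dynamic loading" case, the sketch below looks up a function by name at run time and calls it. To stay self-contained it assumes a dynamically linked program on a POSIX system and uses dlopen(NULL, ...), which returns a handle for the running program and the libraries already loaded with it, then looks up strlen from libc. A real plug-in loader would pass the path of a .so file and the name of its own entry point instead. The wrapper name call_strlen_dynamically is made up, and on older glibc versions you may need to link with -ldl.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

typedef size_t (*strlen_fn)(const char *);

/* Looks up "strlen" by name at run time and calls it.
 * Returns the computed length, or -1 if loading or lookup fails. */
long call_strlen_dynamically(const char *text)
{
    /* NULL asks for the main program plus its already-loaded dependencies;
     * a plug-in loader would pass a path such as "/some/dir/plugin.so" here. */
    void *handle = dlopen(NULL, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    dlerror();                              /* clear any stale error state */
    void *sym = dlsym(handle, "strlen");
    const char *err = dlerror();
    if (err != NULL || sym == NULL) {
        fprintf(stderr, "dlsym failed: %s\n", err ? err : "symbol is NULL");
        dlclose(handle);
        return -1;
    }

    /* POSIX requires that this object-to-function pointer conversion works. */
    strlen_fn fn = (strlen_fn)sym;
    long result = (long)fn(text);

    dlclose(handle);
    return result;
}
```

Nothing in this file refers to strlen as an identifier; the name is just a string resolved while the program runs, which is what separates dlopen-style loading from ordinary linking against a .so.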
Note that statically linked libraries load faster, since there is less disk searching for all the runtime library files. If the libraries are small and very unusual, it is probably better to link statically. If there are serious version dependencies / functional differences, as with MFC, the DLLs need different names.

Program location in the memory and static/shared libraries

When I run a program (in linux) does it all get loaded into the physical memory? If so, is using shared libraries, instead of static libraries, help in terms of caching? In general, when should I use shared libraries and when should I use static libraries? My codes are either written in C or in C++ if that matters.
This article covers some decent ground on what you want. This article goes much deeper into the advantages of shared libraries
SO also has covered this topic in depth
Difference between static and shared libraries?
When to use dynamic vs. static libraries
Almost all the above mentioned articles are shared library biased. Wikipedia tries to rescue static libraries :)
From wiki,
There are several advantages to statically linking libraries with an executable instead of dynamically linking them. The most significant is that the application can be certain that all its libraries are present and that they are the correct version. This avoids dependency problems. Usually, static linking will result in a significant performance improvement.
Static linking can also allow the application to be contained in a single executable file, simplifying distribution and installation.
With static linking, it is enough to include those parts of the library that are directly and indirectly referenced by the target executable (or target library). With dynamic libraries, the entire library is loaded, as it is not known in advance which functions will be invoked by applications. Whether this advantage is significant in practice depends on the structure of the library.
Shared libraries are used mostly when you have functionality that could be used and "shared" across different programs. In that case, you will have a single point where all the programs will get their methods. However, this creates a dependency problem since now your compiled programs are dependent on that specific version of the library.
Static libraries are used mostly when you don't want to have dependency issues and don't want your program to care which X or Y libraries are installed on your target system.
So, which one to use? To decide, you should answer the following questions:
Will your program be used on different platforms or Linux distributions? (e.g. Red Hat, Debian, SLES11-SP1)
Do you have replicated code that is being used by different binaries?
Do you envision that in the future other programs could benefit from your work?
I think this is a case by case decision, and it is not a one size fits all kind of answer.

Embedded systems header functions

I am new to embedded systems and want to learn more,
I am currently optimizing the footprint of some software for an ARM embedded system, and I am wondering about the header files that you include in your source files. Where are they put?
Right now I am just using software (OVP) to simulate the ARM hardware platform, but on real hardware you have to put the header files somewhere, right? With gcc, for example, the standard library lives on the hard disk. Do we have to put this library on the embedded machine as well? Space is limited! And is there any way to minimize the size of the library? Thanks!
Example
#include <stdio.h>
#include <stdlib.h>
I am using the cross compiler arm-elf-gcc
Best Regards
Mr Gigu
You appear to possess a few fundamental misunderstandings about compiled executable code. The following applies to embedded and desktop systems.
Header files are no more than source files like any other. The difference is that they are inserted into the compilation unit by the pre-processor rather than compiled directly. Also, in most cases they contain declarative statements only, and do not generally contribute to the generated code in the sense of executable instructions or stored data.
At runtime none of your source code is required to exist on the target; it is the work of the compiler to generate native executable machine code from your source. It is this machine code that is stored and runs on the target.
A header file is not the same thing as a library. It is merely (generally) the declaration of library content (function prototypes and other symbol declarations such as constants, data, macros, enumerations). The library takes the form of pre-compiled/assembled object code stored in a combined archive. It is the job of the linker to combine the required library code with the object code generated from compilation of your own source. It is this linked executable that is stored and executed on the target, not the original source code.
An exception regarding header files containing declarative code only is when they contain in-line code or executable code in a macro. However such code only occupies space in your application if explicitly called by the application.
When library code is linked, only those library object code components required to resolve references in the application code are linked, not the entire library (unless the entire library is composed of only a single object file).
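The header/library distinction can be sketched in a single C file, with comments marking where each piece would normally live. All the names here (counter.h, counter_next, twice, COUNTER_LIMIT, demo) are made up for illustration, and in a real project the three parts would sit in a header, a precompiled library, and your application source respectively.

```c
/* --- What a header (a hypothetical "counter.h") would contain --- */

int counter_next(void);            /* declaration only: generates no code or data */

#define COUNTER_LIMIT 100          /* macro: text substitution by the pre-processor */

/* Inline code in a header is the exception noted above: it only ends up
 * in the image where the application actually uses it. */
static inline int twice(int x)
{
    return 2 * x;
}

/* --- What the library (pre-compiled object code) would contain --- */

static int counter_state;          /* stored data, occupies memory on the target */

int counter_next(void)             /* the definition: real machine code */
{
    if (counter_state < COUNTER_LIMIT)
        counter_state++;
    return counter_state;
}

/* --- What your application source would contain --- */

int demo(void)
{
    counter_next();                /* the linker resolves this call to the library code */
    return twice(counter_next());  /* inline code expanded from the header */
}
```

Only the machine code for counter_next, its counter_state variable, demo, and the expanded twice would be present on the target; the declaration and the macro exist only while compiling.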
The library does indeed have to get included in the image that is burned into the embedded system's memory. Usually you tell the linker to strip out unused functions, which goes a long way towards conserving memory. But this memory is the memory your program takes up in flash or whatever you use for non-volatile code storage. It doesn't say anything about how much RAM your program takes at runtime. You can also tell your compiler to optimize for space, and also use different runtime libraries - the ones provided by the vendor are often not as fast or small as they could be.

Resources