Memory sharing in .o files - C

Let's say I have an A.c file and compile it to an A.o file.
The A.c file is as follows:
int a;
void add(void)
{
}
The A.o file together with a B.o forms a 1.exe file.
The A.o file together with a C.o forms a 2.exe file.
My question is: if I run 1.exe and 2.exe at the same time, will the addresses of a and add() be the same for these two .exe files? In other words, are there two copies of A.o in memory, or is there only one?

You don't have any relocated object files in memory.
I am guessing you have a Linux system. If on Windows, the principles remain the same, but the details are different.
The linker (invoked to build both 1.exe and 2.exe) builds an executable ELF file (made of several segments, notably the so-called "text" segment for machine code and read-only constant data, and the "data" segment for mutable data). The execve(2) syscall starting that program memory-maps several segments of the ELF file (much as an mmap(2) call would).
Notice that using the .exe suffix for Linux executable files is confusing and uncommon. Conventionally, Linux executables have no suffix at all and start with a lowercase letter.
The linker has copied and relocated the contents of A.o differently into each executable (that is what relocation means). So the addresses of a and add usually differ between 1.exe and 2.exe, and so do the machine instructions that refer to them.
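You can check this yourself with nm, assuming the executables are not stripped (a hedged sketch; with PIE and ASLR the runtime addresses shift again on every execution):
nm 1.exe | grep -w -e 'a' -e 'add'
nm 2.exe | grep -w -e 'a' -e 'add'
If B.o and C.o contribute different amounts of code and data, the link-time addresses assigned to a and add will usually differ between the two executables.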
Each process has its own address space (which can change, e.g. with the mmap(2) syscall). Type cat /proc/1234/maps to inspect the address space of the process with pid 1234. Try also cat /proc/self/maps to see the address space of the process running that cat.
If instead of the A.o object you have a shared object (or dynamic library) libA.so, some of its (mmap-ed) segments will be shared (and others use copy-on-write techniques), and some relocation happens at dynamic link time (e.g. during dlopen if it is a plugin).
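For the plugin case, a minimal sketch in C (assuming A.c has been built with gcc -fPIC -shared A.c -o libA.so; the program below is illustrative, not taken from the question):

/* main.c: load libA.so at run time; relocation happens inside dlopen.
   Build with: gcc main.c -ldl */
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    void *h = dlopen("./libA.so", RTLD_NOW);   /* dynamic link time */
    if (!h) { fprintf(stderr, "%s\n", dlerror()); return 1; }
    void (*padd)(void) = (void (*)(void)) dlsym(h, "add");
    int *pa = dlsym(h, "a");
    printf("add is at %p, a is at %p\n", (void *) padd, (void *) pa);
    if (padd) padd();
    dlclose(h);
    return 0;
}

Two processes running this would typically report different pointer values, even though the text segment of libA.so is shared between them.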
Read also Levine's book Linkers and Loaders.

The addresses of a and add will not be the same. You have two processes, and therefore two copies in memory in this instance.

It depends on the linking. If A is linked dynamically (i.e. built as a shared library rather than linked in as an object file), then when you load the two exe files into memory only one instance of A's code is resident, shared by the two executables, and its functions behave as reentrant code.
It is like the printf function: printf is referenced from many different files, but at any time only one instance of its code is in memory.
Reentrant functions must not cause any data corruption; that is something you must take care of.
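To actually get that single shared copy, A has to be built as a shared library instead of being linked in as A.o; a sketch using the file names from the question:
gcc -fPIC -shared A.c -o libA.so
gcc B.c -L. -lA -o 1.exe
gcc C.c -L. -lA -o 2.exe
LD_LIBRARY_PATH=. ./1.exe &
LD_LIBRARY_PATH=. ./2.exe &
Both processes then map the same read-only text pages of libA.so; writable data such as the variable a still gets a private copy per process.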

Related

Position-independent executable: What is "main executable binary"?

When reading https://en.wikipedia.org/wiki/Address_space_layout_randomization, I encountered a term:
Position-independent executable (PIE) implements a random base address for the main executable binary and has been in place since 2003. It provides the same address randomness to the main executable as being used for the shared libraries.
What does main executable binary mean here? Is it just the translation unit/source file that contains the main function?
It means the output of the linker when you build your executable, the so-called a.out file in the *nix world. The compiler creates object files, and the linker resolves all the dependencies into an a.out file. Some of those dependencies will be external (dynamic link libraries).
The main executable is the file that the OS (with the help of the dynamic linker) loads initially when you execute it. Subsequent loads are the dynamic link libraries that are external dependencies recorded during the build process.
I assume it is the binary with the main() function, so the program in the strict sense.
Previously, programs were loaded at specific addresses, but dynamic libraries were already loaded at different addresses, so I think the "main" here is just to emphasize that it applies to the program binary and not to the library binaries.
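You can see the difference by inspecting the ELF type of the executable; a sketch (hello.c is any trivial program, and flag defaults vary by toolchain):
gcc -no-pie hello.c -o hello_nopie
gcc -pie -fPIE hello.c -o hello_pie
readelf -h hello_nopie | grep Type   # Type: EXEC (Executable file)
readelf -h hello_pie | grep Type     # Type: DYN (Shared object file)
A PIE main executable is reported as DYN, just like a shared library, which is exactly what lets the kernel load it at a random base address.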

Is Dynamic Linker part of Kernel or GCC Library on Linux Systems?

Is the Dynamic Linker (aka Program Interpreter, Link Loader) part of the Kernel or of the GCC Library?
UPDATE (28-08-16):
I have found that the default dynamic linker path that every dynamically linked binary uses, /lib64/ld-linux-x86-64.so.2, is a symbolic link to /lib/x86_64-linux-gnu/ld-2.23.so, which is the actual dynamic linker.
And it is part of the libc6 (2.23-0ubuntu3) package, viz. "GNU C Library: Shared libraries", in Ubuntu for the AMD64 architecture.
My actual question was:
what would happen to all the applications that are dynamically linked (which nowadays is all of them) if this helper program (ld-2.23.so) didn't exist?
And the answer to that is: "no application would run, not even the shell". I've tried it on a virtual machine.
In an ELF executable, this is referred to as the "ELF interpreter". On Linux (for example) this is /lib64/ld-linux-x86-64.so.2.
This is not part of the kernel; it [generally] ships with glibc et al.
When the kernel executes an ELF executable, it must map the executable into userspace memory. It then looks inside for a special sub-section known as INTERP [which contains a string giving the full path of the interpreter].
The kernel then maps the interpreter into userspace memory and transfers control to it. Then, the interpreter does the necessary linking/loading and starts the program.
The ELF format ("Executable and Linkable Format") is extensible, which allows many different sub-sections within an ELF file.
Rather than burdening the kernel with having to know about all the myriad extensions, the ELF interpreter that is paired with the file knows about them.
Although usually only one format is used on a given system, there can be several different variants of ELF files on a system, each with its own ELF interpreter.
This would allow [say] a BSD ELF file to be run on a linux system [with other adjustments/support] because the ELF file would point to the BSD ELF interpreter rather than the linux one.
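You can print the INTERP entry of any dynamically linked executable; a sketch (the path shown is typical for x86-64 Linux and will vary elsewhere):
readelf -l /bin/ls | grep interpreter
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]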
UPDATE:
every process (vlc player, chrome) had the shared library ld.so as part of its address space.
Yes. I assume you're looking at /proc/<pid>/maps. These are mappings (e.g. as created with mmap) of the files. That is somewhat different from "loading", which can imply [symbol] linking.
So primarily loader after loading the executable(code & data) onto memory , It loads& maps dynamic linker (.so) to its address space
The best way to understand this is to rephrase what you just said:
So primarily the kernel, after mapping the executable (code & data) into memory, maps the dynamic linker (.so) into the program's address space
That is essentially correct. The kernel also maps other things, such as the bss segment and the stack. It then "pushes" argc, argv, and envp [the space for environment variables] onto the stack.
Then, having determined the start address of ld.so [by reading a special section of the file], it sets that as the resume address and starts the thread.
Up until now, it has been the kernel doing things. The kernel does little to no symbol linking.
Now, ld.so takes over ...
which further Loads shared Libraries , map & resolve references to libraries. It then calls entry function (_start)
Because the original executable (e.g. vlc) has been mapped into memory, ld.so can examine it for the list of shared libraries that it needs. It maps these into memory, but does not necessarily link the symbols right away.
Mapping is easy and quick--just an mmap call.
The start address of the executable [not to be confused with the start address of ld.so] is taken from a special field of the ELF header (e_entry). Although the symbol associated with this start address has traditionally been called _start, it could actually be named anything (e.g. __my_start), because it is the value recorded in the header that determines the start address, not the address of a symbol literally named _start.
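A hedged demonstration that the entry point is just a recorded address rather than a magic name (start.c and __my_start are made up for this example; since no C runtime has been set up, the function must exit explicitly and never return):

/* start.c
   Build with: gcc -nostartfiles -Wl,-e,__my_start start.c -o start */
#include <unistd.h>

void __my_start(void)
{
    _exit(0);   /* no caller to return to, so exit via the syscall wrapper */
}

readelf -h ./start then shows the "Entry point address" pointing at __my_start.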
Linking symbol references to symbol definitions is a time-consuming process, so it is deferred until a symbol is actually used. That is, if a program has references to printf, the linker doesn't actually try to link in printf until the first time the program actually calls printf.
This is sometimes called "link-on-demand" or "on-demand-linking". See my answer here: Which segments are affected by a copy-on-write? for a more detailed explanation of that and what actually happens when an executable is mapped into userspace.
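With glibc you can watch this happen; a sketch (the output is verbose and varies by glibc version):
LD_DEBUG=bindings ./a.out 2>&1 | head   # trace symbols as they get bound
LD_BIND_NOW=1 ./a.out                   # force all binding at startup instead
Setting LD_BIND_NOW=1 disables the lazy behavior and resolves every symbol before main runs.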
If you're interested, you could do ldd /usr/bin/vlc to get a list of the shared libraries it uses. If you look at the output of readelf -a /usr/bin/vlc, you'll see these same shared libraries. You'd also get the full path of the ELF interpreter, and could do readelf -a <full_path_to_interpreter> and note some of the differences. You could repeat the process for any .so files that vlc wanted.
Combining all that with /proc/<pid>/maps et al. might help with your understanding.

How to write programs to implement a shared memory page in C [duplicate]

I want to use the same library functions (i.e. the OpenSSL library) in two different C programs for computation. How can I make sure that both programs use a common library, meaning that only one copy of the library is loaded into shared main memory and both programs access it from that memory location?
For example, when the 1st program accesses the library, it is loaded into the cache from main memory, and when the 2nd program wants to access it later, it accesses the data from the cache (already loaded by the 1st program), not from main memory again.
I am using GCC under Linux. Any explanation or pointer will be highly appreciated.
Code gets shared by the operating system, not only the code of shared libraries but also the code of executables of the same binary; you don't have to do anything to get this feature. It is part of the system's memory management.
Data will not get shared between the two processes. You would need threads within one process to share data. But unless you want that, just make sure both programs use exactly the same shared library file (.so file). Normally you won't have to think about that; it only might matter if the two programs use different versions of a library (those would not get shared, of course).
Have a look at the output of ldd /path/to/binary to see which shared libraries are used by a binary.
Read Drepper's paper How to Write Shared Libraries and the Program Library HOWTO.
To make one, compile your code as position-independent code, e.g.
gcc -c -fPIC -O -Wall src1.c -o src1.pic.o
gcc -c -fPIC -O -Wall src2.c -o src2.pic.o
then link it into a shared object
gcc -shared src1.pic.o src2.pic.o -lsome -o libfoo.so
you may even link one shared library (here -lsome) into another one (libfoo.so)
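Then link a program against the result and tell the dynamic linker where to find it at run time; a sketch (prog.c is a placeholder for your own code):
gcc prog.c -L. -lfoo -o prog
LD_LIBRARY_PATH=. ./prog
In a real installation you would put libfoo.so in a standard library directory (or use rpath) instead of LD_LIBRARY_PATH.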
Internally, the dynamic linker ld-linux.so(8) uses mmap(2) (and does some relocation at dynamic link time), and what matters are inodes: the kernel uses the file system cache to avoid reading a shared library twice when it is used by different processes. See also linuxatemyram.com.
Use e.g. ldd(1), pmap(1) and proc(5). See also dlopen(3). Try
cat /proc/self/maps
to understand the address space of the virtual memory used by the process running that cat command; not everything of an ELF shared library is shared between processes, only some segments, including the text segment...
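For instance, to see the mappings of the C library in the current process (paths and addresses vary per system):
grep libc /proc/self/maps
The read-only executable mapping of libc's text is backed by the same page-cache pages in every process that uses it, while the writable data mapping is private copy-on-write.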

Minimum 504KB Memory Usage

On doing some experiments whilst learning C, I've come across something odd. This is my program:
int main(void) {sleep(5);}
When it is compiled, the executable is 8496 bytes (in comparison to the 26-byte source!). This is understandable, as sleep is called and the instructions for calling it are written into the executable. Another point: without the sleep, the executable becomes 4312 bytes:
int main(void) {}
My main question is what happens when the first program is run. I'm using clang to compile and Mac OS X to run it. The result (according to Activity Monitor) is that the program uses 504KB of "real memory". Why is it so big when the program is just 4KB? I am assuming that the executable is loaded into memory but I haven't done anything apart from a sleep call. Why does my program need 500KB to sleep for five seconds?
By the way, the reason I'm using sleep is to be able to catch the amount of memory being used using Activity Monitor in the first place.
I ask simply out of curiosity, cheers!
When you compile a C program it is linked into an executable. Even though your program is very small, it links in the C runtime, which includes some additional code. There may be some error handling, that error handling may write to the console, and that code may pull in sprintf, which adds some footprint to your application. You can ask the linker to produce a map of the code in your executable to see what is actually included.
Also, an executable file contains more than machine code. There will be various tables for data and dynamic linking which will increase the size of the executable and there may also be some wasted space because the various parts are stored in blocks.
The C runtime initializes before main is called, and this results both in some code being loaded (e.g. by dynamically linking to various operating system features) and in memory being allocated for a heap, a stack for each thread, and probably also some static data. Not all of this data may show as "real memory": the default stack size on OS X appears to be 8 MB, and your application is still using much less than that.
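If you are curious about the reserved (but mostly untouched) stack, a small sketch that queries the limit via getrlimit, which works on both OS X and Linux:

/* stacklimit.c: print the soft stack size limit of this process */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) == 0)
        printf("stack limit: %llu bytes\n", (unsigned long long) rl.rlim_cur);
    return 0;
}

Reserved stack pages only become "real memory" once they are actually touched.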
In this case I suppose the size difference you've observed is significantly caused by dynamic linking.
Linkers usually don't place common code into the executable binary; instead they record the needed information, and the code is loaded when the binary is loaded. This common code is stored in files called shared objects (.so) or dynamically linked libraries (DLLs).
[pengyu@GLaDOS temp]$ cat test.c
int main(void) {sleep(5);}
[pengyu@GLaDOS temp]$ gcc test.c
[pengyu@GLaDOS temp]$ du -h --apparent-size a.out
6.6K a.out
[pengyu@GLaDOS temp]$ gcc test.c -static
[pengyu@GLaDOS temp]$ du -h --apparent-size a.out
807K a.out
Also, here I'm listing what is in the memory of a process:
There are the necessary dynamic libraries to be loaded:
Here ldd shows the dynamic libraries loaded when the binary is invoked. These libraries are placed in the part of the address space obtained by calling the mmap system call.
[pengyu@GLaDOS temp]$ cat test.c
int main(void) {sleep(5);}
[pengyu@GLaDOS temp]$ gcc test.c
[pengyu@GLaDOS temp]$ ldd ./a.out
linux-vdso.so.1 (0x00007fff576df000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007f547a212000)
/lib64/ld-linux-x86-64.so.2 (0x00007f547a5bd000)
There are sections like .data and .text to be allocated for the data and code from your binary file.
This part exists in the binary executable, so its size should be no larger than the file itself. The contents are copied at the loading stage of the executable.
There are sections like .bss, and also the stack zone, allocated for dynamic use during execution of the program.
This part does not exist in the binary executable, so its size can be quite large without being affected by the size of the file itself.
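You can verify that .bss costs almost nothing in file size; a sketch (bss.c is made up for the demonstration):

/* bss.c: 16 MB of zero-initialized data lives in .bss, not in the file */
static char big[16 * 1024 * 1024];

int main(void) { return big[0]; }

Compile it and run size a.out: the bss column shows roughly 16 MB while the executable on disk stays a few kilobytes.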

What is the difference between ELF files and bin files?

The final images produced by compilers contain both a bin file and an ELF (Executable and Linkable Format) file. What is the difference between the two, and especially what is the utility of the ELF file?
A bin file is a pure binary file with no memory fix-ups or relocations; more than likely it has explicit instructions to be loaded at a specific memory address. Whereas...
ELF files are Executable and Linkable Format files, which contain symbol look-up and relocation tables; that is, an ELF file can be loaded at any memory address by the kernel, and automatically all symbols used are adjusted to the offset from the memory address where it was loaded. Usually ELF files have a number of sections, such as 'data', 'text', and 'bss', to name but a few... it is within those sections that the run-time can calculate where to adjust the symbols' memory references dynamically at run-time.
A bin file is just the bits and bytes that go into the ROM, or to the particular address from which you will run the program. You can take this data and load it directly as-is, but you need to know what the base address is, because that is normally not in there.
An ELF file contains the bin information, but it is surrounded by lots of other information: possible debug info, symbols, what is code and what is data within the binary. It allows for more than one chunk of binary data (when you dump one of these to a bin you get one big bin file, with fill data padding it out to the next block). It tells you how much binary you have and how much bss data wants to be initialized to zeros (GNU tools have problems creating bin files correctly).
The ELF file format is a standard; ARM publishes its enhancements/variations on the standard. I recommend everyone write an ELF-parsing program to understand what is in there; don't bother with a library, it is quite simple to just use the information and structures in the spec. Doing so helps to overcome GNU problems in general with creating .bin files, as well as with debugging linker scripts and other things that can mess up your bin or ELF output.
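In that spirit, a minimal sketch of such a parser using the structures from elf.h (64-bit ELF only, no error recovery; the file name elfhdr.c is made up):

/* elfhdr.c: print a few fields of an ELF header.
   Build with: gcc elfhdr.c -o elfhdr */
#include <elf.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s <elf-file>\n", argv[0]); return 1; }
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }
    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) != 1) { fclose(f); fprintf(stderr, "short read\n"); return 1; }
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0) { fclose(f); fprintf(stderr, "not an ELF file\n"); return 1; }
    printf("class: %s\n", eh.e_ident[EI_CLASS] == ELFCLASS64 ? "ELF64" : "ELF32");
    printf("type:  %u (2 = EXEC, 3 = DYN)\n", (unsigned) eh.e_type);
    printf("entry: %#llx\n", (unsigned long long) eh.e_entry);
    printf("%u program headers, %u section headers\n", (unsigned) eh.e_phnum, (unsigned) eh.e_shnum);
    fclose(f);
    return 0;
}

Comparing its output with readelf -h on the same file is a good sanity check.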
some resources:
ELF for the ARM architecture
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0044d/IHI0044D_aaelf.pdf
ELF from wiki
http://en.wikipedia.org/wiki/Executable_and_Linkable_Format
The ELF format is generally the default output of compiling.
If you use the GNU toolchain, you can translate it to the binary format by using objcopy, such as:
arm-elf-objcopy -O binary [elf-input-file] [binary-output-file]
or using the fromELF utility (built into most IDEs such as ADS):
fromelf -bin -o [binary-output-file] [elf-input-file]
bin is the final way that the memory looks before the CPU starts executing it.
ELF is a cut-up/compressed version of that, which the CPU/MCU thus can't run directly.
The (dynamic) linker first has to sufficiently reverse that (and thus modify offsets back to the correct positions).
But there is no linker/OS on the MCU, hence you have to flash the bin instead.
Moreover, Ahmed Gamal is correct.
Compiling and linking are separate stages; the whole process is called "building", hence the GNU Compiler Collection has separate executables:
One for the compiler (which technically outputs assembly), another one for the assembler (which outputs object code in the ELF format),
then one for the linker (which combines several object files into a single ELF file), and finally, at runtime, there is the dynamic linker,
which effectively turns an elf into a bin, but purely in memory, for the CPU to run.
Note that it is common to refer to the whole process as "compiling" (as in GCC's name itself), but that then causes confusion when the specifics are discussed,
such as in this case, and Ahmed was clarifying.
It's a common problem due to the inexact nature of human language itself.
To avoid confusion, GCC outputs object code (after internally using the assembler) using the ELF format.
The linker simply takes several of them (with an .o extension) and produces a single combined result (by default named "a.out").
But all of them, even ".so" files, are ELF.
It is like several Word documents, each ending in ".chapter", all being combined into a final ".book",
where all files technically use the same standard/format and hence could have had ".docx" as the extension.
The bin is then kind of like converting the book into a ".txt" file while adding as much whitespace as necessary to be equivalent to the size of the final book (printed on a single spool),
with places for all the pictures to be overlaid.
I just want to correct a point here: the ELF file is produced by the linker, not the compiler.
The compiler's mission ends after producing the object files (*.o) from the source code files. The linker links all the .o files together and produces the ELF.
