I have read Static Memory Allocation are done during Compile time.
Is the 'address allocated' used while generating executables ?
Now, I am in doubt how the memory allocation is handled when the code executable is transferred completely onto a new system.
I searched for it but I didn't get any answer on the internet.
Well, it totally depends on the circumstances whether your executable can be run on your new system or not. Each operating system defines it's own exectuable file format. For example, here's how windows exe's look like. There's a reason why they are called portable executable.
When your compiler generates such an executable, it does first compile your C code to the corresponding assembly of your target architecture and then packs it into the target executable file format. Static memory allocations find their place in that format.
You can imagine the exe file as sort of a memory image that is loaded into a new memory space by the operating systems process loader. The operating system maintains the offset to this location and makes sure all the programs memory access goes into it's process' protected address space.
To answer your specific question: Transferring an executable between systems of the same operating system and architecture is usually no problem. The scenario same OS but different machine architecture can usually be handled by the OS via emulation (e.g. Mac OS's Rosetta emulates PowerPC on x86). 64/32 bit compatibility is handled this way too. Transferring between different OS's is usually not possible (for native executables) but everything that works inside virtual machines (java vm, .net CLR) is no problem, as the process loader loads only the virtual machine and the actual program is run from there.
Related
I am curious about how statically linked C executables would work in different environments. Lets say we compile our C code to target x86 MacOs and we statically include everything it uses in the executable as well (print, strlen). What really stops this executable from running in a Windows OS if we include every library it needs? I understand the file format could be different and break but other than that would this technically be able to run?
I see where you're coming from, operating systems makes us think as programmers that libraries are the be-all end-all of programming, that a call to a library is all you need to make complex things happen and that everything is contained in them.
But the truth is, libraries mostly provide provide an abstraction layer. As an exemple let's create a library called "hello_world.so" which prints "Hello World!" to the console. That library we created relies on stdio to handle the complex I/O stuff but stdio itself depends on at least one other thing: the kernel (some specific targets work without a kernel but these system are outside the scope of this answer).
In the desktop world, things can get really complicated, we have several hundreds of processes all running at once even in an idle system, all these apps need access to the hardware (possibly at once too) so it was decided a controller was needed, some piece of software that would coordinate all other software running on the same computer. This piece of software is usually called a kernel. On Windows it's the NT kernel, on macOS it's the XNU and on Linux it's... the Linux kernel!
On these systems, the biggest job of a library is to abstract the kernel, to make us believe printing text on a Linux or a Windows console works the exact same way when actually it can be completely different! Libraries like stdio/time/etc have different "implementations" but the same "interface": they look the same from the dev point of view but the way they achieve their goals can vary wildy, they can do conversions, calls to other hidden or non hidden functions... All this is completely portable from one OS to the other though, things start to go south for you idea when kernel calls start to show up.
Kernel calls are ways a program can "talk" to the kernel. They can be used to do literally thousands of different things but for example there's one (or several ones) to ask for memory (usually this is called with malloc), one to print to the console, one to ask if a network packet arrived, on to ask to talk to your GPU... And these system calls are completely different from one kernel to the other, sometimes even for two versions of the same kernel!
These "kernel calls" are the only thing preventing you from running statically-compiled linux programs on Windows.
PS: Even though all the above is completely true and kernels can be as different from one another as they wish, due to the history of kernels and of computing in general, some kernels actually share the same interface (even though their implementation as you guessed, can be nothing alike). The best example I know of is how most kernels I know of are based on the UNIX kernel.
It means that -even though I have never tested it myself- I think you should be able to statically link a Linux app and use it on Linux, most BSDs and possibly even macOS
The binary and libraries are specific to the operating system.
The TLDR is that the linking process translates function calls into adresses that points of the Operating System's specific libraries. There are some differences like alignment that happens at compile time, but the responsible for your x86 instructions not running under a different OS is the linker.
Your compiler produces x86 instructions that are ready to execute but is incomplete. The linker will go into every function call give a adress for that function in the executable file, even for functions of the standard library.
The linker will create the executable file following a file format which have a header with information like size, metadata and entry point.Windows and Unix have differents specifications for executable files. Windows has PE and Unix has ELF format both for executables and libraries.
Through some hacks and non-trivial tricks it is possible to create an executable that can run on Windows and Unix (see αcτµαlly pδrταblε εxεcµταblε for how).
But even if you do all that there is an obstacle that can not be circumvented: the kernel. The kernel is, well, a kernel. It's the most important thing in
a OS and it provides a set of API calls that provides basic and low-level access to computer resources, so functions like malloc are implemented using the kernel specific API call, VirtualAlloc for Windows and vmalloc or mmap for POSIX.
Main Answer
If your program does anything useful (print output, return a value, communicate on the network), it contains some form of system-call instruction. Each system-call instruction is a request to a particular operating system, and macOS system calls will not work on Windows and vice-versa.
The system-call instruction sends information to the operating system, including a number identifying which service is requested. The operating system that performs that service. When you build your program for macOS, it includes library routines that contain system-call instructions. If you execute those instructions on a Microsoft Windows system, Windows will not understand the macOS requests. It will interpret the information differently, and the program will not work.
So, in theory, there is nothing preventing you from writing your own program loader that reads an ELF executable file intended for macOS, loading its contents into memory, and transferring control to its entry point. But the program will not work because of the system calls.
Supplement
You might consider translating all the system calls in the program. Changing the primary number that identifies the service request might be feasible; it might not require changing the executable too much. For example, if 37 is a “write to file” request on macOS, your program loader might change it to 48 on Windows. However, the system calls also require other data be passed, such as pointers to buffers, lengths, and so on, and there are likely many discrepancies in how those are passed, so that macOS requests cannot be easily translated into Windows requests. Also, it can be technically challenging to identify all places in a program that a certain instruction is used—some of the contents of memory of a loaded program are instructions and some are data. Most normal programs may be well-behaved and easy to analyze in this regard, but not all are.
Another potential issue is that programs may expect to have certain modes set in the processor, and the host operating system may or may not have set those modes as needed.
I know that dynamic linking are smaller on disk but do they use more RAM at run time. Why if so?
The answer is "it depends how you measure it", and also "it depends which platform you're running on".
Static linking uses less runtime RAM, because for dynamic linking the entire shared object needs to be loaded into memory (I'll be qualifying this statement in a second), whilst with static linking only those functions you actually need are loaded.
The above statement isn't 100% accurate. Only the shared object pages that actually contain code you use are loaded. This is still much less efficient than statically linking, which compresses those functions together.
On the other hand, dynamic linking uses much less runtime RAM, as all programs using the same shared object use the same in RAM copy of the code (I'll be qualifying this statement in a second).
The above is a true statement on Unix like systems. On Windows, it is not 100% accurate. On Windows (at least on 32bit Intel, I'm not sure about other platforms), DLLs are not compiled with position independent code. As such, each DLL carries the (virtual memory) load address it needs to be loaded at. If one executable links two DLLs that overlap, the loader will relocate one of the DLLs. This requires patching the actual code of the DLL, which means that this DLL now carries code that is specific to this program's use of it, and cannot be shared. Such collisions, however, should be rare, and are usually avoidable.
To illustrate with an example, statically linking glibc will probably cause you to consume more RAM at run time, as this library is, in all likelihood, already loaded in RAM before your program even starts. Statically linking some unique library only your program uses will save run time RAM. The in-between cases are in-between.
Different processes calling the same dll/so file can share the read-only memory pages this includes code or text pages.
However each dll loaded in a given peogram has to have its own page for writable global or static data. These pages may be 4/16/64k or bigger depending on the OS. If one statically linked, the static data can be shared in one pages.
Programs, when running on common operating systems like Linux, Windows, MacOSX, Android, ...., are running as processes having some virtual address space. This uses virtual memory (implemented by the kernel driving the MMU).
Read a good book like Operating Systems: Three Easy Pieces to understand more.
So programs don't consume directly RAM. The RAM is a resource managed by the kernel. When RAM becomes scarce, your system experiments thrashing. Read also about the page cache and about memory overcommitment (a feature that I dislike and that I often disable).
The advantage of using a shared library, when the same library is used by several processes, is that its code segment is appearing (technically is paged) only once in RAM.
However, dynamic linking has a small overhead (even in memory), e.g. to resolve relocations. So if a library is used by only one process, that might consume slightly more RAM than if it was statically linked. In practice you should not bother most of the time, and I recommend using dynamic linking systematically.
And in practice, for huge processes (such as your browser), the data and the heap consumes much more RAM than the code.
On Linux, Drepper's paper How To Write Shared Libraries explains a lot of things in details.
On Linux, you might use proc(5) and pmap(1) to explore virtual address spaces. For example, try cat /proc/$$/maps and cat /proc/self/maps and pmap $$ in a terminal. Use ldd(1) to find out the dynamic libraries dependencies of a program, e.g. ldd /bin/cat. Use strace(1) to find out what syscalls(2) are used by a process. Those relevant to the virtual address space include mmap(2) and munmap, mprotect(2), mlock(2), the old sbrk(2) -obsolete- and execve(2).
For Embedded systems where the program runs independently on a micro-controller :
Are the programs always static linked ? or in certain occasions it may be dynamic linked ?
From Wikipedia:
a dynamic linker is the part of an operating system that loads and
links the shared libraries needed by an executable when it is executed
(at "run time"), by copying the content of libraries from persistent
storage to RAM, and filling jump tables and relocating pointers.
So it implies that dynamic linking is possible only if:
1) You have a some kind of OS
2) You have some kind of persistent storage / file system.
On a bare-metal micros it is usually not the case.
Simply put: if there is a full-grown operation system like Linux running on the microcontroller, then dynamic linking is possible (and common).
Without such an OS, you very, very likely use static linking. For this the linker will (basically) not only link the modules and libraries, but also includes the functions which are done by the OS program loader.
Lets stay at these (smaller) embedded systems for now.
Apart from static or dynamic linking, the linker also does relocation. This does - simply put - change internal (relative) addresses of the program to the absolute addresses on the target device.
It is not common on simple embedded systems primarily because it is neither necessary nor supported by the operating system (if any). Dynamic linking implies a certain amount of runtime operating system support.
The embedded systems RTOS VxWorks supports dynamic linking in the sense that it can load and link partially linked object code from a network or file system at runtime. Similarly Larger embedded RTOSs such as QNX support dynamic linking, as does embedded Linux.
So yes large embedded systems may support dynamic linking. In many cases it is used primarily as a means to link LGPL licensed code to a closed source application. It can also be used as a means of simplifying and minimising the impact of deploying changes and updates to large systems.
I'm building an application that has a huge .so file - well over 2GB in size (stripped).
Are there limits to the size of an shared object file?
Because strace shows that the file is refused because it is too big.
My system currently is a 32-bit system, and I also wonder how much this changes when I would build for a 64-bit Linux system.
Since shared library is loaded completely into memory, I would highly recommend you to move your resources away to some external files. IMHO, 2GB is totally non-acceptable for a shared library, and will cause problems on low memory systems.
UPDATE:
Please ignore my first sentence about loading whole shared libraries into memory. As OP commented, shared libraries are indeed mmap'ed, and symbol pages are loaded on demand.
It depends on your system's memory *.so links directly loaded with executable or system itself it can't load if you have low memory or OS allocates a lot of memory and if you build for 64-bit system it will expand more than 2 gb in size, because of adding some 64-bit flags and instructions.
If you compile a program in say, C, on a Linux based platform, then port it to use the MacOS libraries, will it work?
Is the core machine-code that comes from a compiler compatible on both Mac and Linux?
The reason I ask this is because both are "UNIX based" so I would think this is true, but I'm not really sure.
No, Linux and Mac OS X binaries are not cross-compatible.
For one thing, Linux executables use a format called ELF.
Mac OS X executables use Mach-O format.
Thus, even if a lot of the libraries ordinarily compile separately on each system, they would not be portable in binary format.
Furthermore, Linux is not actually UNIX-based. It does share a number of common features and tools with UNIX, but a lot of that has to do with computing standards like POSIX.
All this said, people can and do create pretty cool ways to deal with the problem of cross-compatibility.
EDIT:
Finally, to address your point on byte-code: when making a binary, compilers usually generate machine code that is specific to the platform you're developing on. (This isn't always the case, but it usually is.)
In general you can easily port a program across various Unix brands. However you need (at least) to recompile it on each platform.
Executables (binaries) are not usable on several platforms, because an executable is tightly coupled with the operating system's ABI (Application Binary Interface), i.e. the conventions of how an application communicates with the operating system.
For instance if your program prints a string onto the console using the POSIX write call, the ABI specifies:
How a system call is done (Linux used to call the 0x80 software interrupt on x86, now it uses the specific sysenter instruction)
The system call number
How are the function's arguments transmitted to the system
Any kind of alignment
...
And this varies a lot across operating systems.
Note however that in some cases there may be “ABI adapters” allowing to run binaries of one OS onto another OS. For instance Wine allows you to run Windows executables on various Unix flavors, NDISwrapper allows you to use Windows network drivers on Linux.
"bytecode" usually refers to code executed by a virtual machine (e.g. for java or python). C is compiled to machine code, which the CPU can execute directly. Machine language is hardware-specific so it it would be the same under any OS running on an intel chip (even under Windows), but the details of how the machine code is wrapped into an executable file, and how it is integrated with system calls and dynamically linked libraries are different from system to system.
So no, you can't take compiled code and use it in a different OS. (However, there are "cross-compilers" that run on one OS but generate code that will run on another OS).
There is no "core byte-code that comes from a compiler". There is only machine code.
While the same machine instructions may be applicable under several operating systems (as long as they're run on the same hardware), there is much more to a hosted executable than that, and since a compiled and linked native executable for Linux has very different runtime and library requirements from one on BSD or Darwin, you won't be able to run one binary on the other system.
By contrast, Windows binaries can sometimes be executed under Linux, because Linux provides both a binary format loader for Windows's PE format, as well as an extensive API implementation (Wine). In principle this idea can be used on other platforms as well, but I'm not aware of anyone having written this for Linux<->Darwin. If you already have the source code, and it compiles in Linux, then you have a good chance of it also compiling under MacOS (modulo UI components, of course).
Well, maybe... but most probably not.
But if it does, it's not "because both are UNIX" it's because:
Mac computers happen to use the same processor nowadays (this was very different in the past)
You happen to use a program that has no dependency on any library at all (very unlikely)
You happen to use the same runtime libraries
You happen to use a loader/binary format that is compatible with both.