Will static linking allow cross-platform execution?

I am curious about how statically linked C executables would work in different environments. Let's say we compile our C code to target x86 macOS and we statically include everything it uses (printf, strlen) in the executable as well. What really stops this executable from running on a Windows OS if we include every library it needs? I understand the file format could be different and break things, but other than that, would this technically be able to run?

I see where you're coming from; operating systems make us programmers think that libraries are the be-all and end-all of programming, that a call to a library is all you need to make complex things happen and that everything is contained in them.
But the truth is, libraries mostly provide an abstraction layer. As an example, let's create a library called "hello_world.so" which prints "Hello World!" to the console. That library relies on stdio to handle the complex I/O stuff, but stdio itself depends on at least one other thing: the kernel (some specific targets work without a kernel, but those systems are outside the scope of this answer).
In the desktop world, things can get really complicated. We have several hundred processes all running at once even on an idle system, and all these apps need access to the hardware (possibly at once, too), so it was decided that a controller was needed: some piece of software that would coordinate all the other software running on the same computer. This piece of software is usually called a kernel. On Windows it's the NT kernel, on macOS it's XNU, and on Linux it's... the Linux kernel!
On these systems, the biggest job of a library is to abstract the kernel, to make us believe printing text on a Linux or a Windows console works the exact same way when actually it can be completely different! Libraries like stdio/time/etc. have different "implementations" but the same "interface": they look the same from the dev's point of view, but the way they achieve their goals can vary wildly; they can do conversions, calls to other hidden or non-hidden functions... All this is completely portable from one OS to the other, though. Things start to go south for your idea when kernel calls show up.
Kernel calls are the way a program "talks" to the kernel. They can be used to do literally thousands of different things; for example, there's one (or several) to ask for memory (usually reached through malloc), one to print to the console, one to ask whether a network packet has arrived, one to ask to talk to your GPU... And these system calls are completely different from one kernel to the other, sometimes even between two versions of the same kernel!
These "kernel calls" are the only thing preventing you from running statically-compiled linux programs on Windows.
PS: Even though all of the above is completely true and kernels can be as different from one another as they wish, due to the history of kernels and of computing in general, some kernels actually share the same interface (even though their implementations, as you guessed, can be nothing alike). The best example I know of is how most mainstream kernels present a UNIX-like interface.
It means that (even though I have never tested it myself) I think you should be able to statically link a Linux app and run it on Linux, most BSDs, and possibly even macOS.

The binary and libraries are specific to the operating system.

The TL;DR is that the linking process translates function calls into addresses that point into the operating system's specific libraries. There are some differences, like alignment, that are settled at compile time, but what is responsible for your x86 instructions not running under a different OS is the linker.
Your compiler produces x86 instructions that are ready to execute but incomplete. The linker goes through every function call and gives an address for that function in the executable file, even for functions of the standard library.
The linker creates the executable file following a file format, which has a header with information like the size, metadata, and the entry point. Windows and Unix have different specifications for executable files: Windows has PE, and Unix has the ELF format, both for executables and for libraries.
Through some hacks and non-trivial tricks it is possible to create an executable that can run on Windows and Unix (see αcτµαlly pδrταblε εxεcµταblε for how).
But even if you do all that, there is one obstacle that cannot be circumvented: the kernel. The kernel is, well, a kernel. It's the most important thing in an OS, and it provides a set of API calls that give basic, low-level access to computer resources. Functions like malloc are implemented using the kernel-specific API: VirtualAlloc on Windows, mmap (or brk) on POSIX systems.
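To make that split concrete, here is a hedged sketch (both calls are real APIs, but error handling is omitted) of the same "give me a page of memory" request on each kernel; malloc ultimately sits on top of calls like these:

#define _DEFAULT_SOURCE      /* for MAP_ANONYMOUS on glibc */
#ifdef _WIN32
#include <windows.h>
void *get_page(void) {
    /* Windows: ask the NT kernel for committed, writable memory. */
    return VirtualAlloc(NULL, 4096, MEM_COMMIT | MEM_RESERVE,
                        PAGE_READWRITE);
}
#else
#include <sys/mman.h>
void *get_page(void) {
    /* POSIX: ask the kernel for an anonymous private mapping. */
    return mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}
#endif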

Main Answer
If your program does anything useful (print output, return a value, communicate on the network), it contains some form of system-call instruction. Each system-call instruction is a request to a particular operating system, and macOS system calls will not work on Windows and vice-versa.
The system-call instruction sends information to the operating system, including a number identifying which service is requested. The operating system then performs that service. When you build your program for macOS, it includes library routines that contain system-call instructions. If you execute those instructions on a Microsoft Windows system, Windows will not understand the macOS requests. It will interpret the information differently, and the program will not work.
So, in theory, there is nothing preventing you from writing your own program loader that reads a Mach-O executable file intended for macOS, loads its contents into memory, and transfers control to its entry point. But the program will not work, because of the system calls.
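For flavor, here is a toy sketch of such a loader: POSIX calls, no error recovery, and it pretends the entry point sits at offset 0 instead of parsing a real Mach-O or ELF header (which also stores segments, relocations, and protections):

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv) {
    if (argc < 2) return 1;
    int fd = open(argv[1], O_RDONLY);
    struct stat st;
    if (fd < 0 || fstat(fd, &st) != 0) return 1;
    /* Map the file into memory as executable code. */
    void *img = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_EXEC,
                     MAP_PRIVATE, fd, 0);
    close(fd);
    if (img == MAP_FAILED) return 1;
    /* Casting data pointer to function pointer: not ISO C, but what a
       loader effectively does.  Foreign syscalls will still fail. */
    void (*entry)(void) = (void (*)(void))img;
    entry();
    return 0;
}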
Supplement
You might consider translating all the system calls in the program. Changing the primary number that identifies the service request might be feasible; it might not require changing the executable too much. For example, if 37 is a "write to file" request on macOS, your program loader might change it to 48 on Windows. However, the system calls also require other data to be passed, such as pointers to buffers, lengths, and so on, and there are likely many discrepancies in how those are passed, so macOS requests cannot be easily translated into Windows requests. Also, it can be technically challenging to identify all the places in a program where a certain instruction is used: some of the contents of memory of a loaded program are instructions and some are data. Most normal programs may be well-behaved and easy to analyze in this regard, but not all are.
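Purely as a hypothetical sketch, reusing the made-up numbers 37 and 48 from above, the number-remapping part of such a loader might start as a lookup table; everything else mentioned (buffers, lengths, conventions) is exactly what a table like this cannot capture:

#include <stddef.h>

/* Hypothetical only: remap a service number from "macOS" to "Windows"
   using the made-up values from the example above. */
static const struct { int from; int to; } syscall_map[] = {
    { 37, 48 },   /* "write to file" in the example */
};

static int translate(int nr) {
    for (size_t i = 0; i < sizeof syscall_map / sizeof *syscall_map; i++)
        if (syscall_map[i].from == nr)
            return syscall_map[i].to;
    return -1;    /* no known equivalent */
}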
Another potential issue is that programs may expect to have certain modes set in the processor, and the host operating system may or may not have set those modes as needed.

Related

GCC preprocessor directives for Arch Linux

Does GCC (or alternatively Clang) define any macro when it is compiled for the Arch Linux OS?
I need to check that my software restricts itself from compiling under anything but Arch Linux (the reason behind this is off-topic). I couldn't find any relevant resources on the internet.
Does anyone know how to guarantee through GCC preprocessor directives that my binaries are only compilable under Arch Linux?
Of course I can always
#ifdef __linux__
...
#endif
But this is not precise enough.
Edit: This must be done through C source code and not by any build system, so, for example, doing it through CMake is completely ruled out.
Edit 2: Users faking this behaviour is not a problem, since the software is distributed to selected clients, and thus actively trying to "misuse" our source code is "their decision".
Does GCC (or alternatively Clang) define any macro when it is compiled for the Arch Linux OS?
No, because there's nothing inherently specific to Arch Linux at the binary level. For what it's worth, when compiling, the only things you (i.e. the compiler) have to care about are the target architecture (what kind of CPU the code is going to run on), data type sizes and alignments, and function calling conventions.
Then later on, when it's time to link the compiled translation-unit objects into the final binary executable, the runtime libraries are also of concern. Without taking special precautions, you're essentially locking yourself into the specific brand of runtime libraries (glibc vs. e.g. musl; libstdc++ vs. libc++) pulled in by the linker.
One can easily sidestep the latter problem by linking statically, but that limits the range of system and mid-level APIs available to the program. For example, on Linux a naively statically linked program wouldn't be able to use graphics acceleration APIs like OpenGL 3.x or Vulkan, since those rely on loading components of the GPU drivers into the process. You can, however, still use X11 and indirect GLX OpenGL, since those work over wire protocols going through sockets, which are implemented using direct syscalls to the kernel.
And those kernel syscalls are exactly the same at the binary level for each and every Linux kernel of every distribution out there. Although inside the kernel there's a lot of leeway when it comes to redefining interfaces, when it comes to the interfaces toward userland (i.e. regular programs) there's this holy, dogmatic, ironclad rule that YOU NEVER BREAK USERLAND! Kernel developers who break this rule, intentionally or not, are chewed out publicly by Linus Torvalds in his in-/famous rants.
The bottom line is that there is no such thing as a "Linux distribution specific identifier at the binary level". At the end of the day, a Linux distribution is just that: a distribution of stuff. It means that someone (or several someones) decided on a set of files that make up a working Linux system, wrapped it all up somehow, and slapped a name on it. That's it. There's nothing inherently specific to "Arch" Linux other than that it's called "Arch" and (for the time being) relies on the pacman package manager. Everything else about "Arch", or any other Linux distribution, is a matter of happenstance.
If you really want to sort different Linux distributions into bins regarding binary compatibility, then you'd have to pigeonhole the combinations of:
The minimum required set of supported syscalls. This translates into a minimum required kernel version.
Which libc variant is being used, and potentially which version, although it's perfectly possible to link against a minimal set of functions that has been around almost "forever".
Which variant of the C++ standard library the distribution decided upon. This also affects programs that might appear to be pure C, because certain system-level libraries (*cough* Mesa *cough*) will internally pull in a lot of C++ infrastructure (even compilers), triggering other "fun" problems¹
I need to check that my software restricts itself from running under anything but Arch Linux (the reason behind this is off-topic). I couldn't find any relevant resources on the internet.
You couldn't find resources on the Internet because there's nothing at the binary level that makes "Arch" Arch. For what it's worth, right now, this instant, I could create a fork of Arch, change out its choice of default XDG skeleton – so that by default user directories are populated with subdirs called leech, flicks, beats, pics – and call it "l33tz" Linux. For all intents and purposes it would no longer be Arch. It would behave significantly differently from default Arch behavior, which would also be of concern to you if you relied on any specific thing, however minute.
Your employer doesn't seem to understand what Linux is or what distinguishes distributions from each other.
Hint: it's not binary compatibility. As a matter of fact, as long as you stay within the boring old realm of boring old glibc + libstdc++, Linux distributions are shockingly compatible with each other. There might be slight differences in where they put libraries other than libc.so, libdl.so and ld-linux[-${arch}].so, but those can usually be found under /lib. And once ld-linux[-${arch}].so and libdl.so take over (that means pulling in all libraries loaded at runtime), all the specifics of where shared objects and libraries are to be found are abstracted away by the dynamic linker.
¹ Like a process becoming multithreaded only after global constructors have executed, while libstdc++ decided it wanted to be single-threaded because libpthread wasn't linked into a program that never creates a thread of its own. That was a really weird bug I unearthed, but yshui finally understood it: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3199
You can list the predefined preprocessor macros with
gcc -dM -E - </dev/null
clang -dM -E - </dev/null
None of those indicate what operating system the compiler is running under. So not only can you not tell whether the program is compiled under Arch Linux, you can't even tell whether the program is compiled under Linux. The macros __linux__ and friends indicate that the program is being compiled for Linux. They are defined when cross-compiling from another system to Linux, and not defined when cross-compiling from Linux to another system.
You can artificially make your program more difficult to compile by specifying absolute paths for system headers and by relying on non-portable headers (e.g. /usr/include/bits/foo.h). That can make cross-compilation, or compilation for anything other than Linux, practically impossible without modifying the source code. However, most Linux distributions install headers in the same locations, so you're unlikely to pinpoint a specific distribution.
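A minimal sketch of that trick, assuming a glibc-based target: gnu/libc-version.h is a glibc-only public header, standing in for the bits/foo.h idea above, so this compiles on typical glibc distributions (Arch included) but not against musl, macOS, or Windows; it still cannot single out Arch:

#include <gnu/libc-version.h>   /* glibc-only; absent on musl/macOS/Windows */
#include <stdio.h>

int main(void) {
    /* Fails to compile at the #include on non-glibc targets. */
    printf("glibc %s\n", gnu_get_libc_version());
    return 0;
}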
You're very likely asking the wrong question. Instead of asking how to restrict compilation to Arch Linux, start from why you want to restrict compilation to Arch Linux. If the answer is “because the resulting program wouldn't be what I want under another distribution”, then start from there and make sure that the difference results in a compilation error rather than incorrect execution. If the answer to “why” is something else, then you're probably looking for a technical solution to a social problem, and that rarely ends well.
No, it doesn't. And even if it did, it wouldn't stop anyone from compiling the code on an Arch Linux distro and then running it on a different Linux.
If you need to prevent your software "from running under anything but Arch Linux", you'll need to insert a run-time check. Although, to be honest, I have no idea what that check might consist of, since Linux distros are not monolithic products. The actual check would probably have to do with your reasons for imposing the restriction.

Cross-OS build by converting a static build into an OS-specific binary

Is it possible to write code in C, statically build it into a binary like an ELF/PE, then remove its header and all unnecessary metadata so as to create a raw binary, and finally wrap that raw binary into another OS-specific format (ELF > PE, or PE > ELF)?
Have you done this before?
Is it possible?
What are the issues and concerns?
How would this be possible?
And if not, why not?
What are the pitfalls in my understanding of a static build?
Doesn't a static build remove any need for third-party and standard libraries, as well as OS libs and headers?
Why can't we remove the metadata of, for example, an ELF and put in the metadata and other specifics needed for PE?
Note:
I said cross-OS, not cross-hardware.
[Read after reading the answers below!]
As you can see, the best answer so far (!) just says to keep going and learn about cross-platform development issues!!! How crazy is this?! Thanks to philosophy!!!
I would say that it's possible, but the process would be complicated by many, many details.
ABI compatibility
The first thing to think of is Application Binary Interface (ABI) compatibility. Unless you're able to call your functions the same way, the code is broken. So I guess (though I can't check at the moment) that compiling code with gcc on Linux/OS X and MinGW gcc on Windows should give the same binary code as long as no external functions are called. The problem here is that executable metadata may rely on some ABI assumptions.
Standard libraries
That seems to be the largest hurdle, partly because of the C preprocessor, which can inline some procedures on some platforms while leaving them to run time on others. Also, cross-platform dynamic interoperation with standard libraries is close to impossible, though theoretically one can imagine code that uses a limited subset of the C standard library that is exposed through the same ABI on different platforms.
A static build mostly eliminates problems of interaction with other user-space code, but there is still the huge issue of interfacing with the kernel: it's int $0x80 calls on x86 Linux, with a platform-specific set of syscall numbers that does not map to Windows in any direct way.
OS-specific register use
As far as I know, Windows uses the %fs register for storing some OS-wide exception-handling state, so a binary compiled on Linux should avoid clobbering it. There might be other similar issues. Also, C++ exceptions on Windows are mostly built on top of OS-level exceptions (SEH).
Virtual addresses
Again, AFAIK, Windows DLLs have a preferred base address they expect to be loaded at in the virtual address space of a process, whereas Linux uses position-independent code for shared libraries. So there might be issues with overlapping areas of the executable and the ported code, unless the ported position-dependent code is recompiled to be position-independent.
So, while theoretically possible, such a transformation would be very fragile in real situations; it's impossible to transplant the whole statically built binary. Some parts may be transferred intact, but they must be relinked against system-specific code that interfaces with the kernel properly.
P.S. I think Wine is a good example of running binary code on a quite different system. It tricks a Windows program into thinking it's running in a Windows environment and runs the same machine code; most of the time that works well (as long as the program does not use private low-level system routines or unavailable libraries).

Is a Linux executable "compatible" with OS X?

If you compile a program in, say, C on a Linux-based platform, then port it to use the macOS libraries, will it work?
Is the core machine-code that comes from a compiler compatible on both Mac and Linux?
The reason I ask is that both are "UNIX-based", so I would think this is true, but I'm not really sure.
No, Linux and Mac OS X binaries are not cross-compatible.
For one thing, Linux executables use a format called ELF.
Mac OS X executables use Mach-O format.
Thus, even if a lot of the same libraries ordinarily compile separately on each system, the resulting binaries would not be portable.
Furthermore, Linux is not actually UNIX-based. It does share a number of common features and tools with UNIX, but a lot of that has to do with computing standards like POSIX.
All this said, people can and do create pretty cool ways to deal with the problem of cross-compatibility.
EDIT:
Finally, to address your point on byte-code: when making a binary, compilers usually generate machine code that is specific to the platform you're developing on. (This isn't always the case, but it usually is.)
In general you can easily port a program across various Unix brands. However you need (at least) to recompile it on each platform.
Executables (binaries) are not usable on several platforms, because an executable is tightly coupled with the operating system's ABI (Application Binary Interface), i.e. the conventions of how an application communicates with the operating system.
For instance if your program prints a string onto the console using the POSIX write call, the ABI specifies:
How a system call is made (Linux used to use the 0x80 software interrupt on x86; now it uses the dedicated sysenter/syscall instructions)
The system call number
How the function's arguments are transmitted to the system
Any kind of alignment
...
And this varies a lot across operating systems.
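As a sketch of how concrete those conventions are, here is the write call with every ABI detail hard-coded, assuming x86-64 Linux and GCC/Clang inline assembly; none of these choices hold on another kernel:

#include <stddef.h>

/* x86-64 Linux only: the raw kernel interface behind write(). */
static long raw_write(int fd, const void *buf, size_t len) {
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)                         /* result in rax */
                      : "a"(1L),                          /* SYS_write = 1 */
                        "D"((long)fd), "S"(buf), "d"(len) /* rdi/rsi/rdx  */
                      : "rcx", "r11", "memory");          /* kernel clobbers */
    return ret;
}

int main(void) {
    raw_write(1, "hi\n", 3);
    return 0;
}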
Note however that in some cases there may be "ABI adapters" that allow running binaries from one OS on another. For instance, Wine allows you to run Windows executables on various Unix flavors, and NDISwrapper allows you to use Windows network drivers on Linux.
"bytecode" usually refers to code executed by a virtual machine (e.g. for java or python). C is compiled to machine code, which the CPU can execute directly. Machine language is hardware-specific so it it would be the same under any OS running on an intel chip (even under Windows), but the details of how the machine code is wrapped into an executable file, and how it is integrated with system calls and dynamically linked libraries are different from system to system.
So no, you can't take compiled code and use it in a different OS. (However, there are "cross-compilers" that run on one OS but generate code that will run on another OS).
There is no "core byte-code that comes from a compiler". There is only machine code.
While the same machine instructions may be applicable under several operating systems (as long as they're run on the same hardware), there is much more to a hosted executable than that, and since a compiled and linked native executable for Linux has very different runtime and library requirements from one on BSD or Darwin, you won't be able to run one binary on the other system.
By contrast, Windows binaries can sometimes be executed under Linux, because Linux provides both a binary-format loader for Windows's PE format and an extensive API implementation (Wine). In principle this idea can be used on other platforms as well, but I'm not aware of anyone having written this for Linux<->Darwin. If you already have the source code, and it compiles on Linux, then you have a good chance of it also compiling under macOS (modulo UI components, of course).
Well, maybe... but most probably not.
But if it does, it's not "because both are UNIX"; it's because:
Mac computers happen to use the same processor nowadays (this was very different in the past)
You happen to use a program that has no dependency on any library at all (very unlikely)
You happen to use the same runtime libraries
You happen to use a loader/binary format that is compatible with both.

What are the differences between an executable generated by Windows and Linux? [duplicate]

Possible Duplicate:
Why an executable program for a specific CPU does not work on Linux and Windows?
Why can't programs written for Linux be executed on Windows? Suppose I compile a simple C program containing function calls that are common to both Windows and Linux. Does the compiler generate a different binary under Windows and Linux?
They use different container formats.
Most Linux executables are ELF files; all Windows executables and DLLs are PE files.
Here are some of the reasons I can think of off the top of my head:
Different container formats (which so far seems to be the leading differentiator among these answers; however, it's not the only reason).
Different dynamic linker semantics.
Different ABIs.
Different exception-handling mechanisms: Windows has SEH, upon which its C++ exception handling is built.
Different system-call semantics and different system calls, hence different low-level libraries.
The binary types are different. For example, Linux may use the Executable and Linkable Format (ELF), while Windows uses the Portable Executable (PE) format.
But the biggest problems are the APIs. A Windows program would call a Windows API to set up its process, such as the stack, and to allocate memory. Obviously those API calls are not available on other operating systems.
Yes, the executables use different file formats. In both cases, loading an executable to create a process involves a substantial amount of work, and neither OS (at least directly) includes the code to deal with loading the other's binary format. Even if it did, most programs would have substantial problems. Just for example, quite a few Linux programs link against a shared library, so to load them successfully under Windows you'd not only need the loader, but also a copy of something to stand in place of that shared library. In reality, of course, there isn't just one shared library though -- there are dozens. By the time you emulated them all, you'd have a fairly substantial chunk of the OS as a whole ported to Windows.
There is no single function call you can make on both Windows and Linux that can affect anything outside the process's address space, even if you could get both systems to execute the program. Except maybe:
void f()
{
    /* Dereference a null pointer: faults the process on both OSes. */
    *((char*)0) = 0;
}

Can I execute any C-made program without any OS platform?

I googled about it, and somewhere I read:
Yes, you can. That is happening in the case of embedded systems
I think NO, it's not possible. Any platform must have an operating system. Or else, your program must itself be an OS.
Either soft or hard-wired. Without an operating system your component wouldn't work.
Am I right, or can anybody explain the answer to me? (I don't have any idea about embedded systems...)
Of course you can. All a (typical) CPU needs is power and access to memory; then it will execute its hard-coded boot sequence.
Typically this will involve reading some pre-defined address, interpreting the contents there as instructions, and starting to run them.
These instructions could of course come from a C program, although at this level it's more common to write the very early stages (called bootstrapping) in assembly.
This of course doesn't mean, if I were to read your question title literally, that any C program can be run this way. If the program assumes there is an OS, but there isn't, it won't work. This should be pretty obvious.
You can run a program in a system without an Operating System ... and that program need not be an Operating System itself.
Think about all the computers (or processors if you prefer) inside a car: engine management, air conditioning, ABS, ..., ...
All of those system have a program (possibly written in C) running. None of the processors have an OS.
The Standard specifically differentiates between hosted implementations and freestanding implementations:
5.1.2.1 Freestanding environment
1 In a freestanding environment (in which C program execution may take place
without any benefit of an operating system), the name and type of the
function called at program startup are implementation-defined. Any library
facilities available to a freestanding program, other than the minimal set
required by clause 4, are implementation-defined.
2 The effect of program termination in a freestanding environment is
implementation-defined.
5.1.2.2 Hosted environment
1 A hosted environment need not be provided, but shall conform to the
following specifications if present.
...
I think you would have fun writing 'toy' kernels that are designed to run under simulators like QEMU (or virtualization platforms; Xen + MiniOS is one of my favorites). Without (much) difficulty, you could get a basic console up and running and start printing things to it. It's really fun, educational, and satisfying all at once.
If you are working on x86 and get your spiffy kernel working under QEMU, there's a very good chance that it will also work on real hardware. You might enjoy it.
Anyway, the answer to your question is most decidedly yes. It's especially easy if you happen to be using a boot loader; for instance, google memtest86 and grab the code.
Usually, any C program will make a variety of system calls which depend on the operating system. For example, printf makes a system call to write to the screen buffer. Opening files and things like that are also system calls.
So basically, you can run C code that just gets compiled and assembled into machine code on a processor, but if the code makes any system calls, it will just freeze up the processor when it tries to jump to the memory location where it thinks the operating system is. This, of course, depends on your being able to get the program running in the first place, which is not easy without an operating system either.
Embedded systems are legitimate OSes in their own right; they're just not general-purpose OSes. Any userland program (i.e. a program that is not itself an operating system) needs an operating system to run on top of.
As an example: Building Bare-Metal ARM Systems with GNU
Many embedded systems do not have enough resources for a full OS; some may use a scheduler kernel or an RTOS, others are coded 'bare metal'. The main() C entry point is entered after reset. Only a small amount of assembler code is required to initialise a microprocessor to execute C code. All C generally requires to run is a stack, which is usually simply a case of initialising the stack pointer to a specific address. Some processor-specific initialisation of interrupt/exception vectors, system clocks, memory controllers, etc. may be necessary as well.
On a desktop PC, typically you have a BIOS that handles basic hardware initialisation such as SDRAM controller setup and timing, and then bootstrapping from a disk boot-sector, which then in turn bootstraps an OS. Any of that code could be written in C (and some of it probably is), and it could do something other than boot an OS - it could do anything - it is just code.
OSs are useful for non-dedicated computing devices where the end user may select one of many programs to execute, possibly several simultaneously. Most embedded systems do just one thing; the software is often loaded from ROM or executes directly from ROM, is never changed, and executes indefinitely (usually stopped only by power-down).
You still, of course, might implement device drivers and the like, but often these are an integral part of the application rather than a separate entity. Even when you do use an RTOS in an embedded system, it is still generally integral to your application rather than an OS in the sense you might understand. In these cases the RTOS is simply a library like any other, and it is often initialised and started from main() rather than the other way around, as you might expect (a sketch of that pattern follows).
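A hedged sketch of that "RTOS started from main()" pattern, using FreeRTOS-style names as the example API (a real project would add board-specific init and a FreeRTOSConfig.h):

#include "FreeRTOS.h"   /* FreeRTOS public API (example RTOS) */
#include "task.h"

static void blink_task(void *params)
{
    (void)params;
    for (;;) {
        /* Toggle an LED via board-specific registers here. */
        vTaskDelay(pdMS_TO_TICKS(500));   /* sleep 500 ms */
    }
}

int main(void)
{
    /* Board/clock initialization would go here. */
    xTaskCreate(blink_task, "blink", configMINIMAL_STACK_SIZE,
                NULL, tskIDLE_PRIORITY + 1, NULL);
    vTaskStartScheduler();   /* the "library" takes over; no return */
    for (;;)
        ;
}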
Every piece of hardware has to have a piece of software that operates it, be it embedded firmware (smaller and relatively fixed, like VxWorks) or an operating system that can run complex arbitrary code on top of it (like Windows, Linux, or macOS).
Think of it as a stack. At the bottom, you have the hardware. On top of that sits a piece of software that can control that hardware. On top of that, you can have all sorts of stuff. In the case of a VoIP phone, you'll have VxWorks controlling the hardware and a layer on top of that handling all the phone applications.
So, going back to your question: yes, you CAN run a C program on anything, BUT it depends on what kind of C program it is. If it's a low-level C program that can talk to the hardware, then you don't need anything other than your program and the hardware. If it's a higher-level C program (like a chat program), then you need a whole bunch of stuff between your program and the hardware.
Make sense?
Obviously, you cannot execute any arbitrary C program without some sort of OS or OS-equivalent. Similarly, I can write a C program under Linux that won't run under Microsoft Windows.
However, you can write C programs on almost anything. It's a popular language to write software for embedded systems in, and they very often don't have an OS.
Many embedded systems have just a CPU hooked up to a ROM, with pins coming out of the chip that are directly attached to inputs and outputs. There is no user I/O, no file system, no process scheduling, nothing you'd typically want an OS for. In those cases, a C programmer might write a program that is burned into a ROM, which will handle everything itself.
(Some embedded systems are more complicated, and can use an OS. Linux is frequently used, since it's free for the use, can be made very compact, and can be changed at any level. Not all do, though.)
You definitely don't need an OS to run your C code on a system. What you will need is two pieces of initialization code: one to initialize the hardware (processor, clock, memory), and another to set up your stack and C runtime (i.e. initialization of the data and BSS sections). This, of course, means that you cannot take advantage of the multithreading, messaging, and synchronization services that an OS would provide. I'll try to break it down into some steps to give you an idea (a rough sketch follows the steps):
Write a "reset_routine" that runs when the board starts. This will initialize the clock and any external memory needed. (This routine will have to execute from memory that is either internal or can be initialized and programmed externally.)
The reset_routine, after the hardware initialization, transfers control to a "sw_runtime_init" routine that will set up the stack and the globals defined by your application. (Do a jump from reset_routine to sw_runtime_init instead of a call, to avoid stack usage.)
Compile and link this with your application, ensuring that the "reset_routine" is linked at the location the reset vector points to.
Load this onto your target and pray.
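A rough sketch of those steps for a hypothetical Cortex-M-style target, where the hardware loads the initial stack pointer from the vector table (so no explicit stack setup is shown), the runtime setup is folded into reset_routine for brevity, and the _etext/_sdata/_edata/_sbss/_ebss symbols come from a hypothetical linker script:

#include <stdint.h>

/* Section boundaries provided by a hypothetical linker script. */
extern uint32_t _etext, _sdata, _edata, _sbss, _ebss;
extern int main(void);

void reset_routine(void)           /* placed at the reset vector */
{
    uint32_t *src = &_etext;
    uint32_t *dst;
    /* sw_runtime_init part, folded in: copy initialized globals
       (.data) from flash to RAM... */
    for (dst = &_sdata; dst < &_edata; )
        *dst++ = *src++;
    /* ...and zero .bss so uninitialized globals start at 0, as C requires. */
    for (dst = &_sbss; dst < &_ebss; )
        *dst++ = 0;
    /* Clock / memory-controller init would go here. */
    main();                        /* enter the application */
    for (;;)
        ;                          /* trap if main ever returns */
}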
