Does GCC (or alternatively Clang) define any macro when compiling for the Arch Linux OS?
I need to check that my software restricts itself from compiling under anything but Arch Linux (the reason behind this is off-topic). I couldn't find any relevant resources on the internet.
Does anyone know how to guarantee through GCC preprocessor directives that my binaries are only compilable under Arch Linux?
Of course I can always
#ifdef __linux__
...
#endif
But this is not precise enough.
Edit: This must be done through the C source code and not by any build system; so, for example, doing it through CMake is ruled out.
Edit 2: Users faking this check is not a problem, since the software is distributed to selected clients; thus, actively trying to "misuse" our source code is "their decision".
Does GCC (or alternatively Clang) define any macro when compiling for the Arch Linux OS?
No, because there's nothing inherently specific to Arch Linux at the binary level. When compiling, the only things you (and the compiler) have to care about are the target architecture (i.e. what kind of CPU the code is going to run on), data type sizes and alignments, and function calling conventions.
Then later on, when it's time to link the compiled translation unit objects into the final binary executable, the runtime libraries come into play as well. Without taking special precautions you're essentially locking yourself into the specific brand of runtime libraries (glibc vs. e.g. musl; libstdc++ vs. libc++) pulled in by the linker.
One can easily sidestep the latter problem by linking statically, but that limits the range of system and mid-level APIs available to the program. For example, on Linux a naively statically linked program wouldn't be able to use graphics acceleration APIs like OpenGL 3.x or Vulkan, since those rely on loading components of the GPU drivers into the process. You can however still use X11 and indirect GLX OpenGL, since those work via wire protocols going over sockets, which are implemented with direct syscalls to the kernel.
And those kernel syscalls are exactly the same at the binary level for each and every Linux kernel of every distribution out there. Although inside the kernel there's a lot of leeway when it comes to redefining interfaces, for the interfaces toward userland (i.e. regular programs) there's the holy, dogmatic, ironclad rule that YOU NEVER BREAK USERLAND! Kernel developers who break this rule, intentionally or not, are chewed out publicly by Linus Torvalds in his infamous rants.
The bottom line is that there is no such thing as a "Linux distribution specific identifier at the binary level". At the end of the day, a Linux distribution is just that: a distribution of stuff. That means someone (or a group of someones) decided on a set of files that make up a working Linux system, wrapped it all up somehow and slapped a name on it. That's it. There's nothing inherently specific to "Arch" Linux other than that it's called "Arch" and (for the time being) relies on the pacman package manager. Everything else about "Arch", or any other Linux distribution, is a matter of happenstance.
If you really want to sort different Linux distributions into bins regarding binary compatibility, then you'd have to pigeonhole the combinations of:
Minimum required set of supported syscalls. This translates into a minimum required kernel version.
Which libc variant is being used, and potentially which version, although it's perfectly possible to link against a minimally supported set of functions that has been around almost "forever" (see the sketch after this list).
Which variant of the C++ standard library the distribution decided upon. This also affects programs that might appear to be pure C, because certain system-level libraries (*cough* Mesa *cough*) internally pull in a lot of C++ infrastructure (even compilers), triggering other "fun" problems¹.
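To illustrate the libc point, here's a minimal sketch (assuming a glibc-based toolchain; the version numbers are arbitrary examples) of how the libc variant and a minimum version could be pinned down at compile time. glibc defines __GLIBC__ and __GLIBC_MINOR__ once any of its headers is included; musl deliberately defines no such identifying macro, so there is no equivalent check for it:

#include <limits.h>   /* any libc header pulls in the libc's feature macros */

#if !defined(__GLIBC__)
#error "This build expects a glibc-based toolchain"
#elif __GLIBC__ < 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ < 17)
#error "glibc >= 2.17 required"
#endif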
I need to check that my software restricts itself from running under anything but Arch Linux (the reason behind this is off-topic). I couldn't find any relevant resources on the internet.
You couldn't find resources on the Internet because there's nothing at the binary level that makes "Arch" Arch. For what it's worth, right now, this instant, I could create a fork of Arch, change out its choice of default XDG skeleton – so that by default user directories are populated with subdirs called leech, flicks, beats, pics – and call it "l33tz" Linux. For all intents and purposes it would no longer be Arch. It would behave significantly differently from the default Arch behavior, which would also be of concern to you if you'd relied on any specific detail, however minute.
Your employer doesn't seem to understand what Linux is or what distinguishes distributions from each other.
Hint: it's not binary compatibility. As a matter of fact, as long as you stay within the boring old realm of glibc + libstdc++, Linux distributions are shockingly compatible with each other. There might be slight differences in where they put libraries other than libc.so, libdl.so and ld-linux[-${arch}].so, but those can usually be found under /lib. And once ld-linux[-${arch}].so and libdl.so take over (that is, once all libraries loaded at runtime have been pulled in), all the specifics of where shared objects and libraries are to be found are abstracted away by the dynamic linker.
1: Like a process becoming multithreaded only after global constructors were executed, while libstdc++ had decided it wanted to be single-threaded, because libpthread wasn't linked into a program that didn't create a single thread on its own. That was a really weird bug I unearthed, but yshui finally understood it: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3199
You can list the predefined preprocessor macros with
gcc -dM -E - < /dev/null
clang -dM -E - < /dev/null
None of those indicate what operating system the compiler is running under. So not only can you not tell whether the program is compiled under Arch Linux, you can't even tell whether the program is compiled under Linux. The macros __linux__ and friends indicate that the program is being compiled for Linux. They are defined when cross-compiling from another system to Linux, and not defined when cross-compiling from Linux to another system.
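A small sketch to make that distinction concrete: the following prints "Linux target" whenever the compiler targets Linux, regardless of which system the compilation itself runs on (e.g. when cross-compiling from Windows to Linux):

#include <stdio.h>

int main(void)
{
#if defined(__linux__)
    /* set because the *target* is Linux, not because the build host is */
    puts("Linux target");
#else
    puts("non-Linux target");
#endif
    return 0;
}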
You can artificially make your program more difficult to compile by specifying absolute paths for system headers and relying on non-portable headers (e.g. /usr/include/bits/foo.h). That can make cross-compilation or compilation for anything other than Linux practically impossible without modifying the source code. However, most Linux distributions install headers in the same location, so you're unlikely to pinpoint a specific distribution.
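As a hedged sketch of that approach (bits/wordsize.h is a glibc-internal header; on musl, macOS or Windows this include simply fails and the build stops):

/* Tie the build to a glibc-style header layout on purpose. */
#include </usr/include/bits/wordsize.h>

#ifndef __WORDSIZE
#error "expected glibc's bits/ headers"
#endif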
You're very likely asking the wrong question. Instead of asking how to restrict compilation to Arch Linux, start from why you want to restrict compilation to Arch Linux. If the answer is “because the resulting program wouldn't be what I want under another distribution”, then start from there and make sure that the difference results in a compilation error rather than incorrect execution. If the answer to “why” is something else, then you're probably looking for a technical solution to a social problem, and that rarely ends well.
No, it doesn't. And even if it did, it wouldn't stop anyone from compiling the code on an Arch Linux distro and then running it on a different Linux.
If you need to prevent your software from "running under anything but Arch Linux", you'll need to insert a run-time check. Although, to be honest, I have no idea what that check might consist of, since Linux distros are not monolithic products. The actual check would probably have to do with your reasons for imposing the restriction.
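One plausible sketch of such a run-time check, assuming the target systems provide the systemd-style /etc/os-release file (Arch does, with an ID=arch line). Per the question's second edit, users faking the file is not a concern here:

#include <stdio.h>
#include <string.h>

/* Returns 1 if /etc/os-release identifies the system as Arch Linux. */
static int running_on_arch(void)
{
    FILE *f = fopen("/etc/os-release", "r");
    char line[256];
    int found = 0;

    if (!f)
        return 0;
    while (fgets(line, sizeof line, f)) {
        if (strcmp(line, "ID=arch\n") == 0) {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}

int main(void)
{
    if (!running_on_arch()) {
        fprintf(stderr, "This program only runs on Arch Linux.\n");
        return 1;
    }
    /* ... actual program ... */
    return 0;
}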
I am curious about how statically linked C executables would work in different environments. Let's say we compile our C code to target x86 macOS and we statically include everything it uses (printf, strlen) in the executable as well. What really stops this executable from running on Windows if we include every library it needs? I understand the file format could be different and break things, but other than that, would this technically be able to run?
I see where you're coming from. Operating systems make us think, as programmers, that libraries are the be-all and end-all of programming: that a call to a library is all you need to make complex things happen and that everything is contained in them.
But the truth is, libraries mostly provide an abstraction layer. As an example, let's create a library called "hello_world.so" which prints "Hello World!" to the console. The library we created relies on stdio to handle the complex I/O stuff, but stdio itself depends on at least one other thing: the kernel (some specific targets work without a kernel, but those systems are outside the scope of this answer).
In the desktop world, things can get really complicated. We have several hundred processes all running at once even on an idle system, and all these apps need access to the hardware (possibly at once, too), so it was decided that a controller was needed: some piece of software that would coordinate all the other software running on the same computer. This piece of software is usually called a kernel. On Windows it's the NT kernel, on macOS it's XNU, and on Linux it's... the Linux kernel!
On these systems, the biggest job of a library is to abstract the kernel, to make us believe that printing text on a Linux console and on a Windows console works exactly the same way, when actually it can be completely different! Libraries like stdio/time/etc. have different "implementations" but the same "interface": they look the same from the developer's point of view, but the way they achieve their goals can vary wildly; they can do conversions, make calls to other hidden or non-hidden functions... All of this is completely portable from one OS to the other, though. Things start to go south for your idea when kernel calls show up.
Kernel calls are the way a program "talks" to the kernel. They can be used to do literally thousands of different things; for example, there's one (or several) to ask for memory (usually reached through malloc), one to print to the console, one to ask whether a network packet has arrived, one to talk to your GPU... And these system calls are completely different from one kernel to the other, sometimes even between two versions of the same kernel!
These "kernel calls" are the only thing preventing you from running statically compiled Linux programs on Windows.
PS: Even though all of the above is completely true and kernels can be as different from one another as they wish, due to the history of kernels and of computing in general, some kernels actually share the same interface (even though their implementations, as you guessed, can be nothing alike). The best example I know of is how most kernels I'm familiar with follow the UNIX design.
This means that, even though I have never tested it myself, I think you should be able to statically link a Linux app and use it on Linux, most BSDs, and possibly even macOS.
The binary and libraries are specific to the operating system.
The TL;DR is that the linking process translates function calls into addresses that point into the operating system's specific libraries. There are some differences, like alignment, that are settled at compile time, but what's responsible for your x86 instructions not running under a different OS is the linker.
Your compiler produces x86 instructions that are ready to execute but incomplete. The linker goes over every function call and assigns an address for that function in the executable file, even for functions of the standard library.
The linker creates the executable file following a file format which has a header with information like size, metadata and the entry point. Windows and Unix have different specifications for executable files: Windows has PE and Unix has the ELF format, both for executables and for libraries.
Through some hacks and non-trivial tricks it is possible to create an executable that can run on Windows and Unix (see αcτµαlly pδrταblε εxεcµταblε for how).
But even if you do all that, there is one obstacle that cannot be circumvented: the kernel. The kernel is, well, a kernel. It's the most important thing in an OS, and it provides a set of API calls that give basic, low-level access to computer resources. Functions like malloc are implemented using the kernel-specific API calls: VirtualAlloc on Windows, and brk/sbrk or mmap on POSIX.
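For example, a rough sketch of how an allocator obtains memory from a POSIX kernel; on Windows, the equivalent request would go through VirtualAlloc with entirely different arguments:

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    /* ask the kernel for one anonymous read/write page */
    void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    /* ... hand the page to the allocator's bookkeeping ... */
    munmap(p, 4096);
    return 0;
}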
Main Answer
If your program does anything useful (print output, return a value, communicate on the network), it contains some form of system-call instruction. Each system-call instruction is a request to a particular operating system, and macOS system calls will not work on Windows and vice-versa.
The system-call instruction sends information to the operating system, including a number identifying which service is requested. The operating system then performs that service. When you build your program for macOS, it includes library routines that contain system-call instructions. If you execute those instructions on a Microsoft Windows system, Windows will not understand the macOS requests. It will interpret the information differently, and the program will not work.
So, in theory, there is nothing preventing you from writing your own program loader that reads a Mach-O executable file intended for macOS, loads its contents into memory, and transfers control to its entry point. But the program will not work, because of the system calls.
Supplement
You might consider translating all the system calls in the program. Changing the primary number that identifies the service request might be feasible; it might not require changing the executable too much. For example, if 37 were a “write to file” request on macOS, your program loader might change it to 48 for Windows. However, the system calls also require other data to be passed, such as pointers to buffers, lengths, and so on, and there are likely many discrepancies in how those are passed, so macOS requests cannot easily be translated into Windows requests. Also, it can be technically challenging to identify all the places in a program where a certain instruction is used: some of the contents of a loaded program's memory are instructions and some are data. Most normal programs may be well-behaved and easy to analyze in this regard, but not all are.
Another potential issue is that programs may expect to have certain modes set in the processor, and the host operating system may or may not have set those modes as needed.
OK, so I am trying to connect an emulated (through TSIM) LEON3 processor to a UART terminal. If I am not mistaken, I believe I need to compile a C program to enable it to talk with a terminal, as I am having difficulties doing it any other way.
I found some source code for UART communication here and it all seems to be OK.
However, I am having issues compiling it using the SPARC Bare C Toolchain (BCC) in Eclipse, as it says that the windows.h file does not exist. Now, I know the file exists, since I've compiled the code successfully using the GCC toolchain, and I can't find any similar cases on the web explaining why this would be happening.
Is there anyone out there who has had a similar problem or knows the solution?
Additionally, if you know me to be doing the wrong thing in regards to the LEON3 UART comms, please let me know and I will just leave.
Thanks.
BCC is a cross compiler targeting standalone, LEON3- and LEON4-based environments. As a cross compiler, its job is to build binaries for a different environment than the one in which it runs.
Relevant header files describe functions available to a program in its runtime (target) environment. Build-environment libraries and their headers are irrelevant when cross compiling because the build and target environments differ. BCC is correct to expose only the headers for the environment for which it compiles, and that environment does not provide Windows API functions. If the code you're trying to build depends on the Windows API, then you'll need to modify it to remove that dependency, or else find something different.
On the other hand, I strongly suspect that you're going about this whole thing the wrong way. In particular, when you say,
I believe I need to compile a C program to enable it to talk with a terminal
it sounds like you think you're going to build some kind of helper program, but if that's your idea then either you're building it for the wrong environment or you have the wrong idea altogether.
If you want a Windows program that talks to the emulated machine, then you should be building that as a Windows program, and BCC doesn't do that. In that case, you should be using MinGW's gcc, or another C compiler for the emulator's host environment. Moreover, the host-side interface to the emulated environment's UART is an aspect of the emulator. I haven't a clue what emulator you're using, but it might not present the host (Windows) side of that interface as a UART, and it might not require using the Windows API at all.
Or if you indeed do intend to build a program for the standalone LEON3 target environment, then you need to understand that when it runs, it will be the only program that will be running in that environment. That's what "standalone" means -- no OS underneath, therefore no separate processes, and often not even multiple threads of execution. Thus, you do not need a helper program; you just need a program.
The BCC documentation talks about the libraries available there, and in particular, it describes how in that environment, file I/O is allowed only on the standard input and output streams, which are mapped to UART A. Thus, if you use BCC to build the program to run in the emulator, then you don't need to do anything special on that end to talk to the UART. You just use stdio functions directed at stdin and stdout.
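So a minimal sketch of the whole "talk to the UART" program, buildable with BCC for the standalone LEON3 target (assuming the default mapping of stdin/stdout to UART A described in the BCC documentation), is just ordinary stdio:

#include <stdio.h>

int main(void)
{
    char line[80];

    printf("LEON3 up; type something:\r\n");  /* goes out over UART A */
    if (fgets(line, sizeof line, stdin))      /* reads from UART A */
        printf("echo: %s", line);
    return 0;
}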
On the third hand, if you are running an actual operating system in your emulated environment, then to build programs that run on it you should be using either a native compiler for that environment, in that environment, or else a cross compiler targeting that hosted environment. Either way, BCC is not such a compiler, but GCC might be. Anyway, since Windows does not run on LEON3, it's safe to say that if this is what you're trying to do then you still need something that does not depend on the Windows API.
What exactly do we mean when we say that a program is OS-independent? Do we mean that it can run on any OS as long as the processor is the same?
For example, OpenGL is a library which is OS-independent. The functions it contains must assume a specific processor. But aren't codes/programs/applications OS-specific?
What I learned is that:
OS is processor-specific.
Applications (programs/codes/routines/functions/libraries) are OS specific.
Source code is plain text.
Compiler (a program) is OS specific, but it can compile source code for a
different processor assuming the same OS.
OpenGL is a library.
Therefore, OpenGL has to be OS/processor-specific. How can it be OS-independent?
What can be OS independent is the source code. Is this correct?
How does it help to know if a source code is OS-independent or not?
What exactly do we mean when we say that a program is OS-independent? Do we mean that it can run on any OS as long as the processor is the same?
When a program uses only defined behaviour (no undefined, unspecified or implementation-defined behaviours), then the program is guaranteed by the language standard (in your case, the C language standard) to compile (using a standards-compliant compiler) and run uniformly on all operating systems.
Basically, you have to understand that a language standard like C's, or a library standard like OpenGL, gives a set of minimum guarantees that a programmer can assume and build upon. These won't change as long as the compiler is compliant with the standard (in the case of a library, as long as the implementation is standards-compliant) and the program doesn't tread into undefined-behaviour land.
OpenGL has to be OS/processor-specific. How can it be OS-independent?
No. OpenGL is platform-independent. An OpenGL implementation (the driver which implements the calls) is definitely platform- and GPU-specific. Likewise, the C standard is implemented by GCC, MSVC++, etc., which are all different compiler implementations that can compile C code.
What can be OS-independent is the source code. Is this correct?
Source code (if written with portability in mind) is just one among many such platform-independent entities. Libraries (OpenGL, etc.), frameworks (.NET, etc.), and so on can be platform-independent too. For that matter, even hardware can be spec'd by one party and implemented by someone else: ARM processors are standards/specifications charted out by ARM and implemented by OEMs like Qualcomm, TI, etc.
Do we mean that it can run on any OS as long as the processor is the same?
Neither the processor nor the platform (OS) matters as long as you use only cross-platform components to build your program. Say you use C, a portable language; SDL, a cross-platform library for creating windows, handling events, framebuffers, etc.; and OpenGL, a cross-platform graphics library. Now your program will run on multiple platforms, but even then it depends on the weakest link. If SDL doesn't run on some J2ME-only phone, then it won't have a library distribution for that platform, and thus your application won't run on that platform; so in a sense nothing is fully independent. It's wise to play around with the various libraries available for different architectures, platforms, compilers, etc., and then pick the required ones based on the platforms you're targeting.
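As an illustrative sketch of that stack (assuming SDL2 and its headers are installed; build with something like cc main.c `sdl2-config --cflags --libs`), the same source opens a window on Linux, Windows or macOS:

#include <SDL2/SDL.h>

int main(void)
{
    if (SDL_Init(SDL_INIT_VIDEO) != 0)
        return 1;
    /* SDL maps this one call onto Win32, X11/Wayland or Cocoa */
    SDL_Window *win = SDL_CreateWindow("portable window",
                                       SDL_WINDOWPOS_CENTERED,
                                       SDL_WINDOWPOS_CENTERED,
                                       640, 480, 0);
    if (win) {
        SDL_Delay(2000);   /* keep the window up for two seconds */
        SDL_DestroyWindow(win);
    }
    SDL_Quit();
    return 0;
}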
What exactly do we mean when we say that a program is OS-independent?
It means that it has been written in such a way that it can be compiled (if compilation is necessary for the language used) or run with no or only little modification on several operating systems and/or processor architectures.
For example, OpenGL is a library which is OS independent.
OpenGL is not a library. OpenGL is an API specification, i.e. a lengthy volume of text that describes a set of tokens (= named numeric values) and entry points (= callable functions) and the effects they have at the system level.
What I learned is that:
OS is processor-specific.
Wrong!
Just like a program can be written in a way that allows it to be targeted to several operating systems (and processor architectures), operating systems can be written in a way that allows them to be compiled for and run on several processor architectures.
Linux, for example, supports so many architectures that it's jokingly said to run on everything that is capable of processing zeroes and ones and has a memory management unit.
Applications (programs/codes/routines/functions/libraries) are OS specific.
Wrong!
Program logic is independent of the OS. A calculation like x_square = x * x doesn't depend on the OS at all. Only a very small portion of a program, namely those parts that make use of operating system services, actually depends on the OS. Such services are things like opening, reading and writing files, creating windows, and so on. But you normally don't use those OS-specific APIs directly.
Most low-level OS APIs have certain specifics which are easy to trip over and arcane to address. So you don't use them directly, but rather some standard, OS-independent library that hides the OS-specific stuff.
For example, the C language (which is already pretty low-level) defines a standard set of functions for file access, the stdio functions: fopen, fread, fwrite, fclose, … C++ does something similar with its iostreams. But those just wrap the OS-specific APIs.
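A tiny sketch of that point: this source knows nothing about CreateFile/WriteFile (Win32) or open/write (POSIX), yet compiles and runs against whichever of them the local C library wraps:

#include <stdio.h>

int main(void)
{
    FILE *f = fopen("out.txt", "w");   /* libc picks the OS call */
    if (!f) {
        perror("fopen");
        return 1;
    }
    fprintf(f, "same source on every hosted C implementation\n");
    fclose(f);
    return 0;
}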
source code is plain text.
Usually it is, but not necessarily. There are also graphical, data-flow programming environments, like LabVIEW, which can create native code as well. The source code they use is not plain text but a diagram, which is stored in a custom binary format.
Compiler ( a program ) is OS specific, but it can compile a source code for a different processor assuming the same OS.
Wrong! and Wrong!
A compiler is language- and target-specific. But it's perfectly possible to have a compiler on your system that generates executables targeted at a different processor architecture and operating system than the system you're using it on (cross compilation). After all, a compiler is "just" a (mathematical) function mapping from source code to target binary.
In fact, the compiler itself doesn't target an operating system at all; it only targets a processor architecture. The operating-system specifics are introduced by the ABI (application binary interface) of the OS, which is addressed by the linked-in runtime environment and by the target linker (yes, the linker must be able to address a specific OS).
openGL is a library.
Wrong!
OpenGL is an API specification.
Therefore, openGL has to be OS/processor specific.
Wrong!
And even if OpenGL was a library: Libraries can be written to be portable as well.
How can it be OS-independent?
Because OpenGL itself is just a lengthy document of text describing the API. Each operating system with OpenGL support then implements that API in conformance with the specification, so that a program written or compiled to run on said OS can use OpenGL as specified.
what can be OS independent is the source code.
Wrong!
It's perfectly possible to write program source code in a way that it will only compile and run for a specific operating system and/or a specific processor architecture. The pinnacle of OS/architecture dependence: writing things in assembler and using OS-specific low-level APIs directly.
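A short sketch of such deliberately OS-dependent source (epoll is a Linux-only kernel interface, so this file won't even compile elsewhere):

#include <stdio.h>
#include <sys/epoll.h>   /* Linux-specific header */
#include <unistd.h>

int main(void)
{
    int ep = epoll_create1(0);   /* Linux-only syscall */
    if (ep < 0) {
        perror("epoll_create1");
        return 1;
    }
    close(ep);
    return 0;
}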
How does it help to know if a source code is OS-independent or not?
It gives you a ballpark figure of how hard it will be to port the program to a different operating system.
A very important thing to understand:
OS independence does not mean a program will run on all operating systems or architectures. It means that the program is not tethered to a specific OS/CPU combination, and that porting it to a different OS/CPU requires only little effort.
There are a couple of concepts here. A program can be OS-independent, that is, it can run/compile without changes on a range of OSes. Secondly, libraries can be built on a range of OSes and then be used by a platform-independent program.
Strictly speaking, OpenGL doesn't have to be OS-independent. OpenGL implementations may actually have different source code on different OSes, interfacing with drivers in a platform-specific way. What matters is that OpenGL's interface is OS-independent. Because the interface is OS-independent, it can be used by code which is itself OS-independent and can be run/compiled without modification.
Libraries abstracting away OS-specific things are a wonderful way to let your code interface with the OS in places which would normally require OS-specific code.
One of those:
It compiles on any OS supported by the program's framework without changes to the source code (languages like C++ that compile directly into machine code).
The program is written in an interpreted language, or in a language that compiles into platform-independent bytecode, and can run on whatever platform its interpreter supports without modifications (languages like Java or Python).
The application relies on some kind of cross-platform framework that abstracts the operating-system-specific calls away. It will run without modifications on any OS supported by the framework.
Because you haven't added any language tag, it is either #1, #2 or #3, depending on your language.
--edit--
OS is processor-specific.
No. See Linux: the same code base can be compiled for different architectures. Normally (well, it is reasonable to expect this), an OS kernel is written in a portable language (like C) that can be rebuilt for different CPUs. On a distribution like Gentoo, you can rebuild the entire OS from source as well.
Applications (programs/codes/routines/functions/libraries) are OS specific.
No. Applications like Java *.jar files can be made more or less OS-independent: as long as there is an interpreter, they'll run anywhere. There will be some OS-specific part (like the Java runtime environment in the case of Java), but your program will run anywhere that part is present.
Source code is plain text.
Not necessarily, although it is true in most cases.
Compiler (a program) is OS specific, but it can compile source code for a
different processor assuming the same OS.
Not quite. A compiler can reasonably be written using (somewhat) portable code, so that the compiler itself can be rebuilt for a different OS.
Also, while running on OS A it is possible (in some cases) to compile code for OS B. On Linux, you can compile code for the Windows platform.
OpenGL is a library.
It is not. It is a specification (an API) that describes a set of programming functions for working with 3D graphics. There are libraries that implement this specification; the specification itself is not a library.
Therefore, OpenGL has to be OS/processor-specific.
Incorrect conclusion.
How can it be OS-independent?
As long as the underlying platform has a standards-compliant OpenGL implementation, the rendering part of your program will work the same way as on any other platform with a standards-compliant OpenGL implementation. That's portability. Of course, this is the ideal situation; in reality you might run into a driver bug or something.
If you compile a program in, say, C on a Linux-based platform, then port it to use the macOS libraries, will it work?
Is the core machine code that comes from a compiler compatible on both Mac and Linux?
The reason I ask this is because both are "UNIX based" so I would think this is true, but I'm not really sure.
No, Linux and Mac OS X binaries are not cross-compatible.
For one thing, Linux executables use a format called ELF.
Mac OS X executables use Mach-O format.
Thus, even if many of the same libraries are ordinarily compiled separately on each system, they would not be portable in binary form.
Furthermore, Linux is not actually UNIX-based. It does share a number of common features and tools with UNIX, but a lot of that has to do with computing standards like POSIX.
All this said, people can and do create pretty cool ways to deal with the problem of cross-compatibility.
EDIT:
Finally, to address your point on byte-code: when making a binary, compilers usually generate machine code that is specific to the platform you're developing on. (This isn't always the case, but it usually is.)
In general you can easily port a program across various Unix brands. However, you need (at least) to recompile it on each platform.
Executables (binaries) are not usable on several platforms, because an executable is tightly coupled with the operating system's ABI (Application Binary Interface), i.e. the conventions of how an application communicates with the operating system.
For instance if your program prints a string onto the console using the POSIX write call, the ABI specifies:
How a system call is performed (Linux used to use software interrupt 0x80 on x86; now it uses the dedicated sysenter instruction)
The system call number
How the function's arguments are transmitted to the system
Any kind of alignment
...
And this varies a lot across operating systems.
Note however that in some cases there may be “ABI adapters” allowing to run binaries of one OS onto another OS. For instance Wine allows you to run Windows executables on various Unix flavors, NDISwrapper allows you to use Windows network drivers on Linux.
"bytecode" usually refers to code executed by a virtual machine (e.g. for java or python). C is compiled to machine code, which the CPU can execute directly. Machine language is hardware-specific so it it would be the same under any OS running on an intel chip (even under Windows), but the details of how the machine code is wrapped into an executable file, and how it is integrated with system calls and dynamically linked libraries are different from system to system.
So no, you can't take compiled code and use it in a different OS. (However, there are "cross-compilers" that run on one OS but generate code that will run on another OS).
There is no "core byte-code that comes from a compiler". There is only machine code.
While the same machine instructions may be applicable under several operating systems (as long as they run on the same hardware), there is much more to a hosted executable than that, and since a compiled and linked native executable for Linux has very different runtime and library requirements from one on BSD or Darwin, you won't be able to run one binary on the other system.
By contrast, Windows binaries can sometimes be executed under Linux, because Linux provides both a binary format loader for Windows's PE format, as well as an extensive API implementation (Wine). In principle this idea can be used on other platforms as well, but I'm not aware of anyone having written this for Linux<->Darwin. If you already have the source code, and it compiles in Linux, then you have a good chance of it also compiling under MacOS (modulo UI components, of course).
Well, maybe... but most probably not.
But if it does, it's not "because both are UNIX" it's because:
Mac computers happen to use the same processor family nowadays (this was very different in the past)
You happen to use a program that has no dependency on any library at all (very unlikely)
You happen to use the same runtime libraries
You happen to use a loader/binary format that is compatible with both.