Compilers and Instruction Sets - C

"C is a genereal purpose language, not tied to a particular system"
The C programming Language, BRIAN W KERNIGHAN & DENNIS M. RITCHIE
Yet with the right compiler we can make a .exe which runs on every Windows machine, which in turn means on every CPU Windows runs on.
So my question is: does every x86-64 CPU (Intel or AMD) use the same instruction set? (Yes, I could make a comparison myself...) If not, then I'll have to assume that the compiler detects which CPU we're running on and uses the right instruction set at compile time.
Am I totally mistaken?
I barely know what I'm talking about so please bear with me.
Just a dude trying to look under the hood.
Thank you

Intel makes many different processor models that share a core instruction set of the “x86-64” family (and additional processor models that do not). Even among the processors with the shared core instructions, there are variations. Newer models may have instructions that older models did not, and some parts of the instruction set may be on certain models and not others.
Some instructions even behave differently on different processors.
When you compile a program, the compiler “targets” a particular combination of instruction subsets. This means the instructions in those subsets are available for the compiler to use when it is generating code. The compiler might or might not use any particular instruction or subset depending on its needs or choices when compiling a particular program. The resulting program is then suitable for processor models with the targeted instructions and not for other models (unless the compiler happened not to use any of the instructions not on those models, even though it could have).
Often, the default setting for the compiler's target is either a processor model like the one you are running on or some typical selection of instruction subsets that is common for modern processor models. The target may also be selected based on other settings you give the compiler, such as asking it to target a particular version of an operating system. However, you can pass the compiler switches to tell it to compile for entirely different targets, even for entirely different architectures, such as compiling for an ARM processor while running on an Intel processor.
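As a rough illustration (assuming GCC or Clang on x86-64; the flags and macros below are real, but which subsets a given switch enables depends on your compiler version), the chosen target shows up as predefined macros that code can inspect:

    /* target_probe.c - a minimal sketch: the compiler's target selection
     * is visible to C code through predefined macros (GCC/Clang).
     * Try building it several ways, e.g.:
     *   gcc -O2 target_probe.c                 (the compiler's default target)
     *   gcc -O2 -march=native target_probe.c   (the CPU you are running on)
     *   gcc -O2 -mavx2 target_probe.c          (explicitly allow AVX2)
     */
    #include <stdio.h>

    int main(void) {
    #ifdef __SSE2__
        puts("SSE2 instructions are allowed in this build");
    #endif
    #ifdef __AVX2__
        puts("AVX2 instructions are allowed in this build");
    #else
        puts("AVX2 is not part of this build's target");
    #endif
        return 0;
    }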
Software is also part of a computer system, so the executable file the compiler produces may also depend on certain software libraries being available at run-time or certain operating system features being available.

Related

check CPU model to execute a specific C code [duplicate]

This question was closed as a duplicate of: How to tell if program is running on x86/x64 or ARM Linux platforms
I want to create a C program that somehow contains two separate blocks. I want to use a function or a tool that extracts the CPU model and, based on that, the program decides which block of code it executes. I only have the idea and I don't know how to implement it!
The first block of code will be executed on an Intel i7 and the second should be executed on ARM Cortex A53.
PS: I am a beginner and I have nothing to do with hardware and similar stuff. Thank you for your help :)
As clearly pointed out, first off you can't have a C program that runs up to a point and then decides between ARM and x86, because that code already has to be ARM or x86; these are different instruction sets. You could use, say, Python or Java or some other scripting/virtual-machine language instead. But with C you have a COMPILE-time decision to build for one target or the other, and at that point you already know which target you have, since you are actually running code on it; so if this is strictly ARM vs x86 there is no reason to check at run time. That's not to say each architecture and/or system won't have a way to check which architecture and flavor you are on (ARMv6 vs ARMv7, for example), but not necessarily ARMv7 32-bit vs ARMv8 64-bit, although you technically can run the aarch32 and aarch64 instruction sets on most ARMv8 cores, just not intermixed; the OS or an execution-level change has to switch between them.
You do understand there are different, incompatible instruction sets, specifically the ones you described, and C code is compiled for one or the other. So you cannot have a program in C compiled for one target that can detect the other target; you have already selected the target before you get to this point. Now, there are emulators, but they tend to target one architecture as well. There are/were products from specific vendors that would emulate one instruction set and convert it at run time to the other; over time, as you re-run that code, it keeps converting more of it. You could try that, but you still have to be running code for the right target on the right logic/emulator, and then you need a non-standard detection scheme to find the true underlying architecture rather than the emulated one.
I suspect you are thinking you can have one architecture-specific module that detects the architecture in order to run architecture-specific code. This does not work with C in general and does not make sense to try, so there probably isn't a good tool for it. In particular, the solution for such a thing is either that you build this into the binary file format and the operating system picks, because it knows, or that you wrap your binary with a target-independent language like Python or Java, or a scripting language like Perl, bash, etc., which can determine the architecture independently of the target (solutions there vary widely by operating system and language, for starters) and then choose which binary to run.
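If what you really want is one source file that carries both blocks and lets the build pick between them, the usual C approach is a compile-time choice via the preprocessor. A minimal sketch, assuming GCC or Clang (which predefine __x86_64__ on x86-64 and __aarch64__ on 64-bit ARM; other compilers use different macros):

    /* arch_select.c - choose a code path at compile time, not run time. */
    #include <stdio.h>

    int main(void) {
    #if defined(__x86_64__)
        /* this block is compiled only when targeting x86-64 (e.g. an Intel i7) */
        puts("built for x86-64");
    #elif defined(__aarch64__)
        /* this block is compiled only when targeting 64-bit ARM (e.g. Cortex-A53) */
        puts("built for AArch64");
    #else
        puts("built for some other architecture");
    #endif
        return 0;
    }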
There are many ways to achieve what you want. To check which model is present, you first have to read which model you have; how to do that varies between Windows and Linux. I found this SO topic helpful and it might also be a good start for your research: How to check CPU name, model, speed on Windows/Linux C?
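For example, on Linux one low-tech way to read the model string is to parse /proc/cpuinfo; a rough, Linux-only sketch (the field is "model name" on x86, while ARM kernels may expose different field names):

    /* cpu_model.c - print the first "model name" line from /proc/cpuinfo. */
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        FILE *f = fopen("/proc/cpuinfo", "r");
        if (!f) {
            perror("fopen");
            return 1;
        }
        char line[256];
        while (fgets(line, sizeof line, f)) {
            if (strncmp(line, "model name", 10) == 0) {
                fputs(line, stdout);   /* the first core is enough */
                break;
            }
        }
        fclose(f);
        return 0;
    }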

Is a program compiled by an amd64 compiler executable, and will it run and work properly, on an x86 CPU?

Is a program compiled by an amd64 compiler executable, and will it run and work properly, on an x86 CPU?
I want to know whether it's possible.
I'm also trying to develop a program in Qt, but I'm wondering why there is no qmake.exe that supports the MSVC2017 32-bit compiler.
No. But a program written without reference to architecture-specific features (i.e. anything written in standard C, C++, etc.) can be compiled using different flags for different target architectures.
https://gcc.gnu.org/onlinedocs/gcc-4.5.3/gcc/i386-and-x86_002d64-Options.html
If you are interested in why, looking at the spec for x86 or x86-64 will give you a sense of the answer. An architecture specification is a lot more than a list of supported machine instructions: the two have different memory architectures, different flags, different CPU modes, etc. And in addition to all this, specifications have hardware-specific implementations (chips support different features). When you compile an executable binary, all of these differences must be taken into account.
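One of those differences is easy to see for yourself: compile the same standard C file for two targets and compare what it reports (this assumes a GCC with 32-bit multilib support installed; -m32 and -m64 are standard GCC options on x86):

    /* abi_sizes.c - identical source, different targets:
     *   gcc -m64 abi_sizes.c -o sizes64   (x86-64)
     *   gcc -m32 abi_sizes.c -o sizes32   (32-bit x86, needs multilib)
     * The pointer and long sizes printed will differ, one of many reasons
     * the two binaries are not interchangeable. */
    #include <stdio.h>

    int main(void) {
        printf("sizeof(void *) = %zu\n", sizeof(void *));
        printf("sizeof(long)   = %zu\n", sizeof(long));
        return 0;
    }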

Why was C not made a platform independent language?

I recently read the Dragon Book on compiler design. It mentions that the compiler has intermediate code generation as one of its phases, which produces machine-independent code. So why was C not developed as a platform-independent language like Java?
What the Dragon Book is describing is the following process:
1. Compile the source code into an intermediate, machine-independent byte code format.
2. Perform optimizations and analyses on that IR.
3. Translate the IR to the target platform's actual machine code.
The upside of this is that if you want to support additional systems, you just need to add a new code generator for step 3 without having to touch steps 1 and 2.
All common C compilers work this way. So if your question is "Why don't C compilers do what the Dragon Book describes?", the answer is: "They do".
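You can watch those stages yourself on a toy function (assuming GCC; the flags below are real GCC options, though the exact dump file names vary between versions):

    /* square.c - feed this through the compiler's stages with:
     *   gcc -O2 -fdump-tree-gimple -S square.c
     * You get square.s (the target's assembly, step 3) plus a
     * square.c.*.gimple dump showing the machine-independent IR (step 1). */
    int square(int x) {
        return x * x;
    }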
Now you mentioned Java. What a Java compiler does is the following:
1. Compile the Java code into Java byte code. As far as the Java compiler is concerned, this is not an intermediate format, but the actual target language.
2. The end.
Now to run this byte code you need a JVM, which interprets the byte code and/or JIT-compiles it. The optimizations and analyses usually happen during JIT-compilation. This is not the process described in the Dragon Book.
From the language implementers' point of view, this doesn't change the effort of supporting a new target system very much. You no longer have to change the compiler, but instead you have to change the JVM: Instead of having to add a new backend to the javac compiler, you instead add a new backend to the JIT-compiler. The effort remains basically the same.
The major difference is for the Java programmers: instead of compiling the program for every target platform and distributing packages for each platform, you can now compile the code once and give the resulting package to everyone. Now the people running your code need to install a JVM to be able to use the package, so you have basically moved the effort from the programmer to the end user, but installing a JVM is something you need to do only once (not for every Java program you want to run).
So instead of "write once, compile everywhere", you now have "compile once, run everywhere".
So why didn't C do the same thing that Java does? Performance. Interpreting byte code is slow (compared to running compiled code) and JIT-compilation leads to increased start-up time.
C was initially designed for a particular use case, which involved a specific machine. Although it was loosely based on the language BCPL, which was implemented by way of a platform-independent virtual machine, the goal for C was to be able to write low-level code, such as an operating system, which meant that it needed to be able to take advantage of specific features of the target machine, particularly its ability to directly address individual bytes. By contrast, BCPL's underlying architecture is resolutely word-oriented.
The fact that Bell Labs was able to rapidly reimplement the Unix operating system in their new language (C) certainly contributed to its popularity. (At least, that's why I initially learned it.) To allow for wider dissemination of the language, a version of the compiler was written that more closely followed the architecture outlined in the Dragon Book, with an initial generation of virtual machine code which is then used to produce code for a target machine. This Portable C Compiler was for many years a reference implementation, and it continues to be available.
Other languages contemporary with C, notably Pascal, also used the tactic of targeting a platform-independent virtual machine, and it was once common to refer to virtual machine code as "P-code" because that's what Niklaus Wirth's Pascal project called their target architecture.
Although GCC does not use a virtual machine as such, it does start by generating a low-level, machine-independent internal representation, simplifying the task of porting the compiler to new architectures. And of course the Clang compiler produces LLVM (Low-Level Virtual Machine) code, which can be translated into various concrete machine codes or interpreted directly.
C was originally designed and written as a "Write-Once, Compile-Anywhere" language, which was as close as they could get at the time to a Universal Language.
Processors and Architectures were so radically different, and resources were so small that the idea of a Universal Virtual Machine (like Java has) was just impossible.
The idea that a single code-base could be run through a compiler, and then you have the same software on any target platform was pretty incredible.
The short answer: Because it was not feasible at that time.
The long answer: the Java platform is a language plus a virtual machine. Java code compiles to something called bytecode; the virtual machine can then take this bytecode (which is similar to assembly language) and translate it at run time into the relevant machine instructions that the local machine understands.
Every architecture has its own instruction set, meaning that an ARM processor will not be able to understand code compiled for the x86 architecture, for example.
In C, the code is compiled directly to machine instructions, and these instructions are then executed by the local machine.
To get behaviour like Java's, you would need some kind of interpreter that reads C and translates it to machine code at run time. That is no cheap task and was far too much for the computers of the time (C was invented in 1972). Of course, another way to implement this would be to have the user compile your program before using it, which could be nice but would probably involve making your source code visible to the client, which is unwanted.
Hopefully that clarifies things a bit.
Aside from leaving a number of things implementation-defined (in practice this is largely platform/ABI-defined, but strictly speaking doesn't have to be), C is mostly a platform-independent language. Indeed there are implementations of C (such as emscripten) that produce output in a form that can run on any machine platform with the right runtime environment for it. If software written in C makes assumptions about the implementation-defined (or worse, undefined) aspects of the language, then it might fail to work on some implementations/machines, but quite often the cause is more a matter of API/environment/library assumptions (like assuming POSIX, or Windows, or glibcisms) than making nonportable assumptions about the language itself.
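Two of those implementation-defined aspects are easy to demonstrate: the same program below legitimately prints different answers on different implementations (for instance, long is 64 bits under the Linux x86-64 ABI but 32 bits under 64-bit Windows, and plain char may be signed or unsigned):

    /* impl_defined.c - the varying output here is implementation-defined, not a bug. */
    #include <stdio.h>
    #include <limits.h>

    int main(void) {
        printf("bits in a char: %d\n", CHAR_BIT);
        printf("sizeof(long)  : %zu\n", sizeof(long));
        printf("plain char is : %s\n", (char)-1 < 0 ? "signed" : "unsigned");
        return 0;
    }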

Operating Systems: Compiler Confusion

A classmate posed me this question: since an OS is an extended or virtual machine, does the compiler need to know the number of registers, or the instructions of the processor, when it generates assembly code for a C program?
I've spent a while scouring the internet and here is what I think...
It doesn't need to know the number of registers because, being a virtual machine, it has effectively unlimited resources in memory, per se.
However, it does need to know the instructions of the processor to know when it is able to perform specific functions at specific times.
I was wondering if someone could clarify this for me because I'm not very confident in my answers.
In practice, the compiler is compiling (into object code, often via some assembler file) not only for a target processor (in particular instruction set architecture - ISA), but for a target application binary interface - ABI, which defines some conventions regarding register usage (and how to make system calls) & calling conventions.
An operating system (provided by the kernel) is, or gives to application programs and processes, a virtual machine very close to the processor; the VM is the (user-mode, unprivileged) machine instructions plus an instruction (e.g. SYSCALL or SYSENTER) to switch into kernel or supervisor mode for system calls.
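As a concrete, Linux-only illustration of that boundary, a user-mode C program can ask the kernel to do something through the system-call interface; the syscall() wrapper used below is glibc's generic entry point for this (on x86-64 it ultimately executes the SYSCALL instruction):

    /* raw_syscall.c - write to stdout by invoking the kernel directly,
     * bypassing stdio. Linux/glibc-specific sketch. */
    #define _GNU_SOURCE
    #include <unistd.h>
    #include <sys/syscall.h>

    int main(void) {
        const char msg[] = "hello from a raw system call\n";
        /* SYS_write is the kernel's write(2) entry point; fd 1 is stdout. */
        syscall(SYS_write, 1, msg, sizeof msg - 1);
        return 0;
    }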
Regarding compilers, read about register allocation, instruction scheduling, and optimizing compilers.
If you have GCC on your computer, try compiling a hello-world program (perhaps in a fresh directory) with gcc -fverbose-asm -O -S hello.c then look into the generated assembler code hello.s; add -fdump-tree-gimple and look into additional compiler dump file[s] (and even more of them with -fdump-tree-all)
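If you want something to feed that command, the classic starting point is below; the interesting output is the hello.s file the command leaves behind, which shows exactly which registers and instructions the compiler chose:

    /* hello.c - compile with:  gcc -fverbose-asm -O -S hello.c
     * then read the generated hello.s */
    #include <stdio.h>

    int main(void) {
        printf("hello, world\n");
        return 0;
    }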
PS. Some compilers compile to machine code in memory (e.g. SBCL). Read also about JIT compilers. Other compilers compile to C code.
Compilation has several stages, moving through different levels of abstraction down to the target machine; the details depend on the compiler's architecture.
In some stages registers are not very limited, but at a later stage a mapping onto the real registers is done. You can read about register allocation for more details. I can also suggest having a look at Appel's book on compiler architecture.

Do Intel and AMD processors have the same assembler?

The C language was used to write UNIX to achieve portability -- the same C language program compiled using different compilers produces different machine instructions. How come Windows OS is able to run on both Intel and AMD processors?
AMD and Intel processors(*) have a large set of instructions in common, so it is possible for a compiler or assembler to write binary code which runs "the same" on both.
However, different processor families even from one manufacturer have their own sets of instructions, usually referred to as "extensions" or whatever. Ignoring the x87 co-processor, the first time I remember this being a marketing point was when everything suddenly went "with MMX(TM) technology". Binary code expected to run on any processor either needs to avoid extensions, or to detect the CPU type before using them.
Intel's Itanium 64-bit architecture was completely different from AMD's x86-64 architecture, so for a while their 64bit offerings were non-compatible (and Itanium was nothing like x86, whereas x86-64 extended the instruction set by adding 64bit instructions). Intel blinked first and adopted x86-64, although there are still a few differences: http://en.wikipedia.org/wiki/X86-64#Differences_between_AMD64_and_Intel_64
Windows probably uses the common x86 or x86-64 instruction set for almost all code. I wouldn't be surprised if various drivers and codecs are shipped in multiple versions, and the correct one selected once the CPU has been interrogated.
(*) Actually, Intel make or have made various kinds of processors, including ARM (Intel's ARM processors were called XScale, but I think they've sold that business). And AMD make other processors too. But we know which Intel/AMD processors you mean :-)
AMD processors are Intel-compatible; otherwise they would never have gained a foothold in the marketplace.
They are effectively clone-compatible.
As you suspect, the main stream Intel and AMD processors have the same instruction set.
Windows does not run on ARM or PowerPC chips, for example, because it is somewhat dependent on the underlying instruction set.
However, most of Windows is written in C++ (as far as I know), which should be portable to other architectures. Windows NT even ran on PowerPC and other architectures.
Intel's 80x86 CPUs and AMD's 80x86 CPUs are "mostly the same, sort of", but some things are completely different (e.g. virtual machine extensions: SVM vs. VT-x) and some things (extensions) may or may not be supported. However, some things also differ between CPUs from the same manufacturer (e.g. some Intel chips support AVX2 and some don't).
There are multiple ways to deal with the differences:
only use the common subset so the same code runs on all 80x86 CPUs (e.g. treat it like an 8086 chip).
use a subset of features that is common to a range of CPUs so the same code runs on all 80x86 CPUs in that range. This is very common (e.g. "this software requires an 80x86 CPU (and OS) that supports 64-bit extensions").
use install-time tests. For example, there might be 4 different copies of software (compiled for 4 different ranges of CPUs) where the installer decides which copy makes sense for the computer the software is being installed on.
use run-time tests. For example, code can use the CPUID instruction to do if (AVX2_is_supported()) { set_function_pointers_so_AVX2_is_used(); } else { set_function_pointers_so_AVX2_is_not_used(); }. Note: some compilers (Intel's ICC) can automatically generate code that does run-time tests. (A sketch of this approach follows below.)
These aren't mutually exclusive options. For example, the installer might decide to install a 64-bit version (and not a 32-bit version), and then the 64-bit version might check which features are supported at run-time and have different code to use different features.
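A run-time test of that sort can be written fairly compactly with GCC or Clang on x86; __builtin_cpu_init() and __builtin_cpu_supports() are real compiler builtins, while the sum_* functions here are just illustrative stand-ins, not a real AVX2 implementation:

    /* dispatch.c - pick an implementation at run time based on CPU features. */
    #include <stdio.h>

    static int sum_plain(const int *a, int n) {
        int s = 0;
        for (int i = 0; i < n; i++) s += a[i];
        return s;
    }

    /* In real code this variant would be built with AVX2 enabled
     * (e.g. __attribute__((target("avx2")))) and use wider operations;
     * here it is only a placeholder for the sketch. */
    static int sum_avx2(const int *a, int n) {
        return sum_plain(a, n);
    }

    static int (*sum_impl)(const int *, int);

    int main(void) {
        __builtin_cpu_init();
        sum_impl = __builtin_cpu_supports("avx2") ? sum_avx2 : sum_plain;

        int data[] = {1, 2, 3, 4};
        printf("sum = %d\n", sum_impl(data, 4));
        return 0;
    }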
Also note that different parts of an OS can be treated separately. For example, an OS could have 6 different boot loaders, 4 different "HALs", 4 different kernels, and 3 different "kernel modules" to support virtualisation; where some of these things might do run-time tests and some might not.
Do Intel and AMD processors have the same assembler?
Almost all assemblers for 80x86 support almost all extensions (from all CPU manufacturers - e.g. Intel, AMD, VIA, Cyrix, SiS, ...). In general; it's up to the programmer (or compiler) to make sure they only use things that they know exist. Some assemblers provide features to make this easier (e.g. NASM provides a CPU ... directive so that the programmer can tell the assembler to generate errors if it sees instructions that aren't supported on the specified CPU).
AMD and Intel use the same instruction set.
When you install Windows on an AMD processor or an Intel processor, it doesn't "compile" code on the machine.
I remember many people being confused on this subject back in college. They believed that running a "setup" means it is compiling code on your machine. It isn't. Most if not all Windows applications outside of the free-software world are given to you as binaries.
As for portability, that isn't necessarily 100% true. While C is highly portable, in many cases writing for a specific OS or system will result in code that can only be compiled/executed on that platform. For example, certain Unix machines handle files and directories differently, so it might not be 100% portable.
Do Intel and AMD processors have the same assembler?
An assembler assembles a program to be run on a processor, so your question is flawed. Processors DO NOT use assemblers.
If you mean "can Intel and AMD processors run the same assembler?", then the answer is YES!!!
All an assembler is, is a program that assembles other programs from structured text files (assembly source). NASM and MASM are examples of assemblers.

Resources