Operating Systems: Compiler Confusion

Operating Systems: Compiler Confusion - c

I was posed the question by a classmate asking since an OS is an extended or virtual machine, does the compiler need to know the number of registers, or instructions of the processor when it generates assembly code of a C program.
I've spent a while scouring the internet and here is what I think...
It doesn't need to know the number of registers because being a virtual machine it has unlimited resources in memory per say.
However, it does need to know the instructions of the processor to know when it is able to perform specific functions at specific times.
I was wondering if someone could clarify this for me because I'm not very confident in my answers.

In practice, the compiler is compiling (into object code, often via some assembler file) not only for a target processor (in particular instruction set architecture - ISA), but for a target application binary interface - ABI, which defines some conventions regarding register usage (and how to make system calls) & calling conventions.
An Operating system (provided by the kernel) is - or gives to application programs and processes - a virtual machine very close to the processor; the VM is the (user-mode, unpriviledged) machine instructions + an instruction (SYSENTER) to switch into kernel or supervisor mode for system calls.
See also this & that. Regarding compilers, read about register allocation, instruction scheduling, optimizing compilers.
If you have GCC on your computer, try compiling a hello-world program (perhaps in a fresh directory) with gcc -fverbose-asm -O -S hello.c then look into the generated assembler code hello.s; add -fdump-tree-gimple and look into additional compiler dump file[s] (and even more of them with -fdump-tree-all)
PS. Some compilers compile to machine code in memory (e.g. SBCL). Read also about JIT compilers. Other compilers compile to C code.

Compilation have several stages, from different abstractions to the target machine and this depend on the compiler architecture.
In some stages, registers are not very limited, but at some stages later a mapping is done. You can read about register allocation for more details. I can also suggest you to have a look at Appel's book about compilers architecture.

Related

Compilers and Instruction sets

"C is a genereal purpose language, not tied to a particular system"
The C programming Language, BRIAN W KERNIGHAN & DENNIS M. RITCHIE
Yet with the right compiler we can make a .exe which runs on every Windows machine, which in turn means on every CPU Windows runs on.
So my question is: does every x86-64 CPU (Intel or AMD) use the same instruction set ? (yes, I could make a comparison...) if not, then I'll have to assume that the compiler detects what CPU we're running and uses the right instruction set during compile time.
Am I totally mistaken ?
I barely know what I'm talking about so please bear with me.
Just a dude trying to look under the hood.
Thank you

Intel makes many different processor models that share a core instruction set of the “x86-64” family (and additional processor models that do not). Even among the processors with the shared core instructions, there are variations. Newer models may have instructions that older models did not, and some parts of the instruction set may be on certain models and not others.
Some instructions even behave differently on different processors.
When you compile a program, the compiler “targets” a particular combination of instruction subsets. This means the instructions in those subsets are available for the compiler to use when it is generating code. The compiler might or might not use any particular instruction or subset depending on its needs or choices when compiling a particular program. The resulting program is then suitable for processor models with the targeted instructions and not for other models (unless the compiler happened not to use any of the instructions not on those models, even though it could have).
Often, the default setting for the compiler‘s target is either a processor model like the one you are running on or some typical selection of instruction subsets that is common for modern processor models. The target may also be selected based on other settings you give the compiler, such as asking it to target a particular version of an operating system. However, you can pass the compiler switches to tell it to compile for entirely different targets, even for entirely different architectures, such as compiling for an ARM processor while running on an Intel processor.
Software is also part of a computer system, so the executable file the compiler produces may also depend on certain software libraries being available at run-time or certain operating system features being available.

is Program compiled by amd64 compiler executable and possible to run,work properly in x86 cpu?

is Program compiled by amd64 compiler executable and possible to run,work properly in x86 cpu??
I wanna know whether it's possible
and also im trying to develop some program in Qt
but I'm wondering at that why there is no qmake.exe that supports MSVC2017 32bit compiler

No. But a program written without reference to specific architecture dependent features (i.e anything written using standard c, c++, etc) can be compiled using different flags for different target architectures.
https://gcc.gnu.org/onlinedocs/gcc-4.5.3/gcc/i386-and-x86_002d64-Options.html
If you are interested in why, looking at the spec for x86 or x86-64 will give you a sense of the answer. An architecture specification is alot more than a list of supported machine instruction. They have different memory architecure, different flags, different cpu modes, etc. And in addition to all this, specifications have hardware specific implementations (chips support different features). When you compile a executable binary, all of these differences must be taken into account.

How can linux boot code be written in C?

I'm a newbie to learning OS development. From the book I read, it said that boot loader will copy first MBR into 0x7c00, and starts from there in real mode.
And, example starts with 16 bit assembly code.
But, when I looked at today's linux kernel, arch/x86/boot has 'header.S' and 'boot.h', but actual code is implemented in main.c.
This seems to be useful by "not writing assembly."
But, how is this done specifically in Linux?
I can roughly imagine that there might be special gcc options and link strategy, but I can't see the detail.

I'm reading this question more as an X-Y problem. It seems to me the question is more about whether you can write a bootloader (boot code) in C for your own OS development. The simple answer is YES, but not recommended. Modern Linux kernels are probably not the best source of information for creating bootloaders written in C unless you have an understanding of what their code is doing.
If using GCC there are restrictions on what you can do with the generated code. In newer versions of GCC there is an -m16 option that is documented this way:
The -m16 option is the same as -m32, except for that it outputs the ".code16gcc" assembly directive at the beginning of the assembly output so that the binary can run in 16-bit mode.
This is a bit deceptive. Although the code can run in 16-bit real mode, the code generated by the back end uses 386 address and operand prefixes to make normally 32-bit code execute in 16-bit real mode. This means the code generated by GCC can't be used on processors earlier than the 386 (like the 8086/80186/80286 etc). This can be a problem if you want a bootloader that can run on the widest array of hardware. If you don't care about pre-386 systems then GCC will work.
Bootloader code that uses GCC has another downside. The address and operand prefixes that get get added to many instructions add up and can make a bootloader bloated. The first stage of a bootloader is usually very constrained in space so this could potentially become a problem.
You will need inline assembly or assembly language objects with functions to interact with the hardware. You don't have access to the Linux C library (printf etc) in bootloader code. For example if you want to write to the video display you have to code that functionality yourself either writing directly to video memory or through BIOS interrupts.
To tie it altogether and place things in the binary file usable as an MBR you will likely need a specially crafted linker script. In most projects these linker scripts have an .ld extension. This drives the process of taking all the object files putting them together in a fashion that is compatible with the legacy BIOS boot process (code that runs in real mode at 0x07c00).
There are so many pitfalls in doing this that I recommend against it. If you are intending to write a 32-bit or 64-bit kernel then I'd suggest not writing your own bootloader and use an existing one like GRUB. In the versions of Linux from the 1990s it had its own bootloader that could be executed from floppy. Modern Linux relies on third party bootloaders to do most of that work now. In particular it supports bootloaders that conform to the Multiboot specification
There are many tutorials on the internet that use GRUB as a bootloader. OS Dev Wiki is an invaluable resource. They have a Bare Bones tutorial that uses the original Multiboot specification (supported by GRUB) to boot strap a basic kernel. The Mulitboot specification can easily be developed for using a minimal of assembly language code. Multiboot compatible bootloaders will automatically place the CPU in protected mode, enable the A20 line, can be used to get a memory map, and can be told to place you in a specific video mode at boot time.
Last year someone on the #Osdev chat asked about writing a 2 stage bootloader located in the first 2 sectors of a floppy disk (or disk image) developed entirely in GCC and inline assembly. I don't recommend this as it is rather complex and inline assembly is very hard to get right. It is very easy to write bad inline assembly that seems to work but isn't correct.
I have made available some sample code that uses a linker script, C with inline assembly to work with the BIOS interrupts to read from the disk and write to the video display. If anything this code should be an example why it's non-trivial to do what you are asking.

Compiling C and assembling ASM into machine code [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 10 years ago.
I have three questions:
What compiler can I use and how can I use it to compile C source code into machine code?
What assembler can I use and how can I use it to assemble ASM to machine code?
(optional) How would you recommend placing machine code in the proper addresses (i.e. bootloader machine code must be placed in the boot sector)?
My goal:
I'm trying to make a basic operating system. This would use a personally made bootloader and kernel. I would also try to take bits and pieces from the Linux kernel (namely the drivers) and integrate them into my kernel. I hope to create a 32-bit DOS-like operating system for messing with memory on most modern computers. I don't think I will be creating a executable format for my operating system, as my operating system wont be dynamic enough to require it.
My situation:
I'm running on a x86-64 windows 8 laptop with a Intel Celeron CPU; I believe it uses secure boot. I would be testing my operating system on a x86-64 desktop with Intel Core I3 CPU. I have a average understanding of operating systems and their techniques. I know the C, ASM, and computer theory required for this project. I think it is also note worthy that I'm sixteen with no formal education about computer science.
My research: After searching Google for what C normally compiles into, I found answers ranging from machine code, binary, plain binary, raw binary, assembly, and relocatable object code. Assembly as I understand normally assembles into a PE formatted executable. I have heard of the Cygwin, GCC C, and MingW C compilers. As for assemblers, I have heard of FASM, MASM, and NASM. I have searched websites such as OSDev and OSDever.
What I have tried: I tried to setup GCC (a nightmare) and create a cross compiler (another nightmare).
Conclusion: As you can tell, I'm vary confused about compilers, assemblers, and executable formats. Please dispel my ignorance along with answering my questions. These are probably the only things keeping me from having a OS on my resume. Sorry, I would have included more links, but stackoverflow wouldn't let me make more then two. Thanks a ton!

First, some quick answers to your three questions.
Pretty much any compiler will translate C code into assembly code. That's what compilers do. GCC and clang are popular and free.
clang -S -o example.s example.c
Whichever compiler you choose will probably support assembly as well, simply by using the same compiler driver.
clang -o example.o example.s
Your linker documentation will tell you how to put specific code at specific addresses and so forth. If you use GCC or clang as described above, you will probably use ld(1). In that case, read into 'linker scripts'.
Next, some notes:
You don't need a cross compiler or to set up GCC by yourself. You're working on an Intel machine, generating code for an Intel machine. Any binary distribution of clang or GCC that comes with your linux distribution should work fine.
C compilers normally compile code into assembly, and then pass the resulting assembly off to a system assembler to end up with machine code. Machine code, binary, plain binary, raw binary, are all basically synonymous.
The generated machine code is packaged into some kind of executable file format, to tell the host operating system how to load and run the code. On windows, it's PE, on Linux, it's ELF, and on Mac OS X it's Mach-O.
You don't need to create an executable format for your OS, but you will probably want to use one. ELF is a pretty straightforward (and well-documented) option.
And a bit of a personal note that I hope doesn't discourage you too much - If you are not very familiar with how compilers, assemblers, linkers, and all of those tools work, your project is going to be very difficult and confusing. You might want to start with some smaller projects to get your "sea legs", so to speak.

At first "machine code" and "binary" are synonyms. "Object code" is some kind of intermediate form, that the linker will convert to binary at the end. Some C/C++ compilers generate not directly binary, but assembler source code, that they feed to the assembler, that produces object code and then to the linker, that makes the final binary. In the most cases these processes are transparent to the user. You feed the compiler with C/C++/Pascal/whatever source code and get a binary file at the output.
FASM assembler, aka flatassembler is the best assembler for OS development. There are several OSes already created in FASM.
That is because FASM is self compilable and is very easy portable. This way, for 2..3 days, you can port it to your OS and then your OS will become self sufficient - i.e. you will be able to compile the programs from within your OS.
Another good feature of FASM is that it does not need linker - it can generate directly binary files in several formats.
The big active community is also very important. There are tons of sources available for FASM, including for OS development.
The message board is very active and is place where one can learn a lot.

I think the first part of your question has been answered, so I'll take on the other two:
What assembler can I use and how can I use it to assemble ASM to machine code?
One of nasm, yasm (basically very like nasm), fasm, "masm" i.e. ml64.exe, ml.exe and freely available as part of the Microsoft tools.
Of these, I probably recommend either nasm or yasm. That recommendation is based entirely on personal preference - but the wide range of platforms they support, plus using Intel syntax by default are my reasons. I'd try a few and see what you like.
(optional) How would you recommend placing machine code in the proper addresses (i.e. bootloader machine code must be placed in the boot sector)?
Well, there is only one way to place the bootloader at the correct address for MBR - open the disk at LBA 0 and write exactly 512 bytes there, ending in 0x55AA. Flush, then close. The MBR usually also contains a partition table embedded in it - it is both code and data. The sciency term for this stuff is Von Neumann Architecture which can be briefly summarised as "programs and data are stored in the same place". The action of the BIOS on wanting to boot from disk will be to read the first 512 bytes into memory, check the signature and if it matches, execute that memory (starting from byte 0).
OK, that's those questions out of the way. Now I'll give you some more notes:
512-bytes for a bootloader is not really enough for anyone's usage. As such, some file systems contain boot sectors and the bootloader itself simply loads the code/data found in these. This allows for larger amounts of code to be loaded - enough to get a kernel going. For example, grub contains stage1, stage1_5 and stage2 components in the legacy version.
Although most operating systems require you to use an executable format container, you don't need one. On disk and in memory, executable code is just one, two or three byte strings called opcodes. You can read the opcode reference or the Intel/AMD manuals to find out what hexadecimal value translates to what. Anyway, you can perform a direct conversion from assembler to binary using nasm like this:
nasm -f bin input.asm -o output.asm
Which will work for 16, 32 or 64 bit assembler quite happily although the result likely won't execute. The only place it will is if you explicitly use the [bits 16] directive in your code, along with org 100h, then you have an MSDOS .com program. Unfortunately, this is the simplest of binary formats in existence - you only have code and data in one big lump and this must not exceed the size of a single segment.
I feel this might handle this point:
I found answers ranging from machine code, binary, plain binary, raw binary, assembly, and relocatable object code.
The answer as to what assembly assembles to - it assembles to opcodes and memory addresses, depending on the assembler. This is represented in bytes which are data all of themselves. You can read them raw with a hex editor although there are few occasions where this is strictly necesary. I mention memory addresses because some opcodes control how memory addresses are interpreted - relocatable object code for example requires that addresses are not hard-coded (instead, they are interpreted as offsets from the current location).
Assembly as I understand normally assembles into a PE formatted executable.
It is fair to say the assembler from which your C/C++ was derived is compiled to opcodes which are then, along with anything else to be included in the program (data, resources) are stored in an executable format, such as PE. Normally depends on your OS.
If you have thoroughly read the OSDev Wiki, you'll realise segmented addressing is an utter pain - the standard and only usage of segments in modern operating systems is to define four segments spanning the entire address space - two data segments at ring 0 and 3, two code segments at ring 0 and 3.
If you haven't read the OSDEV Wiki thoroughly, you should. I'd also recommend JamesM's kernel tutorials which contain practical advice on building a kernel in C.
If you simply want to do bad things to a DOS kernel, you actually still can without needing to write a full kernel yourself. You should also be able to switch the CPU to protected mode from DOS, too. You need FreeDOS and an assembler of your choice. There is an excellent tutorial on terminate and stay resident which basically means hooking an interrupt routine, then editing yourself out of the active process list, in The Rootkit Arsenal. There are probably tutorials on the internet for this, too.
I might be tempted to recommend doing this as a first, just to get yourself used to this kind of low level stuff.
If you just wanted to poke an OS, you can set up kernel debugging on Windows. WinDbg is a bit... arcane, but once you get used to it it makes sense.
You mention your laptop uses secure boot. If this is the case your laptop uses UEFI. If you want to read up on this, the UEFI spec is 100% guaranteed more boring than your maths homework, but I recommend skimming it just to understand the goals and the basic environment. THe important thing is to have the EFI SDK which enables you to build EFI-compatible applications (which are in PE format and exist on a FAT32 partition on your disk - so installing an EFI bootloader is very simple even if writing one is not so. If I had to make an honest recommendation, I'd stick to MBR for now, since emulating OSes with MBR is much easier than EFI at the time of writing and you really do want to do this in some form of VM for now. Also, I'd use an existing one like grub, since bootloaders are not all that exciting, really.
Others have said it, and I will say it: You absolutely want to do anything like this under some form of emulator or virtual machine. You will make a mistake, guaranteed, and you will come up against things you don't understand. Emulators and VM software are free these days, and some such as BOCHS will tell you what the reason for a given fault, trap etc is. This is massively helpful!

First, use something like Virtual box for your testing
I think you might want to take some smaller steps, get comfortable writing C code.
then look into how boot sectors on disks work ( well documented on the internet) also look at code of other open source boot loaders.
Then look at how to do task switching. Its not too hard to write. You can even write most of it while running it under your normal OS before trying to embeded into your own OS
With C compilers you can generally mix in asm inline usually with asm { /* assembly code */ }

Do Intel and AMD processor have the same assembler?

The C language was used to write UNIX to achieve portability -- the same C language program compiled using different compilers produces different machine instructions. How come Windows OS is able to run on both Intel and AMD processors?

AMD and Intel processors(*) have a large set of instructions in common, so it is possible for a compiler or assembler to write binary code which runs "the same" on both.
However, different processor families even from one manufacturer have their own sets of instructions, usually referred to as "extensions" or whatever. Ignoring the x87 co-processor, the first time I remember this being a marketing point was when everything suddenly went "with MMX(TM) technology". Binary code expected to run on any processor either needs to avoid extensions, or to detect the CPU type before using them.
Intel's Itanium 64-bit architecture was completely different from AMD's x86-64 architecture, so for a while their 64bit offerings were non-compatible (and Itanium was nothing like x86, whereas x86-64 extended the instruction set by adding 64bit instructions). Intel blinked first and adopted x86-64, although there are still a few differences: http://en.wikipedia.org/wiki/X86-64#Differences_between_AMD64_and_Intel_64
Windows probably uses the common x86 or x86-64 instruction set for almost all code. I wouldn't be surprised if various drivers and codecs are shipped in multiple versions, and the correct one selected once the CPU has been interrogated.
(*) Actually, Intel make or have made various kinds of processors, including ARM (Intel's ARM processors were called XScale, but I think they've sold that business). And AMD make other processors too. But we know which Intel/AMD processors you mean :-)

AMD are Intel compatible, otherwise they would never have gained a foothold in the market place.
They are effectively clone compatible.

As you suspect, the main stream Intel and AMD processors have the same instruction set.
Windows does not run on ARM or PowerPC chips, for example, because it is somewhat dependant on the underlying instruction set.
However, most of Windows is written in C++ (as far as I know), which should be portable to other architectures. Windows NT even ran on PowerPC and other architectures.

Intel's 80x86 CPUs and AMD's 80x86 are "mostly the same sort of", but some things are completely different (e.g. virtual machine extensions - SVM vs. VT-x) and some things (extensions) may or may not be supported. However, some things are different on different CPUs from the same manufacturer too (e.g. some Intel chips support AVX2 and some don't).
There are multiple ways to deal with the differences:
only use the common subset so the same code runs on all 80x86 CPUs (e.g. treat it like an 8086 chip).
use a subset of features that is common to a range of CPUs so the same code runs on all 80x86 CPUs in that range. This is very common (e.g. "this software requires an 80x86 CPU (and OS) that supports 64-bit extensions").
use install-time tests. For example, there might be 4 different copies of software (compiled for 4 different ranges of CPUs) where the installer decides which copy makes sense for the computer the software is being installed on.
use run-time tests. For example, code can use the CPUID instruction to do if( AVX2_is_supported() ) { set_function_pointers_so_AVX2_is_used(); } else {set_function_pointers_so_AVX2_is_not_used(); }. Note: Some compilers (Intel's ICC) can automatically generate code that does run-time tests.
These aren't mutually exclusive options. For example, the installer might decide to install a 64-bit version (and not a 32-bit version), and then the 64-bit version might check which features are supported at run-time and have different code to use different features.
Also note that different parts of an OS can be treated separately. For example, an OS could have 6 different boot loaders, 4 different "HALs", 4 different kernels, and 3 different "kernel modules" to support virtualisation; where some of these things might do run-time tests and some might not.
Do Intel and AMD processor have the same assembler?
Almost all assemblers for 80x86 support almost all extensions (from all CPU manufacturers - e.g. Intel, AMD, VIA, Cyrix, SiS, ...). In general; it's up to the programmer (or compiler) to make sure they only use things that they know exist. Some assemblers provide features to make this easier (e.g. NASM provides a CPU ... directive so that the programmer can tell the assembler to generate errors if it sees instructions that aren't supported on the specified CPU).

AMD and Intel use the same instruction set.
When you install windows on an AMD processor or an Intel processor, it doesn't "compile" code on the machine.
I remember many people being confused on this subject back during college. They believe that a "setup" means that it is compiling code on your machine. It isn't. Most if not all Windows application outside of the free realms, are given to you by binary.
As for portability, that isn't neccessarily 100% true. While C is highly portable, in many cases writing for a specific OS or system will result in the code only being able to compile/executed on that box. For example, certain Unix machines handle files and directories differently so it might not be 100% portable.

Do Intel and AMD processor have the same assembler?
An assembler assembles a program to be run on a processor, so your question is flawed. Processors DO NOT use assemblers.
If you mean can Intel and AMD processor run the same assembler? Then the answer is YES!!!
All an assemblers are, is a program that assembles other programs from structured text files. Visual Basic is an example of an assembler.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight