C compiler's language [closed] - c

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 12 years ago.
I just want to know the language in which the C compiler was written. Please say something other than C.

Here's an excellent read: Reflections on Trusting Trust by Ken Thompson. Starts off with an overview of how the first C compilers were written. The boot-strapping technique to be precise. May not answer your question directly but gives you some insight.

Nearly all major C compilers are written in C. You might think there's a chicken-and-egg problem with this, but there's not. The process is called bootstrapping.

The very original C compiler was written (by Dennis Ritchie) in a predecessor language called B, or maybe BCPL. But once the C compiler was working well enough, it was converted over to C, and each successive version was used to compile the next.
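Many of the more unusual features of C, such as the pre- and post-increment operators, are often said to exist because (a) they mirror special addressing modes on the PDP-11, on which early C was developed, or (b) they helped the compiler fit in memory while compiling its own next version. Note that Ritchie's own account says ++ and -- were already present in B, before the PDP-11 existed, so (a) is at best a happy coincidence.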
So that's the rest of the story.

GCC is written in C. The majority of C compilers are written in C.
There is a boot-strapping phase when first producing a compiler for a language (any language that has pretensions to be able to compile its own compiler - COBOL is one plausible exception, but there are many others) on a given platform, but once you have a compiler, then you write the compiler in that language.
All else apart, doing it in assembler is too expensive.

Depending on the C compiler, the first version was likely written in assembly. Once it could compile itself, more and more of it was probably rewritten in C.
You may browse the source for GCC for yourself at http://gcc.gnu.org/viewcvs/branches/

gcc is written in C
Clang is written in C++.
Those are the two I know.

You have to specify which compiler.

In the old days, people would write a small subset of the C language in assembler, and then use that to "bootstrap" compile a better C compiler written in C. These days it's more common to make a C compiler for a new architecture by cross compiling from an architecture that already works. I believe there are very few bits of, for instance, the gcc compiler, that aren't written in C or C++.

Seems to me it would be easiest to write a compiler in perl

Related

How can a C compiler be written in C? [duplicate]

This question already has answers here:
Writing a compiler in its own language
(14 answers)
Closed 9 years ago.
This question may stem from a misunderstanding of compilers on my part, but here goes...
One can find the following statement in the preface to the first edition of K&R (page xi):
The operating system, the C compiler, and essentially all UNIX applications programs (including all of the software used to prepare this book) are written in C.
(my emphasis)
Here's what I don't understand: doesn't that C compiler have to be compiled itself before it can compile any C code? And if that C compiler is written in C, wouldn't compiling it require an already existing C compiler?!
The only way out of this infinite-regression conundrum (or chicken-and-egg problem) is that the C compiler written in C that K&R are referring to was actually compiled with an already existing C compiler that was written in a language other than C. The C compiler written in C then superseded the latter.
Or am I completely off?
It's called Bootstrapping, quoting from Wikipedia:
If one needs a compiler for language X to obtain a compiler for language X (which is written in language X), how did the first compiler get written? Possible methods to solving this chicken or the egg problem include:
Implementing an interpreter or compiler for language X in language Y. Niklaus Wirth reported that he wrote the first Pascal compiler in Fortran.
Another interpreter or compiler for X has already been written in another language Y; this is how Scheme is often bootstrapped.
Earlier versions of the compiler were written in a subset of X for which there existed some other compiler; this is how some supersets of Java, Haskell, and the initial Free Pascal compiler are bootstrapped.
The compiler for X is cross compiled from another architecture where there exists a compiler for X; this is how compilers for C are usually ported to other platforms. Also this is the method used for Free Pascal after the initial bootstrap.
Writing the compiler in X; then hand-compiling it from source (most likely in a non-optimized way) and running that on the code to get an optimized compiler. Donald Knuth used this for his WEB literate programming system.
And if you are interested, here is Dennis Ritchie's first C compiler source.
Usually, a first compiler is written in another language (directly in PDP-11 assembler in this case, or in C for most of the "modern" languages). Then, this first compiler is used to compile a compiler written in the language itself.
You can read this page about the history of the C language. You will see that it is also strongly linked to the UNIX system.
See the Chicken and Egg section of the Wikipedia page:
If one needs a compiler for language X to obtain a compiler for language X (which is written in language X), how did the first compiler get written? Possible methods to solving this chicken or the egg problem include:
Implementing an interpreter or compiler for language X in language Y. Niklaus Wirth reported that he wrote the first Pascal compiler in Fortran.
Another interpreter or compiler for X has already been written in another language Y; this is how Scheme is often bootstrapped.
Earlier versions of the compiler were written in a subset of X for which there existed some other compiler; this is how some supersets of Java, Haskell, and the initial Free Pascal compiler are bootstrapped.
The compiler for X is cross compiled from another architecture where there exists a compiler for X; this is how compilers for C are usually ported to other platforms. Also this is the method used for Free Pascal after the initial bootstrap.
Writing the compiler in X; then hand-compiling it from source (most likely in a non-optimized way) and running that on the code to get an optimized compiler. Donald Knuth used this for his WEB literate programming system.
It's perfectly ordinary for a compiler to be written in the language it compiles. One way to achieve this would be to write a complete compiler for language L in some other language, and then to write a new compiler for L in L. A more interesting approach would be to write a minimal compiler for a subset of L in some other language, and then use this minimal subset to improve the compiler step by step, making it less minimal and increasing the available subset of L each time. In this way, a complete compiler can be built.

"C or gcc" is like "Chicken or the egg" ? :( [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How are gcc/g++ bootstrapped?
I would like to know how gcc is compiled as we all know it is written in C.
Did they use some other compiler to come up with gcc?
If so, can I use the same compiler to compile my C program?
There is no chicken and egg here. glibc is compiled with the compiler you are using.
That compiler was first compiled with a previous version of the same compiler. Then it can compile itself as well.
The real chicken-and-egg problem was solved in the 1950's when someone had to write the world's first compiler. After that, you can use one compiler to compile the next one.
There are two basic ways to build a new compiler:
If you're writing a new compiler for an established language like C, use an existing compiler from a different vendor to build your new compiler. For example, you could use the C compiler shipped with HP-UX to build gcc.
If you're writing a compiler for a new language, start by implementing a very simple compiler in a different language (the first C compiler was written in PDP-11 assembler). This initial compiler will only recognize a small subset of the target language; basically enough to do some file I/O and some simple statements. Write a new compiler in the target language subset and build it with your first compiler. Now write a slightly more capable compiler that can recognize a larger subset of the target language, and build it with the second compiler. Repeat the process until you have a compiler capable of recognizing the full target language.
They did not use some other compiler. You can write a C program that doesn't use glibc by simply telling the compiler not to use it. So something like this:
gcc main.c -nostdlib
This is an interesting question. I think you are wondering in what language a compiler for a new language is written, aren't you? Well, if we had only assembly language (for instance, x86), the only way to write a C compiler would be in assembly language. Later, we could use our assembly-written compiler to build a better, more powerful compiler written in C, and so on...
And the question arises: how did the early programmers write the first assembler? My father told me: by manually entering the 1's and 0's! :-)

How does C work? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How was the first compiler written?
I'm asking this as a single question because, essentially, what I'm trying to ask is how all of this is implemented at the bottom. Here goes:
How was the first C compiler generated? Since the C compiler is itself written in C, how was the first C compiler source built?
Is C written in ASM? How are languages actually designed? Before we had high-level languages, the only way to build something was in ASM, so even if C is derived from earlier languages, how were those designed? (My guess is ASM.)
I'm getting confused about how C works down at the bottom. At the bottom, everything is implemented on the processor as opcodes. My understanding was that C programs are "essentially" translated to system calls, which are implemented by the kernel.
But then how are syscalls implemented? (Do they correspond directly to opcodes, or is there another layer of abstraction?)
How was the first C compiler generated? Since the C compiler is itself written in C, how was the first C compiler source built?
Bootstrapping.

Shouldn't This File Start With Assembly Language [closed]

Closed 11 years ago.
http://lxr.linux.no/#linux+v3.0.3/arch/x86/boot/header.S
This is the first file read by the CPU, so shouldn't it start in assembly language? It starts with #include, so is include a method in C?
#include is a directive to the preprocessor, not the assembler. The preprocessor has nothing to do with the compiler.
That's the source code to the file. It gets compiled into machine language before it's used as part of the OS.
Given that it's AT&T syntax, the first thing you should do is check out the manual for GAS, which is part of the GNU binutils collection:
http://sourceware.org/binutils/docs-2.21/as/Preprocessing.html#Preprocessing
According to the manual:
"You can use the gnu C compiler driver to get other “CPP” style preprocessing by giving the input file a `.S' suffix."
That means the .S assembly files are meant to be assembled by running them through the GCC frontend, which applies the C preprocessor for macros and #include commands, and then passes the result to the GNU binutils assembler.
This is a .S file, therefore it can be processed by the C preprocessor, and #include is a valid C-preprocessor directive. If it were only a .s file, then it would typically be considered a "pure" gas syntax assembly file, at least from the standpoint of gcc.
#include is a preprocessor statement. The compiler won't see it at all.
You are looking at a source file. It will be preprocessed and assembled into object code, and then linked by the linker (or by the compiler driver in some cases). The linker looks at the linking table and the sections in header.S and arranges them in the correct order.
That's a preprocessor directive. After the preprocessing stage it is replaced by the contents of the included file at the place where the directive appeared. After that the compiler compiles the code, the output is assembled by the assembler, and the result is then read and decoded by the CPU.
Whatever code you write, in whatever language, is converted into machine code before it can execute. C programs, like all others, are first converted into machine code, and it is that code the CPU reads, not the C language syntax.

Low level capabilities of high level languages [closed]

Closed 11 years ago.
I would like to know some low-level capabilities of high-level languages.
Off the top of my head I could point out:
-bitwise operations
-bit fields
-pointer arithmetic
-inline assembly
-interrupt functions
I would appreciate it if you pointed out some that aren't in my list. It would be nice if C or Pascal had them, but basically any high-level language will do.
Thank you.
C does not support inline assembler or interrupts; all C code implementing them uses non-standard compiler extensions. C++, however, supports inline assembler through the standard.
Here are some other important, hardware-related features of C:
Function pointers are rather unique to C/C++. They make it possible to execute code located at a specific memory address and to perform other hardware-related tasks. See this for more details of function pointer uses: Function pointers in embedded systems.
The integer types. Both C and Pascal support int types of different sizes (byte, word, double word etc), although their sizes are not specified by the standards. For the same reason, the sizeof operator may be important as well.
C also has some support for memory alignment, for example explicitly stating rules for how padding bytes should behave.
The volatile keyword is also an important feature for hardware-related programming: it tells the compiler that a variable may change at any time (a hardware register, for example), so accesses to it must not be optimized away.
The const keyword is used in hardware-related programming to determine where the data will end up: NVM or RAM.
Other important features that C lacks are multi-threading support as part of the language, and memory barrier support. Some C compilers implement memory barriers through the volatile keyword, but there are no guarantees for it to work by any standard.
Quoting Wikipedia:
A high-level programming language is a programming language with strong abstraction from the details of the computer.
C is no such language, as it stays extremely close to the details of the computer.
And looking at your list:
bitwise operations
bit fields
pointer arithmetic
inline assembly
interrupt functions
All of those are closely related to the computer/OS architecture itself and are considered not high-level.
One high-level language with very good support for low-level programming is Ada.
Compared with C, mentioned previously, Ada also has intrinsic support for concurrent systems. Tasks are a language construct and do not need separate libraries. For concurrent systems, Ada also provides so-called protected types, which allow shared variables or data to be used between tasks without additional consideration of mutual exclusion or signalling. The basic language libraries also provide support for interrupt handling.
For data access, the exact representation of data can be defined by the use of representation clauses. As a result of strong typing, it is also trivial to define view conversions between different representations of data, allowing for example tradeoffs between space and speed.
It is also possible to directly generate assembly as needed, by machine code insertions.

Resources