Low level capabilities of high level languages [closed] - c

I would like to know some low-level capabilities of high-level languages.
Off the top of my head I could point out:
- bitwise operations
- bit fields
- pointer arithmetic
- inline assembly
- interrupt functions
I would appreciate it if you pointed out some that aren't in my list. It would be nice if C or Pascal had them, but basically any high-level language will do.
Thank you.

C does not support inline assembler or interrupts; any C code implementing them relies on non-standard compiler extensions. C++ does include the asm declaration in its standard, but only as a conditionally-supported feature.
Here are some other important, hardware-related features of C:
Function pointers are rather unique to C/C++. They make it possible to execute code located at a specific memory address and to perform other hardware-related tasks. See this for more details on function pointer uses: Function pointers in embedded systems.
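A rough sketch of the idea (the address, type name, and signature below are made up purely for illustration, not taken from any real device):

#include <stdint.h>

typedef void (*rom_entry_t)(void);   // signature we expect the routine to have

void call_rom_routine(void)
{
    // Converting an integer to a function pointer is implementation-defined,
    // but is the usual way to jump to code at a known (hypothetical) address.
    rom_entry_t entry = (rom_entry_t)(uintptr_t)0x1FFF0000u;
    entry();   // execute the code located at that address
}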
The integer types. Both C and Pascal support integer types of different sizes (byte, word, double word, etc.), although the exact sizes are not fixed by the standards, only minimum ranges. For the same reason, the sizeof operator may be important as well.
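For instance, a small program can show the implementation-defined sizes of the basic types, while the exact-width types that C99 added in <stdint.h> pin the sizes down:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    // Sizes of the basic integer types are implementation-defined.
    printf("int: %zu bytes, long: %zu bytes\n", sizeof(int), sizeof(long));

    // The C99 exact-width types have fixed sizes wherever they are provided.
    uint8_t  byte  = 0xFFu;
    uint16_t word  = 0xFFFFu;
    uint32_t dword = 0xFFFFFFFFu;
    printf("%u %u %lu\n", (unsigned)byte, (unsigned)word, (unsigned long)dword);
    return 0;
}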
C also has some support for memory alignment, for example by explicitly stating rules for how structure padding behaves (and, since C11, the _Alignas and _Alignof keywords).
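A small example of how padding shows up in practice, using offsetof and sizeof (the struct is made up for illustration):

#include <stddef.h>
#include <stdio.h>

struct packet {
    char tag;    // 1 byte, usually followed by padding bytes
    int  value;  // typically aligned to a 4-byte boundary
};

int main(void)
{
    // On a typical 32-bit-int platform this prints offset 4 and size 8,
    // revealing the padding the compiler inserted after 'tag'.
    printf("offset of value: %zu\n", offsetof(struct packet, value));
    printf("sizeof(struct packet): %zu\n", sizeof(struct packet));
    return 0;
}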
The volatile keyword is also important for hardware-related programming: it tells the compiler that an object may change outside the program's control, so every access must actually be performed rather than cached or optimized away.
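A minimal sketch of the usual idiom, assuming a made-up memory-mapped status register (the address and bit layout are purely illustrative):

#include <stdint.h>

#define STATUS_REG (*(volatile uint32_t *)0x40021000u)  // hypothetical address

void wait_until_ready(void)
{
    // Because the object is volatile, the compiler must re-read the register
    // on every iteration instead of hoisting the load out of the loop.
    while ((STATUS_REG & 0x1u) == 0u) {
        // busy-wait
    }
}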
The const keyword is used in hardware-related programming to influence where data ends up: const-qualified objects can typically be placed by the linker in NVM (flash/ROM) rather than RAM.
For a long time C also lacked multi-threading support as part of the language, as well as memory barriers; some compilers implemented barriers through the volatile keyword, but without any guarantee from the standard. C11 finally added optional threads (<threads.h>) and atomics with explicit fences (<stdatomic.h>).
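A minimal sketch of those C11 facilities, assuming a hosted implementation that does not define __STDC_NO_ATOMICS__ (the function names are made up):

#include <stdatomic.h>

static atomic_int ready;   // statically zero-initialized
static int payload;

void publish(int value)
{
    payload = value;
    // Explicit fence: writes before it become visible to any thread that
    // later observes the store to 'ready'.
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

int try_consume(void)
{
    if (atomic_load_explicit(&ready, memory_order_relaxed)) {
        atomic_thread_fence(memory_order_acquire);
        return payload;    // guaranteed to see the published value
    }
    return -1;             // not ready yet
}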

Quoting Wikipedia:
A high-level programming language is a programming language with strong abstraction from the details of the computer.
C is no such language, as it stays extremely close to the details of the computer.
And looking at your list:
bitwise operations
bit fields
pointer arithmetic
inline assembly
interrupt functions
All of those are closely tied to the computer/OS architecture itself and are not considered high-level features.

One high-level language with very good support for low-level programming is Ada.
In addition to the C features mentioned previously, Ada has intrinsic support for concurrent systems. Tasks are a language construct and do not need separate libraries. For concurrent systems, Ada also provides so-called protected types, which allow tasks to share variables or data without extra code for mutual exclusion or signalling. The basic language libraries also provide support for interrupt handling.
For data access, the exact representation of data can be defined with representation clauses. Thanks to strong typing, it is also trivial to define view conversions between different representations of the same data, allowing, for example, trade-offs between space and speed.
It is also possible to directly generate assembly as needed, by machine code insertions.

Related

_asm in which cases is it best to use it? [duplicate]

Is there anything that we can do in assembly that we can't do in raw C? Or anything which is easier to do in assembly? Is any modern code actually written using inline assembly, or is it simply implemented as a legacy or educational feature?
Inline assembly (and on a related note, calling external functions written purely in assembly) can be extremely useful or absolutely essential for reasons such as writing device drivers, direct access to hardware or processor capabilities not defined in the language, hardware-supported parallel processing (as opposed to multi-threading) such as CUDA, interfacing with FPGAs, performance, etc.
It is also important because some things are only possible by going "beneath" the level of abstraction provided by the Standard (both C++ and C).
The Standard(s) recognize that some things will be inherently implementation-defined, and allow for that throughout the Standard. One of these allowances (perhaps the lowest-level) is recognition of asm. Well, "sort of" recognition:
In C (N1256), it is found in the Standard under "Common extensions":
J.5.10 The asm keyword
1 The asm keyword may be used to insert assembly language directly into the translator output (6.8). The most common implementation is via a statement of the form:
asm ( character-string-literal );
In C++ (N3337), it has similar caveats:
§7.4/1
An asm declaration has the form
asm-definition:
asm ( string-literal ) ;
The asm declaration is conditionally-supported; its meaning is implementation-defined. [ Note: Typically it is used to pass information through the implementation to an assembler. —end note]
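A rough illustration of that basic form (not taken from the Standard; the mnemonic assumes an x86 target and a compiler that accepts the plain asm keyword, such as GCC in its default gnu mode):

// Conditionally-supported: whether this compiles at all, and what the string
// means, is implementation-defined. "pause" is an x86 spin-wait hint chosen
// purely for illustration.
void spin_hint(void)
{
    asm ("pause");
}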
It should be noted that an important development in recent years is that attempting to increase performance by using inline assembly is often counter-productive unless you know exactly what you are doing: the compiler/optimizer's register-allocation decisions and its awareness of pipeline and branch-prediction behavior are almost always good enough for most uses.
On the other hand, processors in recent years have added CPU-level support for higher-level operations (such as Intel's AES extensions) that can increase performance by several orders of magnitude for specialized applications.
So:
Legacy feature? Not at all. It is absolutely essential for some requirements.
Educational feature? In an ideal world, only if accompanied by a series of lectures explaining why you'll probably never need it, and if you ever do need it, how to limit its visible surface area to the rest of your application as much as possible.
You also need to code with inline asm when:
you need to use some processor features not accessible from standard C; typically, the add-with-carry machine instruction is useful in bignum implementations like GMPlib
on today's processors with current optimizing compilers, you usually should not use asm for performance reasons, since compilers optimize better than you can (an old example was implementing memcpy with rep movsw on x86)
you need some asm when you are using or implementing a different ABI. For example, the runtime systems of some OCaml or Common Lisp implementations use different calling conventions, and transitioning to them may require asm; but the current libffi (which itself uses asm) may spare you from writing any yourself
your brand-new hardware might have recent instructions not yet fully supported by your compiler (e.g. extensions like AVX512...), for which you might need asm
you want to implement some functionality that cannot be implemented in C, e.g. backtrace
In general, you should think more than twice before using asm, and if you do use it, confine it to very few places. In short, avoid asm where you can.
The GCC compiler introduced an extended asm feature which has nearly become a de facto standard supported by many other compilers (e.g. Clang/LLVM...), but the devil is in the details. See also the GCC Inline Assembly HowTo.
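A minimal sketch of that extended syntax on x86-64, in the spirit of the add-with-carry use case mentioned above (the function name and structure are illustrative, not GMP's actual code; it builds with GCC or Clang only):

#include <stdint.h>

static inline uint64_t add_carry(uint64_t a, uint64_t b, unsigned char *carry)
{
    uint64_t sum = a;
    __asm__ ("addq %2, %0\n\t"      // sum += b, setting the carry flag
             "setc %1"              // capture CF into *carry
             : "+r" (sum), "=q" (*carry)
             : "r" (b)
             : "cc");
    return sum;
}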
The Linux kernel (and the many libc implementations, e.g. glibc or musl libc, etc.) uses asm (at least to make syscalls), and a few other major free-software projects also use asm instructions directly.
Read also the Linux Assembly HowTo

C WikiBooks - How is C a small "what you see is all you get" language?

I'm unable to understand one of the following sentence from WikiBooks :
Why C, and not assembly language?
" C is a compiled language, which creates fast and efficient executable files. It is also a small "what you see is all you get" language: a C statement corresponds to at most a handful of assembly statements, everything else is provided by library functions. "
Website Link : C Programming/Why learn C? - Wikibooks, open books for an open world
Note : I am a complete beginner and I've started to learn C . So, I need a precise explanation of what the above sentence means.
Assembly is the language of a single processor family; it is translated directly into the machine code that the processor runs. If you program in assembly, you need to rewrite the entire code for each different processor family. Phones usually use ARM processors, whereas desktop computers have 32-bit or 64-bit x86-compatible processors. Each of these three potentially needs a completely separately written program, and perhaps it is not even limited to that.
In contrast, standard C is a portable language - if you write so-called strictly conforming programs. C11 4p5:
A strictly conforming program shall use only those features of the language and library specified in this International Standard. (3) It shall not produce output dependent on any unspecified, undefined, or implementation-defined behavior, and shall not exceed any minimum implementation limit.
With footnote 5 noting that:
Strictly conforming programs are intended to be maximally portable among conforming implementations. Conforming programs may depend upon nonportable features of a conforming implementation
Unlike assembler, whose specifics vary from one processor to another, it is possible to write programs in C and then port them to various platforms without any changes to the source code. These programs will still be compiled into assembly language, and their performance can - and often will - surpass hand-written assembly when a modern high-quality optimizing compiler is used.
Additionally, the C standard library, which any conforming hosted implementation must provide, offers a portable way to manage files, dynamic memory, and input and output, all of which are not only processor-specific but also operating-system-specific when using assembler.
However, C is still quite close to the assembly, to the extent that it has been called a "high-level assembly language" by some.
It makes no sense to speak of a "compiled language" or an "interpreted language".
Statements of that kind are made by people who have not studied the foundations of programming.
A language is defined mathematically, via some method of defining languages (operational, denotational, axiomatic, etc.), and implementers are free to implement it however they wish.
There are machines that run C via interpretation: they dispatch the code at the moment of execution and execute it, instead of accumulating some object code to be executed later by some machine.
It is correct to speak of a compiled implementation or an interpreted implementation of a language, but even that is relative to a given machine, because when you compile for an x86 processor, the compiled code is in turn interpreted by the datapath and control unit of the x86 machine.
Basically, the statement "what you see is all you get" means that there is an almost one-to-one correspondence between the operations of the abstract machine defined in the semantics of ISO 9899 and the machines currently on the market, like x86, MIPS, etc.
C is nothing more than a platform-independent assembly translator; what you write in C is "translated" into machine code as efficiently as if you had written it directly in assembly. That's the point of:
"what you see is all you get" language: a C statement corresponds to at most a handful of assembly statements
Any C statement you write is transformed directly into assembly by the compiler, without abstraction layers, interpreters, etc., unlike other languages.
By definition, C is tiny: it has nothing but the essentials needed to be a Turing-complete language and nothing more. Any additional feature is achieved via libraries; C ships with the standard library (in different implementations, though), which packs things like random number generation, memory management, etc.
That's what this means:
everything else is provided by library functions
It's an old and largely outdated claim about C.
C was originally designed as, roughly, a more readable and portable assembler. For this reason, most of the core language features tended - on most target machines - to be easily translated. Generally more complicated functionality was provided by library functions, including the standard library.
Over time, C (both the language and the standard library) have evolved, and become more complicated. Computing hardware has also become more complicated - for example, supporting a set of more advanced instructions - and C constructs which can be implemented in terms of advanced instructions will translate to more complicated assembler on machines that support older and simpler instruction sets.
The distinction between a "small" language and a "large" one is completely subjective, so some people still describe C as small and simple, while others describe it as large and complex. While simpler than some other languages (like C++), C is now also significantly more complex - by various measures - than quite a few other programming languages.
This quote was absolutely true for the good old K&R C implementations of the '70s. In those days, C was indeed a thin wrapper around machine instructions, and the programmer could easily guess how the compiler would translate the source:
for loop: a counter in an appropriate register, a test at the end of the loop, and a goto back to the top
function call: push the arguments onto the stack (with no conversion!) and call the subroutine address; on return, put the return value (required to be a scalar or pointer) in the appropriate register and use the machine return instruction; the caller then cleans up the stack
From the symmetric point of view, anything that could be executed by the processor could be expressed in C. If you have an array of two integers and know that the internal representation is a valid double, just cast a pointer and use it.
That is all wrong with recent versions of the C language and with optimizing compilers. The as-if rule allows the optimizer to do anything, provided the observable results are what a natural implementation would have given. Many operations can invoke undefined behaviour (UB). For example, writing a float to a memory location and then reading it back as an integer is explicitly UB (a strict-aliasing violation). The optimizer can assume that no UB exists in the program, so it can simply optimize out any block containing UB (recent versions of gcc are great at that).
Look for example at this function:
#include <stdio.h>

void stopit(void) {
    int i = 0;
    while (1) {
        i += 1;
    }
    printf("done");
}
It contains an infinite loop, so the printf should never be reached. But the loop has no observable effect, so the compiler may remove it (strictly speaking, C11 only allows that assumption for loops whose controlling expression is not a constant expression, though C++ does allow it and some C compilers behave this way anyway) and translate the function the same as:
void stopit(void) {
    printf("done");
}
Another example:
int i = 12;
float *f = (float *)&i;   // the cast is needed for this to compile at all
*f = 12.5f;               // UB: accessing an int object through a float lvalue
printf("0x%04x\n", i);    // try to dump the representation of 12.5
This code can legally display 0x000c, because the compiler is free to assume that the assignment through *f has not modified i, so it can directly use a cached value and translate the last line simply as printf("0x%04x\n", 12);
So no, recent versions of the C language are no longer a small "what you see is all you get" language.
What is true is that C is a low-level language. The programmer has full control over allocation and deallocation of dynamic storage. You have natural byte-level access to any type, you have the notion of a pointer, and explicit pointer/integer conversions allow direct access to well-known memory addresses. That indeed makes it possible to program embedded systems or microcontrollers in C. The standard even defines two environment levels: a hosted environment, where you have full access to the standard library, and a freestanding environment, where the standard library is not present. This can be particularly interesting for systems with very little memory.
C provides low-level control of memory and resources at the byte and bit level. For example, C and assembly language are very common in the programming of microcontrollers (my area of expertise), which have very little memory and most often require bit-level control of input and output ports.
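For example, port bits are typically set, cleared, and toggled with the bitwise operators through a volatile pointer (the register address and pin number below are made up, not from any particular microcontroller):

#include <stdint.h>

#define PORTA   (*(volatile uint8_t *)0x0025u)   // hypothetical I/O port
#define LED_PIN 3u

void led_on(void)     { PORTA |=  (uint8_t)(1u << LED_PIN); }  // set bit
void led_off(void)    { PORTA &= (uint8_t)~(1u << LED_PIN); }  // clear bit
void led_toggle(void) { PORTA ^=  (uint8_t)(1u << LED_PIN); }  // toggle bit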
If you write a C program and build it, then look at your listing file, you'll typically see the very close correspondence between your C statements and the few assembly instructions into which the C is assembled.
Another clue to its simplicity is to look at its grammar definition as compared to those of C#, Java, or Python, for example. The C grammar is small, terse, and compact compared to the "fuller" languages, and it's true that there isn't even input or output defined in C; that typically comes from including stdio.h or similar. In this way, you only get what you need in your executable. That is in stark contrast to the "big" languages.
While many in the embedded (microcontroller) programming space still prefer assembly, C is a great way to abstract a little bit things like flow of control and pointers, while still retaining the power to employ practically every instruction the microprocessor or microcontroller is capable of.
Regarding the "what you see is all you get" statement...
C is a "small" language in that provides only a handful of abstractions - that is, high-level language constructs that either hide implementation-specific details (such as I/O, type representations, address representations, etc.) or simplify complex operations (memory management, event processing, etc.). C doesn't provide any support at the language level (either in the grammar or standard library) for things like networking, graphics, sound, etc.; you must use separate, third-party libraries for those tasks, which will vary based on platform (Windows, MacOS, iOS, Linux). Compare that to a language like Java, which provides a class library for just about everything you could ever want to do.
Compared to languages like C++ and Java, not a whole lot of things happen "under the hood" in C. There's no overloading of functions or operators, there are no constructors or destructors that are automatically called when objects are created or destroyed, there's no real support for "generic" programming (writing a function template that can be automatically instantiated for arguments of different types), etc. Because of this, it's often easier to predict how a particular piece of code will perform.
There's no automatic resource management in C - arrays don't grow or shrink as you add or remove elements, there's no automatic garbage collection that reclaims dynamic memory that you aren't using anymore, etc.
The only container provided by the C language is the array - for anything more complex (lists, trees, queues, stacks, etc.) you have to write your own implementation, or use somebody else's library.
C is "close to the machine" in that the types and abstractions it provides are based on what real-world hardware provides. For example, integer and floating-point representations and operations are based on what the native hardware supports. The size of an int is (usually) based on the native CPU's word size, meaning it can only represent a certain range of values (the minimum range required by the language standard is [-32767..32767] for signed integers and [0..65535] for unsigned integers). Operations on int objects are mapped to native ADD/DIV/MUL/SUB opcodes. Languages like Python provide "arbitrary precision" types, which are not limited by what the hardware can natively support - the tradeoff is that operations using these types are often slower, since you're not using native opcodes.

Has the use of C to implement other languages constrained their designs in any way?

It seems that most new programming languages that have appeared in the last 20 years have been written in C. This makes complete sense as C can be seen as a sort of portable assembly language. But what I'm curious about is whether this has constrained the design of the languages in any way. What prompted my question was thinking about how the C stack is used directly in Python for calling functions. Obviously the programming language designer can do whatever they want in whatever language they want, but it seems to me that the language you choose to write your new language in puts you in a certain mindset and gives you certain shortcuts that are difficult to ignore. Are there other characteristics of these languages that come from being written in that language (good or bad)?
I tend to disagree.
I don't think it's so much that a language's compiler or interpreter is implemented in C — after all, you can implement a virtual machine with C that is completely unlike its host environment, meaning that you can get away from a C / near-assembly language mindset.
However, it's more difficult to claim that the C language itself didn't have any influence on the design of later languages. Take for example the use of curly braces { } to group statements into blocks, the notion that whitespace and indentation are mostly unimportant, the native types' names (int, char, etc.) and other keywords, or the way variables are defined (i.e. type first, followed by the variable's name, with optional initialization). Many of today's popular and widespread languages (C++, Java, C#, and I'm sure there are more) share these concepts with C. (These probably weren't completely new with C, but AFAIK C came up with that particular mix of language syntax.)
Even with a C implementation, you're surprisingly free in terms of implementation. For example, chicken scheme uses C as an intermediate, but still manages to use the stack as a nursery generation in its garbage collector.
That said, there are some cases where there are constraints. Case in point: the GHC Haskell compiler had a Perl script called the Evil Mangler to alter the GCC-emitted assembly code in order to implement some important optimizations. They have been moving to internally-generated assembly and LLVM partly for that reason. That said, this hasn't constrained the language design, only the compiler's choice of available optimizations.
No, in short. The reality is, look around at the languages that are written in C. Lua, for example, is about as far from C as you can get without becoming Perl. It has first-class functions, fully automated memory management, etc.
It's unusual for new languages to be affected by their implementation language, unless said language contains serious limitations. While I definitely disapprove of C, it's not a limited language, just very error-prone and slow to program in compared to more modern languages. Oh, except in the CRT. For example, Lua doesn't contain directory functionality, because it's not part of the CRT so they can't portably implement it in standard C. That is one way in which C is limited. But in terms of language features, it's not limited.
If you wanted to construct an argument saying that languages implemented in C have XYZ limitations or characteristics, you would have to show that doing things another way is impossible in C.
The C stack is just the system stack, and this concept predates C by quite a bit. If you study theory of computing you will see that using a stack is very powerful.
Using C to implement languages has probably had very little effect on those languages, though the familiarity with C (and other C like languages) of people who design and implement languages has probably influenced their design a great deal. It is very difficult to not be influenced by things you've seen before even when you aren't actively copying the best bits of another language.
Many languages do use C as the glue between them and other things, though. Part of this is that many OSes provide a C API, so to access that it's easy to use C. Additionally, C is just so common and simple that many other languages have some sort of way to interface with it. If you want to glue two modules together which are written in different languages then using C as the middle man is probably the easiest solution.
Where implementing a language in C has probably influenced other languages the most is in things like how escapes are done in strings, which isn't all that limiting.
The only thing that has constrained language design is the imagination and technical skill of the language designers. As you said, C can be thought of as a "portable assembly language". If that is true, then asking if C has constrained a design is akin to asking if assembly has constrained language design. Since all code written in any language is eventually executed as assembly, every language would suffer the same constraints. Therefore, the C language itself imposes no constraints that would be overcome by using a different language.
That being said, there are some things that are easier to do in one language vs another. Many language designers take this into account. If the language is being designed to be, say, powerful at string processing but performance is not a concern, then using a language with better built-in string processing facilities (such as C++) might be more optimal.
Many developers choose C for several reasons. First, C is a very common language. Open source projects in particular like that it is relatively easier to find an experienced C-language developer than it is to find an equivalently-skilled developer in some other languages. Second, C typically lends itself to micro-optimization. When writing a parser for a scripted language, the efficiency of the parser has a big impact on the overall performance of scripts written in that language. For compiled languages, a more efficient compiler can reduce compile times. Many C compilers are very good at generating extremely optimized code (which is also part of the reason why many embedded systems are programmed in C), and performance-critical code can be written in inline assembly. Also, C is standardized and is generally a static target. Code can be written to the ANSI/C89 standard and not have to worry about it being incompatible with a future version of C. The revisions made in the C99 standard add functionality but don't break existing code. Finally, C is extremely portable. If at least one compiler exists for a given platform, it's most likely a C compiler. Using a highly-portable language like C makes it easier to maximize the number of platforms that can use the new language.
The one limitation that comes to mind is extensibility and compiler hosting. Consider the case of C#. The compiler is written in C/C++ and is entirely native code. This makes it very difficult to use in-process from a C# application.
This has broad implications for the tooling chain of C#. Any code which wants to take advantage of the real C# parser or binding engine has to have at least one component which is written in native code. This eventually results in most of the tooling chain for the C# language being written in C++ which is a bit backwards for a language.
This doesn't limit the language per se, but it definitely has an effect on the experience around the language.
Garbage collection. Language implementations on top of Java or .NET use the VM's GC. Those on top of C tend to use reference counting.
One thing I can think of is that functions are not necessarily first-class members of the language, and this can't be blamed on C alone (I am not talking about passing a function pointer, though it can be argued that C provides you with that feature).
If one were to write a DSL in Groovy (/Scheme/Lisp/Haskell/Lua/JavaScript/and some more that I am not sure of), functions can become first-class members. Making functions first-class members and allowing anonymous functions lets you write concise and more human-readable code (as demonstrated by LINQ).
Yes, eventually all of these run on top of C (or assembly, if you want to go to that level), but in terms of giving the user of the language the ability to express themselves better, these abstractions do a wonderful job.
Implementing a compiler/interpreter in C doesn't have any major limitations. On the other hand, implementing a language X to C compiler does. For example, according to the Wikipedia article on C--, when compiling a higher level language to C you can't do precise garbage collection, efficient exception handling, or tail recursion optimization. This is the kind of problem that C-- was intended to solve.

C compiler's language [closed]

I just want to know the language in which the C compiler was written. Please say something other than C.
Here's an excellent read: Reflections on Trusting Trust by Ken Thompson. Starts off with an overview of how the first C compilers were written. The boot-strapping technique to be precise. May not answer your question directly but gives you some insight.
Nearly all major C compilers are written in C. You might think there's a chicken-and-egg problem with this, but there's not. The process is called bootstrapping.
The very first C compiler was written (by Dennis Ritchie) in a predecessor language called B, or maybe BCPL. But once the C compiler was working well enough, they converted it over to C and began using each successive version to compile the next.
Many of the bizarre features of C, such as the pre- and post-increment operators, are often said to exist because (a) they mapped onto special addressing modes of the PDP-11 on which early C was developed (although Ritchie noted that they actually originated earlier, in B), or (b) they helped the compiler fit in memory while compiling its own next version.
So that's the rest of the story.
GCC is written in C. The majority of C compilers are written in C.
There is a boot-strapping phase when first producing a compiler for a language (any language that has pretensions to be able to compile its own compiler - COBOL is one plausible exception, but there are many others) on a given platform, but once you have a compiler, then you write the compiler in that language.
All else apart, doing it in assembler is too expensive.
Depending on which C compiler you mean, it was likely first written in assembly; eventually it probably became self-compiling, at which point parts were written in C.
You may browse the source for GCC for yourself at http://gcc.gnu.org/viewcvs/branches/
gcc is written in C
Clang is written in C++.
Those are the two I know.
You have to specify which compiler.
In the old days, people would write a small subset of the C language in assembler, and then use that to "bootstrap" compile a better C compiler written in C. These days it's more common to make a C compiler for a new architecture by cross compiling from an architecture that already works. I believe there are very few bits of, for instance, the gcc compiler, that aren't written in C or C++.
It seems to me it would be easiest to write a compiler in Perl.
