Learning gcc internals [closed] - c

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I've recently been very interested in compilers and how they work. Since gcc has its source available, I figured it would be the best material to study.
The first thing I realized is that it would be pointless to study gcc if I didn't have a basic understanding of simple compiler design principles. I have since been diligently reading the "Dragon Book" which, from what I have seen, is the de facto book on compiler implementation.
Nonetheless, reading that book has only furthered my desire to learn about compilers such as gcc.
Additionally, I find it pertinent to say that I do have an intermediate understanding of C/C++ (i.e., I'm not trying to study gcc without knowing C). I am hoping that studying gcc will help me improve upon that as well.
I have downloaded the latest build I could find; however, I get lost when perusing the source code.
What I'm looking for are suggestions on how to proceed. Is there a similar project, which is not so massive, that I could use as a stepping stone to gcc? Is there a particular module of gcc which one would recommend studying first? Are there any books which go into gcc's implementation, rather than its use? Perhaps I should stop whining and just keep reading the source until it clicks?
Any and all feedback will be greatly appreciated.
EDIT: If you think I should study a different compiler/interpreter, I would greatly appreciate suggestions as to which ones.

If you want to look at a very tiny compiler, I would recommend Fabrice Bellard's Tiny C Compiler.
Also worth mentioning: Fabrice Bellard won the International Obfuscated C Code Contest with his Obfuscated Tiny C Compiler. There's a deobfuscated version as well, and it fits in a single C file.
These should be great if you want something small and manageable to learn from.

I would definitely look at clang/LLVM. I think the code base is very readable. One very viable option you'd have is to use LLVM as a back end and write your own simple lexer and parser.
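As a rough idea of what "your own simple lexer" might look like, here is a minimal sketch for a toy expression language. The token kinds and the next_token function are made up for illustration; nothing here uses or depends on the LLVM API.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for a toy expression language. */
typedef enum { TOK_NUM, TOK_IDENT, TOK_OP, TOK_END, TOK_ERROR } TokenKind;

typedef struct {
    TokenKind kind;
    const char *start; /* first character of the lexeme in the input */
    size_t len;        /* lexeme length in bytes */
} Token;

/* Scan one token starting at *cursor and advance *cursor past it. */
Token next_token(const char **cursor) {
    const char *p = *cursor;
    Token t;

    while (isspace((unsigned char)*p))
        p++;

    t.start = p;
    if (*p == '\0') {
        t.kind = TOK_END;
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p))
            p++;
        t.kind = TOK_NUM;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
        t.kind = TOK_IDENT;
    } else if (*p == '+' || *p == '-' || *p == '*' || *p == '/' ||
               *p == '(' || *p == ')' || *p == '=') {
        p++;
        t.kind = TOK_OP;
    } else {
        p++; /* step over the unknown character so the caller can continue */
        t.kind = TOK_ERROR;
    }
    t.len = (size_t)(p - t.start);
    *cursor = p;
    return t;
}
```

Calling next_token in a loop until it returns TOK_END yields the token stream a parser would consume.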

I think it's good to read the book "Ruby Under a Microscope" and practise with Ruby core development before reading gcc's code. You will need some knowledge of Ruby programming, though, since the book is about Ruby internals.
As far as I know, the best book on gcc is "The Definitive Guide to GCC": https://www.amazon.com/Definitive-Guide-GCC-Guides-Paperback/dp/1590595858. Although it is a little old, I think you should read it.

Being passionate about compilers too, I learned a lot from Niklaus Wirth's book Algorithms + Data Structures = Programs. One of the last chapters describes the Pascal-0 language, and the previous chapters show how to parse and compile a very minimalistic language. Pascal-0 and PL/0 are two-step compilers: they generate p-code, which is 'machine code' for a minimalistic virtual machine (not unlike Java's).
This page describes a PL/0 virtual machine instruction set, and, at the very end, links to a PL/0 compiler and other interesting info.
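To give a feel for what "machine code for a minimalistic virtual machine" means, here is a small sketch of a stack machine interpreter. The opcodes are invented for this example and are not the actual PL/0 p-code instruction set.

```c
#include <stddef.h>

/* Hypothetical opcodes; NOT the real PL/0 p-code instruction set. */
typedef enum { OP_PUSH, OP_ADD, OP_SUB, OP_MUL, OP_HALT } OpCode;

typedef struct {
    OpCode op;
    int arg; /* used only by OP_PUSH */
} Instr;

#define STACK_MAX 64

/* Run a program and store the top of the stack in *result.
 * Returns 0 on success, -1 on stack overflow/underflow or a missing HALT. */
int run(const Instr *code, size_t n, int *result) {
    int stack[STACK_MAX];
    size_t sp = 0; /* number of values currently on the stack */

    for (size_t pc = 0; pc < n; pc++) {
        switch (code[pc].op) {
        case OP_PUSH:
            if (sp == STACK_MAX) return -1;
            stack[sp++] = code[pc].arg;
            break;
        case OP_ADD:
        case OP_SUB:
        case OP_MUL: {
            if (sp < 2) return -1;
            int rhs = stack[--sp];
            int lhs = stack[--sp];
            if (code[pc].op == OP_ADD)      stack[sp++] = lhs + rhs;
            else if (code[pc].op == OP_SUB) stack[sp++] = lhs - rhs;
            else                            stack[sp++] = lhs * rhs;
            break;
        }
        case OP_HALT:
            if (sp == 0) return -1;
            *result = stack[sp - 1];
            return 0;
        }
    }
    return -1; /* ran off the end without OP_HALT */
}
```

A compiler's first step would translate source text into an array of Instr; the second step is this loop executing it.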
Niklaus Wirth has always had a knack for writing readable and well-structured code. Here's the language definition and many other interesting links.
The advantage of studying and using Pascal is that the language is very structured and not an evolution from assembler (like C). That makes compiling much easier. It's not even necessary to do several passes...

Related

what compiler should I use as case study for self studying compiler principles techniques [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
I decided to start studying compiler theory, but the problem is that I want a compiler for any language in order to track each of the following:
lexical analyzer output.
syntax tree.
intermediate representation.
code generation.
I don't care about optimization right now.
I am aware of some questions similar to mine about clang and gcc, and I understand that both of them perform lexical and syntax analysis on the fly.
I just want any compiler for any language, as long as the compiler itself is written in C and runs on Ubuntu x64.
I am not sure you have the right approach, if you want to learn about compilation techniques for C specifically. And C is not the best language to write a compiler in (if you start from scratch, OCaml is better suited for that task). BTW, recent Clang/LLVM or GCC are coded in C++ (no longer in C).
The C language now sort-of requires optimization, as I explained here, so skipping the optimization part is not very useful. Be aware that optimization passes form the majority and the most difficult part of real-world compilers.
The lexing and parsing parts of a compiler are now well understood. And there are several code generator tools for them (yacc or bison, lex or flex, ANTLR...). For several pragmatic reasons, real compilers like GCC don't use these tools.
You could look into tinycc, nwcc, or 8cc if you want to look inside non-optimizing toy C compilers.
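The stages the question lists (syntax tree, then code generation) can be seen in miniature in a sketch like the one below. It is not taken from any of the compilers named above: the grammar (non-negative integers, '+', '*', parentheses), the Node type and the compile function are all assumptions made for illustration, lexing is folded into the parser, and error handling is deliberately minimal (a missing operand is read as 0).

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Syntax tree node: a number leaf ('n') or a binary operator ('+', '*'). */
typedef struct Node {
    char op;
    int value;                 /* meaningful only when op == 'n' */
    struct Node *lhs, *rhs;
} Node;

static Node *new_node(char op, int value, Node *lhs, Node *rhs) {
    Node *n = malloc(sizeof *n);
    if (n == NULL) abort();
    *n = (Node){ op, value, lhs, rhs };
    return n;
}

static void free_tree(Node *n) {
    if (n == NULL) return;
    free_tree(n->lhs);
    free_tree(n->rhs);
    free(n);
}

static void skip_spaces(const char **s) {
    while (isspace((unsigned char)**s)) (*s)++;
}

static Node *parse_expr(const char **s);

/* factor := NUMBER | '(' expr ')' */
static Node *parse_factor(const char **s) {
    skip_spaces(s);
    if (**s == '(') {
        (*s)++;
        Node *inner = parse_expr(s);
        skip_spaces(s);
        if (**s == ')') (*s)++;
        return inner;
    }
    int value = 0;
    while (isdigit((unsigned char)**s))
        value = value * 10 + (*(*s)++ - '0');
    return new_node('n', value, NULL, NULL);
}

/* term := factor ('*' factor)* */
static Node *parse_term(const char **s) {
    Node *lhs = parse_factor(s);
    for (skip_spaces(s); **s == '*'; skip_spaces(s)) {
        (*s)++;
        lhs = new_node('*', 0, lhs, parse_factor(s));
    }
    return lhs;
}

/* expr := term ('+' term)* */
static Node *parse_expr(const char **s) {
    Node *lhs = parse_term(s);
    for (skip_spaces(s); **s == '+'; skip_spaces(s)) {
        (*s)++;
        lhs = new_node('+', 0, lhs, parse_term(s));
    }
    return lhs;
}

/* Append text to a NUL-terminated buffer; text that would not fit is dropped. */
static void append(char *out, size_t cap, const char *text) {
    if (strlen(out) + strlen(text) < cap) strcat(out, text);
}

/* Code generation: post-order walk, operands first, then the operator. */
static void emit(const Node *n, char *out, size_t cap) {
    char line[32];
    if (n->op == 'n') {
        snprintf(line, sizeof line, "push %d\n", n->value);
    } else {
        emit(n->lhs, out, cap);
        emit(n->rhs, out, cap);
        snprintf(line, sizeof line, "%s\n", n->op == '+' ? "add" : "mul");
    }
    append(out, cap, line);
}

/* Compile an expression such as "1 + 2 * 3" into stack-machine text. */
void compile(const char *src, char *out, size_t cap) {
    if (cap == 0) return;
    out[0] = '\0';
    Node *tree = parse_expr(&src);
    emit(tree, out, cap);
    free_tree(tree);
}
```

Calling compile("1 + 2 * (3 + 4)", buf, sizeof buf) fills buf with one instruction per line; printing the tree before emit runs is an easy way to inspect the intermediate stage.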
You could also look into the intermediate representations of real compilers, e.g. GIMPLE for GCC (BTW, try to compile with gcc -fdump-tree-all -O2 -c some simple C code with a few loops; you'll be surprised by the hundreds of dump files showing the many internal compiler representations from many passes). You'll learn a lot by customizing GCC with MELT, and the MELT documentation page contains several very useful references. This answer should also help and contains or references some pictures of GCC.
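For "some simple C code with a few loops", something as small as the following is probably enough to experiment with. The file name loops.c and both functions are just an illustration; save it and run the command from the answer as gcc -fdump-tree-all -O2 -c loops.c, then look at the dump files that appear next to it.

```c
#include <stddef.h>

/* Sum of a[0..n-1]: a plain reduction loop. */
long sum_array(const int *a, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Number of elements strictly greater than limit: a loop with a branch. */
size_t count_above(const int *a, size_t n, int limit) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] > limit)
            count++;
    }
    return count;
}
```

Comparing the dumps produced with -O0 and -O2 for the same file is one way to see what individual passes change.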
disclaimer: I am the main author of MELT
PS. There are very good reasons to bootstrap compilers. So a compiler for a language other than C is unlikely to be coded in C (it is often coded in that language itself), since C is not a good programming language to write a compiler from scratch.
PPS. If you only know C, and no other programming languages, I would suggest learning some other programming language (e.g. Scheme with SICP, OCaml, or Haskell or Scala or Clojure or Common Lisp) before diving into compilers! Read also something about Programming Language Pragmatics. If you know a bit of Scheme or Lisp, Queinnec's book Lisp In Small Pieces will teach you a great deal.
There are many, many places to start from to explore this territory. Many languages, such as Lisp and Forth, include a compilation capability or aspect.
To learn about a C compiler, there is a book about the LCC compiler which includes the source code for the compiler. There are also repositories of old C compilers at The Unix History Society archive (tuhs.org).
Still another angle you could take is to examine the language False (an ancestor of the more famous Brainfuck) which is designed to be implemented with very little code.
Another angle, which connects to your interest in complexity theory, is to learn about the Chomsky Hierarchy of languages and the associated abstract machines which can parse them. This will teach you why Lex and Yacc are separate tools and what each is good for (and how to do it yourself and not need them at all).
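A small illustration of that split, as a sketch: recognizing arbitrarily nested parentheses is a context-free problem, so it needs recursion (or an explicit stack) of the kind a Yacc-style parser provides, while a finite automaton of the kind Lex generates cannot count unbounded nesting. The function names below are made up for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Grammar: S := ( '(' S ')' )*
 * Consumes zero or more balanced groups starting at p.
 * Returns the position after them, or NULL if a ')' is missing. */
static const char *match_groups(const char *p) {
    while (*p == '(') {
        p = match_groups(p + 1);        /* recurse for the nested content */
        if (p == NULL || *p != ')')
            return NULL;
        p++;                            /* consume the matching ')' */
    }
    return p;
}

/* True if s consists only of correctly nested parentheses. */
bool is_balanced(const char *s) {
    const char *end = match_groups(s);
    return end != NULL && *end == '\0';
}
```

The recursion depth tracks the nesting depth, which is exactly the unbounded memory a regular-expression matcher lacks.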
I am actually on the very same quest myself. I'm currently reading the old 1979 book Anatomy of Lisp which contains compiler code in, of course, Lisp. But this is ok, because I already have my own homebrewed lisp interpreter to execute it with.
The Tiger language was designed by Prof. Andrew Appel specifically to illustrate, step by step, a full compiler construction process.
You can google for 'tiger language' and read some online resources; there are also some questions and answers here on SO. But the better choice would be to get a copy of the book for the language you prefer and implement the parts you're most interested in.

Why is everything low-level written in C? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
Why is everything low-level written in C?
I mean kernel code, drivers, run-time libraries, compilers, and embedded systems software are mainly written in C/C++.
Why not use Fortran, COBOL, Pascal or even Java/C# or some other third-generation programming language that produces machine-independent code but also gives you the freedom to do low-level tweaks and optimizations?
My question is why developers and companies ended up using mostly C for these purposes.
Edit: Most of you here talk about performance. So, is the reason that there is no other general-purpose low-level language faster than C?
A few points:
Pascal is not a low-level language, but there are kernels and even whole OSes written in it.
You would not want to have an OS written in Java/C# because it would be darn slow.
C is probably not the best language. It has many flaws, and improvements like D or C++ have been tried. The only "problem" is inertia: C is still popular because it is the most widely used programming language (whether you like it or not). There is a plethora of kernels, OSes, libraries, books and courses built around this language. It would take decades to replace it. And it seems that despite its flaws, there is very little will to completely replace it.
Java (and all JVM-based languages) and C#/F# run inside "virtual machines". That means applications written in these languages cannot use hardware resources directly; they are contained within a "sandbox". It helps portability ("runs everywhere a VM is implemented") but can hurt performance (and does).
Some would say that the type of mind capable of writing low-level stuff can only be forged by years of damage caused by using C :-)
On a more serious note, the whole purpose of C was as a systems programming language and, as such, it mostly keeps out of your way. Other languages have different purposes: COBOL is really for transactional/business stuff, C# is for applications running under MS Windows, LISP is for people who love counting parentheses, and so on. They can be used for other things but I wouldn't write an operating system in COBOL.
Or an accounting package in assembler.
Or anything in Pascal :-)
C allows you unfettered access to the lowest levels without having to concern yourself with things like garbage collection which may adversely affect your code in ways you can't foresee.
Because, comparatively, C and C++ are low level programming languages. Some people still write in Assembler. I hope no one still writes in machine code. Anyway,
Why not use Java, C#, COBOL, Pascal or some other third-generation programming language that produces machine-independent code but also gives you the freedom to do low-level tweaks and optimizations?
Those languages are classified as high-level languages. They provide a level of machine abstraction that is beneficial for programming, but not useful for low-level bare-metal development. Also relevant might be Why Pascal is Not My Favorite Programming Language by BWK.

How "Hello World" works in C [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I have been searching for a month for blog posts about "how a C program works". Most of them go like:
Compilers do these things
Linkers do these things
Program is put into memory; and stacks, heaps, etc.
I thought I would first read about how a compiler works to understand the flow of the program into the machine. The Dragon Book seems to be universally preferred. But truthfully, it's too intensive. I am not good enough yet to go through it all.
So I began to read about hardware. But there too, they explain buses, I/O signals, the structure of memory, writing cache-friendly code, etc., but with no proper examples.
Still, I was not satisfied, nor able to completely visualize the process.
Two hours ago I decided to ask this question (I was worried that it might not be useful to the SO community, or might be off-topic or fall into some other down-votable category), and I did not find any post addressing exactly this. There was one about "how compiler does the compilation", but the answers showed that it's too broad a question.
My question is this:
I would like to know how, in depth, a C program works. If you cannot tell me explicitly, please redirect me to a book or another post on another website that can give me the answer to this.
I am here until I get a response. If you have any suggestions regarding this post, tell me. English is not my first language, so please take all my sentences as being soft and polite.
Thanks.
UPDATE:
Along with the accepted answer, there are some very nice links as well as suggestions which give partial answers or the way to proceed further to understanding what I am trying to understand.
The best answer to this question by far comes from the book "The Elements of Computing Systems," by Noam Nisan and Shimon Schocken. This book starts from the simplest possible electronic components, assembles them into a working processor, invents a simple assembly language for it, writes an assembler for that, and ultimately shows you how high-level languages can be compiled onto it. Reading the book, and working all the examples (which use a simulator for the hardware, so no workshop required!), will forever change the way you look at computers; you will literally understand everything from the lowest to the highest levels, and see how they work together. See the book's website for more info.
It's too broad a question (as you have observed).
If you really want to understand from the bottom up, buy an OLD computer from the '80s off eBay. Sinclair Spectrum or BBC, it really doesn't matter, but make sure you get plenty of books and manuals that go with it.
You will learn plenty because these machines were well documented and what wasn't documented was discovered and then documented :)
They are also sooooo much simpler than a modern quad-core, multi-gigabyte machine. It will all fit inside your head easily.
Or, for a modern start, maybe an Arduino or Raspberry Pi.
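To make the "program is put into memory; and stacks, heaps, etc." step from the question a little more concrete, here is a small sketch that prints where different kinds of objects end up. The segment names in the comments describe what common desktop platforms typically do; the C standard itself does not promise any particular layout, and the addresses will differ from run to run on many systems.

```c
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 42;            /* usually in the data segment */
int zeroed_global;                      /* usually in .bss, zero-filled at load */
const char *literal = "hello, world";   /* the characters usually sit in read-only data */

/* Print the address of one object of each kind.
 * Returns 0 on success, -1 if the heap allocation fails. */
int print_layout(FILE *out) {
    int local = 0;                      /* automatic storage, usually on the stack */
    int *heap = malloc(sizeof *heap);   /* dynamic storage, usually on the heap */
    if (heap == NULL) return -1;

    fprintf(out, "initialized global: %p\n", (void *)&initialized_global);
    fprintf(out, "zeroed global     : %p\n", (void *)&zeroed_global);
    fprintf(out, "string literal    : %p\n", (const void *)literal);
    fprintf(out, "heap object       : %p\n", (void *)heap);
    fprintf(out, "stack object      : %p\n", (void *)&local);

    free(heap);
    return 0;
}
```

Calling print_layout(stdout) from main and comparing the printed addresses usually shows the globals clustered together and the stack object far away from the heap object.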

Modern Ada to C/C++ translator [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 7 years ago.
Is there any source-to-source converter (translator) from Ada (95, 2005) to C?
How complete are they (can they convert every feature of Ada into GNU C99 + pthreads + POSIX)?
Is it possible to use such an Ada-to-C translator in critical applications?
PS: Translators to C++ (up to 2003 with GNU extensions) are welcome too.
PPS: By "gnu c99" I mean only that C99 plus most GNU extensions are supported, not GCC specifically.
I don't know of any open source Ada-to-C translator. The only one I knew of at all was SofCheck's, which was reportedly pretty good.
SofCheck has since been bought by AdaCore, and I did a very brief search of the AdaCore website for the translator, and nothing jumped out. You could ask them at sales@adacore.com, if pursuing a commercial solution is a viable option for you. (At least get a price.)
Unless there is an incredibly strong reason to use Ada for this application (e.g., customer demands it, or you already have a big application coded in Ada that you want to use), it will likely be a lot less painful if you just bite the bullet and code your solution in well-crafted C99 or C++ as you see fit.
If you insist, SofCheck's translator might be best; they've been working on it a long time.
Failing that, you might(?) build a translator starting with the ASIS output of an Ada compiler. That's likely rather a lot of persnickety work since Ada has pretty precise semantics that you'd better preserve if you want to just carelessly code in Ada, translate and run. It will be even more work if you want the output to be "pretty" for the final customer. (Long term maintenance should be a consideration). I suspect implementing code to simulate Ada's rendezvous might be rather tricky, being both semantically complicated and asynchronous at the same time. The real flaw with this approach is that it is a lot of work; maybe just getting on with your life and coding the application itself in something non-Ada would be less effort.
See my caveats on language translation done poorly and alternative methods.

In what language was MSDOS originally written? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
In what language was MSDOS originally written?
The Wikipedia Article implies either C, QBasic or Pascal, but:
C was invented to write UNIX, so I don't believe it was used to write MSDOS
Pascal seems popular for teaching programming, but not really popular for writing operating systems
QBasic didn't seem to be very popular for operating systems at the time MSDOS was developed (or was *BASIC ever very popular for writing operating systems?)
Besides these three languages there is also assembly, but I assume that Microsoft had already switched from assembly to a "higher"-level language?
Since C was originally invented for UNIX, I still wouldn't think Microsoft is using C... although the Microsoft API is written in C (I find this kind-of oxymoronic, actually).
Can anyone enlighten me on this topic?
http://answers.google.com/answers/threadview?id=197874
Since CP/M was written in FORTRAN and QDOS was based on CP/M, does it mean that QDOS and MS-DOS were written in FORTRAN? According to our next article, written by Tim Patterson himself, the assembly language used by Seattle Computer Products wasn't FORTRAN but was built in-house since it was the only thing available to them at that time.
"The last design requirement was that MS-DOS be written in assembly language. While this characteristic does help meet the need for speed and efficiency, the reason for including it is much more basic. The only 8086 software-development tools available to Seattle Computer at that time were an assembler that ran on the Z80 under CP/M and a monitor/debugger that fit into a 2K-byte EPROM (erasable programmable read-only memory). Both of these tools had been developed in house."
"An Inside Look at MS-DOS"
http://www.patersontech.com/Dos/Byte/InsideDos.htm
Well, MS-DOS was originally a renamed 86-DOS, and 86-DOS was written in assembly if I'm not mistaken, so that would make ASM the original language for MS-DOS as well.
As stated on http://www.patersontech.com/Dos/Byte/InsideDos.htm
"The last design requirement was that MS-DOS be written in assembly language."
(Note that a lot of applications, not just operating system parts, were written in assembly back then.)
See the timeline
Assembler source of 86DOS
Documentation
Unix pre-dates MS-DOS, so that's not an impediment to it being programmed in C. But I'd go for assembly for most parts at least...
If you look for MS-DOS on some websites, you can find version 6 with the source code included. It was written in assembler and there's no C code at all. All the utilities, the kernel, and even the installer were written in assembler.
And regarding Windows, it has a lot of assembly language in it, but some parts were written in C and then C++.
