I was looking at the source code for gcc (out of curiosity), and I noticed a data structure that I've never seen in C before.
At lines 80 and 129 (and in many other places) in the parser, they seem to be using vectors.
80: vec<tree> incomplete_record_decls;
129: ridpointers = ggc_cleared_vec_alloc<tree> ((int) RID_MAX);
I've never encountered this data type in C, nor this angle-bracket syntax: < >. Are they native to C?
Does anyone know what they are and how they are used?
Despite the .c filename, this code is not valid C; it is C++, using that language's template feature. If you inspect the gcc build process, you will find that this file is actually compiled with a C++ compiler.
https://gcc.gnu.org/codingconventions.html
The directories gcc, libcpp and fixincludes may use C++03. They may also use the long long type if the host C++ compiler supports it. These directories should use reasonably portable parts of C++03, so that it is possible to build GCC with C++ compilers other than GCC itself. If testing reveals that reasonably recent versions of non-GCC C++ compilers cannot compile GCC, then GCC code should be adjusted accordingly. (Avoiding unusual language constructs helps immensely.) Furthermore, these directories should also be compatible with C++11.
Keep in mind that although compilers will usually by default infer a source file's language from its filename, this default can always be overridden. It is entirely possible to have C++ code in a .c file, or C code in a .bas file for that matter; you just may have to tell the compiler some other way what language is in use.
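For example (a hypothetical demonstration, not taken from GCC's build system), plain C saved under an unrelated extension still compiles once the driver is told which language to use:

/* hello.bas - plain C in a .bas file.
   gcc hello.bas        fails: gcc cannot infer the language from .bas
   gcc -x c hello.bas   works: -x c overrides filename-based detection */
#include <stdio.h>

int main(void)
{
    printf("hello from a .bas file\n");
    return 0;
}

Going the other way, g++ parser.c compiles a .c file as C++, since g++ treats .c files as C++ source; that is, in spirit, what GCC's own build does.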
I expect that gcc chose this file naming convention because this code was originally written in C and later converted to C++, and they found it too much of a pain to change all the filenames. It would mean a lot of work to update all the makefiles, etc. It may have been less of a pain to just change which compiler was used, and to explain the convention to all the developers. Of course, in general it is better programming practice to name your files in the standard way, but apparently the gcc developers felt it was not the best course of action in this case.
GCC moved from C to C++ as of GCC 4.8:
GCC now uses C++ as its implementation language. This means that to build GCC from sources, you will need a C++ compiler that understands C++ 2003. For more details on the rationale and specific changes, please refer to the C++ conversion page.
GCC 4.8 Release Series - Changes, New Features, and Fixes
The work actually began long before that, with the creation of the gcc-in-cxx branch. The developers first tried to compile the existing source code with a C++ compiler, so there weren't any name changes. I guess they didn't bother to rename the files later when merging the two branches and officially keeping only one C++ branch.
You can read GCC's move to C++ for more historical information.
I am trying to compile C source code to machine code using an Ubuntu terminal.
My tutor's instruction was to run the following command:
clang myprogramm.c -std=c11
Why should I use the -std=c11 option, and what is the difference compared to just running
clang myprogramm.c
Using -std= options is required by your tutor (I'm divining her motives; I'm particularly good at this!) because she wants to make sure you stay away from all those nifty Clang features that turn the accepted language from C into A LANGUAGE SUPERFICIALLY LOOKING LIKE C BUT ACTUALLY A DIFFERENT LANGUAGE NOT SUPPORTED BY OTHER C COMPILERS.
That is more than just additional library functions. It includes syntax changes that break the grammar of Standard C, as defined by ISO. A grasshopper should not use these while learning. Using -std=c11 makes sure Clang either warns about such constructs or even rejects them with an error.
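To make this concrete, here is a hypothetical snippet (not the asker's program) using typeof, a GNU extension that Clang's default dialect accepts but that strict -std=c11 rejects:

/* gnu_ext.c
   clang gnu_ext.c            compiles: the default dialect is GNU C,
                              which accepts the typeof extension
   clang -std=c11 gnu_ext.c   fails: typeof is not part of ISO C11 */
#include <stdio.h>

int main(void)
{
    int x = 42;
    typeof(x) y = x;  /* GNU extension: declares y with the type of x */
    printf("%d\n", y);
    return 0;
}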
When to specify the standard? Whenever you use the compiler. It is never a good idea to let the compiler just use whatever it wants.
If someone tries to use a compiler that is too old, then they will get a warning or error, and they will understand why the compile fails.
If a code contributor (maybe even yourself!) tries to add code using features that are too new, their code will be rejected. That's very important if you intend to keep compatibility with an older standard.
By explicitly stating the standard, using new features or extensions becomes a choice and doesn't happen by accident.
I studied the Option Summary for gfortran but found no compiler option to detect integer overflow. Then I found the GCC (GNU Compiler Collection) flag option -fsanitize=signed-integer-overflow here and used it when invoking gfortran. It works: integer overflow can be detected at run time!
So what does -fsanitize=signed-integer-overflow do here? Does it just add to the machine code generated by gfortran some machine-level pieces that check for integer overflow?
What is the relation between GCC (GNU Compiler Collection) flag options and gfortran compiler options? Which gcc compiler options can I use with gfortran, g++, etc.?
There is GCC, the GNU Compiler Collection. It shares a common back end and middle end and has front ends for different languages, for example C, C++ and Fortran, which are usually invoked by the commands gcc, g++ and gfortran.
It is actually more complicated: you can call gcc on a Fortran source and gfortran on a C source, and it will work almost the same, with the exception of the libraries being linked (and some other fine points). The appropriate front end is chosen based on the file extension or the language requested.
You can use almost all GCC (not just gcc) flags with any of the mentioned front ends. Certain flags are language-specific; normally you will then get a warning like
gfortran -fcheck=all source.c
cc1: warning: command line option ‘-fcheck=all’ is valid for Fortran but not for C
but the file will compile fine; the option is simply ignored, and you get a warning saying so. Notice that this is a C file, and it is compiled by the gfortran command just fine.
The sanitizer options are, AFAIK, not that language-specific and work for multiple languages implemented in GCC, perhaps with some exceptions for obviously language-specific checks. In particular, -fsanitize=signed-integer-overflow, which you ask about, works perfectly fine for both C and C++. Signed integer overflow is undefined behaviour in C and C++, and it is not allowed by the Fortran standard (which effectively means the same thing; Fortran just uses different words).
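To see what the flag actually does, here is a minimal C sketch (the file name is illustrative; the same flag works with gfortran on the Fortran equivalent):

/* overflow.c
   Compile and run with:
   gcc -fsanitize=signed-integer-overflow overflow.c && ./a.out
   The inserted check reports something like:
   "runtime error: signed integer overflow: 2147483647 + 1 cannot be
   represented in type 'int'" instead of silently wrapping. */
#include <limits.h>
#include <stdio.h>

int main(void)
{
    int x = INT_MAX;
    x = x + 1;  /* signed overflow: undefined behaviour in C */
    printf("%d\n", x);
    return 0;
}

So yes: the compiler instruments the signed arithmetic with extra machine-level checks, and a small runtime library prints the diagnostic when one of them fires.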
This isn't a terribly precise answer to your question, but an aha! moment, when learning about compilers, is learning that gcc (the GNU Compiler Collection), like llvm, is an example of a three-stage compiler.
The ‘front end’ parses the syntax of whichever language you're interested in, and spits out an Abstract Syntax Tree (AST), which represents your program in a language-independent way.
Then the ‘middle end’ (terrible name, but ‘the clever bit’) reorganises that AST into another AST which is semantically equivalent but easier to turn into machine code.
Then the ‘back end’ turns that reorganised AST into assembler for one-or-other processor, possibly doing platform-specific micro-optimisations along the way.
That's why the (huge number of) gcc/llvm options are unexpectedly common to (apparently wildly) different languages. A few of the options are specific to C, or Fortran, or Objective-C, or whatever, but the majority of them (probably) are concerned with the middle and last bits, and so are common to all of the languages that gcc/llvm supports.
Thus the various options are specific to stage 1, 2 or 3, but may not be conveniently labelled as such; with this in mind, however, you might reasonably intuit what is and isn't relevant to the particular language you're interested in.
(It's for this sort of reason that I will dogmatically claim that CC++FortranJavaPerlPython is essentially a single language, with only trivial syntactical and library minutiae to distinguish between dialects).
I've heard of the idea of bootstrapping a language, that is, writing a compiler/interpreter for the language in itself. I was wondering how this could be accomplished and looked around a bit, and saw someone say that it could only be done by either
writing an initial compiler in a different language.
hand-coding an initial compiler in Assembly, which seems like a special case of the first
To me, neither of these seem to actually be bootstrapping a language in the sense that they both require outside support. Is there a way to actually write a compiler in its own language?
Is there a way to actually write a compiler in its own language?
You have to have some existing language to write your new compiler in. If you were writing a new, say, C++ compiler, you would just write it in C++ and compile it with an existing compiler first. On the other hand, if you were creating a compiler for a new language, let's call it Yazzleof, you would need to write the new compiler in another language first. Generally, this would be another programming language, but it doesn't have to be. It can be assembly, or if necessary, machine code.
If you were going to bootstrap a compiler for Yazzleof, you generally wouldn't write a compiler for the full language initially. Instead you would write a compiler for Yazzle-lite, the smallest possible subset of Yazzleof (well, a pretty small subset at least). Then in Yazzle-lite, you would write a compiler for the full language. (Obviously this can occur iteratively instead of in one jump, as sketched below.) Because Yazzle-lite is a proper subset of Yazzleof, you now have a compiler which can compile itself.
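Schematically, the stages might look like this (all names made up for illustration):
1. Write a Yazzle-lite compiler in an existing language Y; compile it with Y's compiler, giving compiler A.
2. Write the full Yazzleof compiler in Yazzle-lite; compile it with A, giving compiler B.
3. Recompile the full compiler's own source with B, giving compiler C. If B and C produce identical output, the bootstrap is self-consistent - this is essentially the comparison GCC's own three-stage bootstrap performs.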
There is a really good writeup about bootstrapping a compiler from the lowest possible level (which on a modern machine is basically a hex editor), titled Bootstrapping a simple compiler from nothing. It can be found at https://web.archive.org/web/20061108010907/http://www.rano.org/bcompiler.html.
The explanation you've read is correct. There's a discussion of this in Compilers: Principles, Techniques, and Tools (the Dragon Book):
Write a compiler C1 for language X in language Y
Use the compiler C1 to write compiler C2 for language X in language X
Now C2 is a fully self-hosting environment.
The way I've heard of is to write an extremely limited compiler in another language, then use that to compile a more complicated version, written in the new language. This second version can then be used to compile itself, and the next version. Each time it is compiled the last version is used.
This is the definition of bootstrapping:
the process of a simple system activating a more complicated system that serves the same purpose.
EDIT: The Wikipedia article on compiler bootstrapping covers the concept better than me.
A super interesting discussion of this is in Unix co-creator Ken Thompson's Turing Award lecture.
He starts off with:
What I am about to describe is one of many "chicken and egg" problems that arise when compilers are written in their own language. In this case, I will use a specific example from the C compiler.
and proceeds to show how he wrote a version of the Unix C compiler that would always allow him to log in without a password, because the C compiler would recognize the login program and add in special code.
The second pattern is aimed at the C compiler. The replacement code is a Stage I self-reproducing program that inserts both Trojan horses into the compiler. This requires a learning phase as in the Stage II example. First we compile the modified source with the normal C compiler to produce a bugged binary. We install this binary as the official C. We can now remove the bugs from the source of the compiler and the new binary will reinsert the bugs whenever it is compiled. Of course, the login command will remain bugged with no trace in source anywhere.
Check out podcast Software Engineering Radio episode 61 (2007-07-06) which discusses GCC compiler internals, as well as the GCC bootstrapping process.
Donald E. Knuth actually built WEB by writing the compiler in it, and then hand-compiled it to assembly or machine code.
As I understand it, the first Lisp interpreter was bootstrapped by hand-compiling the constructor functions and the token reader. The rest of the interpreter was then read in from source.
You can check for yourself by reading the original McCarthy paper, Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I.
Every example of bootstrapping a language I can think of (C, PyPy) was done after there was a working compiler. You have to start somewhere, and reimplementing a language in itself requires writing a compiler in another language first.
How else would it work? I don't think it's even conceptually possible to do otherwise.
Another alternative is to create a bytecode machine for your language (or use an existing one if its features aren't very unusual) and write a compiler to bytecode, either in the bytecode itself, or in your desired language using another intermediate - such as a parser toolkit which outputs the AST as XML, then compiling the XML to bytecode using XSLT (or another pattern-matching language and tree-based representation). It doesn't remove the dependency on another language, but it could mean that more of the bootstrapping work ends up in the final system.
It's the computer science version of the chicken-and-egg paradox. I can't think of a way not to write the initial compiler in assembler or some other language. If it could have been done, I should think Lisp could have done it.
Actually, I think Lisp almost qualifies. Check out its Wikipedia entry. According to the article, the Lisp eval function could be implemented on an IBM 704 in machine code, with a complete compiler (written in Lisp itself) coming into being in 1962 at MIT.
Some bootstrapped compilers or systems keep both the source form and the object form in their repository:
Ocaml is a language which has both a bytecode interpreter (i.e. a compiler to Ocaml bytecode) and a native compiler (to x86-64 or ARM, etc. assembler). Its svn repository contains both the source code (files */*.{ml,mli}) and the bytecode (file boot/ocamlc) form of the compiler. So when you build it, it first uses its bytecode (of a previous version of the compiler) to compile itself. Later, the freshly compiled bytecode is able to compile the native compiler. So the Ocaml svn repository contains both *.ml[i] source files and the boot/ocamlc bytecode file.
The rust compiler downloads (using wget, so you need a working Internet connection) a previous version of its binary to compile itself.
MELT is a Lisp-like language to customize and extend GCC. It is translated to C++ code by a bootstrapped translator. The generated C++ code of the translator is distributed, so the svn repository contains both *.melt source files and melt/generated/*.cc "object" files of the translator.
J. Pitrat's CAIA artificial intelligence system is entirely self-generating. It is available as a collection of thousands of [A-Z]*.c generated files (along with a generated dx.h header file) and a collection of thousands of _[0-9]* data files.
Several Scheme compilers are also bootstrapped: Scheme48, Chicken Scheme, ...
Overview:
I'm working on a hobby app. I want my program to stick to "plain C".
For several reasons, I have to use a C++ compiler, and the related programming environment, that supports "plain C". And, for the same reasons, I cannot change to another compiler.
Also, there are some C++ features that I have been using unintentionally.
For example, I'm not using namespaces or classes. My current programming job is not "plain C" or "C++", and I haven't used them for some time, so I may have forgotten which features are "plain C" only.
I have browsed the internet for "plain C" examples. I have found that many other developers have also posted mixed "plain C" & "C++" examples (some of them unintentionally).
I'm using some dynamically allocated structures. I have been using "malloc", but I would rather use "new" instead. I thought that some newer standards & compiler versions of "plain C" allowed "new", but it seems I was wrong.
It seems that "new" is a "C++" feature, and if I really want to write "plain C" only, I should use "malloc".
The reason I want to stick to "plain C" is that I'm working on a cross-platform, non-GUI library/tool.
My current platform is "Windowze"; my development environments are:
(1) CodeBlocks (MinGW)
(2) Bloodshed DevCPP
(3) Borland CBuilder 6
My goal, though, is to migrate it to Linux too, and maybe to other platforms and other (command-line) compilers.
Quick, untested example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct MyData_T
{
    int MyInt;
    char MyName[512];
    char *MyCharPtr;
};

typedef struct MyData_T *MyData_P;

MyData_P newData(const char *AName)
{
    MyData_P Result = NULL;  /* C has no "null"; the standard macro is NULL */
    Result = malloc(sizeof(struct MyData_T));
    if (Result == NULL)
        return NULL;         /* malloc can fail */
    /* strcpy takes only two arguments; use strncpy to bound the copy */
    strncpy(Result->MyName, AName, sizeof(Result->MyName) - 1);
    Result->MyName[sizeof(Result->MyName) - 1] = '\0';
    /* do other stuff with fields */
    return Result;
} /* MyData_P newData(...) */

int main(void)  /* "int main(...)" is not valid C */
{
    int ErrorCode = 0;
    MyData_P MyDataVar = newData("John Doe");
    /* do more stuff with "MyDataVar" */
    free(MyDataVar);
    return ErrorCode;
} /* int main(...) */
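One way to check that the example really stays within plain C (a suggested command, not part of the original post):

gcc -std=c90 -pedantic-errors example.c

With -pedantic-errors, gcc rejects GNU extensions as hard errors, and anything C++-only (such as new) is a syntax error in C mode anyway.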
Questions
1. Where can I get a working "plain C only" compiler for x86 (Windowze, Linux)?
2. Should I stick to using "malloc", "calloc", and similar?
3. Should I consider changing to "C++" and "new" instead?
4. Is it valid to use "new" & "delete" in a "plain C" application?
5. Any other suggestions?
Thanks.
Disclaimer
Note: I already spent several hours trying not to post a duplicate question on Stack Overflow, but none of the previous answers seemed clear to me.
Remember that C and C++ are actually completely different languages. They share some common syntax, but C is a procedural language and C++ is object-oriented, so they represent different programming paradigms.
gcc should work just fine as a C compiler. I believe MinGW uses it. It also has flags you can specify to make sure it's using the right version of C (e.g. C99).
If you want to stick with C then you simply won't be able to use new (it's not part of the C language) but there shouldn't be any problems with moving to C++ for a shared library, just so long as you put your Object Oriented hat on when you do.
I'd suggest you just stick with the language you are more comfortable with. The fact that you're using new suggests that will be C++, but it's up to you.
1. You can use e.g. GCC as a C compiler. To ensure it's compiling as C, use the -x c option. You can also specify a particular version of the C standard, e.g. -std=c99. To ensure you're not using any GCC-specific extensions, you can use the -pedantic flag. I'm sure other compilers have similar options.
2. malloc and calloc are indeed how you allocate memory in C.
3. That's up to you.* You say that you want to be cross-platform, but C++ is essentially just as "cross-platform" as C. However, if you're working on embedded platforms (e.g. microcontrollers or DSPs), you may not find C++ compilers for them.
4. No, new and delete are not supported in C.
* In my opinion, though, you should strongly consider switching to C++ for any application of non-trivial complexity. C++ has far more powerful high-level constructs than C (e.g. smart pointers, containers, templates) that simplify a lot of the tedious work in C. It takes a while to learn how to use them effectively, but in the long run, they will be worth it.
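Putting the flags from the first point together (the file name is illustrative):

gcc -x c -std=c99 -pedantic myapp.c

This compiles the file as C99 regardless of its extension and warns about constructs outside the ISO standard.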
1. GCC has a C compiler; it's the basic one. You can call it with gcc -std=c90 to make sure it doesn't slip in any GNU or C++ extensions.
2. Yes, malloc and calloc are portable and safe for use in C.
3. Only if you have some reason to switch to C++… C is a bit more portable, but not by much, these days.
The most important tip is to save your file with a .c extension and disable compiler extensions. On both Visual C++ and gcc (and thus MinGW) this makes them go into C mode, where C++ additions will be disabled.
You can also force C mode using -std=c90 (or c99, depending on the C standard you want to use; these also disable GNU extensions) in gcc, /Tc in VC++ (and here to disable MS extensions you have to use /Za).
If you're using Visual Studio, just give the file a .c extension (though it's not strictly a C compiler, it pretends to be, and is for the most part good enough).
In the *nix world you can use gcc, as it's pretty much the standard.
If you want to do C, stick to C; if you want to do C++, use C++. So stick with malloc etc. - in C++ you'd use smart pointers.
You don't have to stick with C to create a cross-platform non-GUI library; you can just as well develop it in C++. Since it is a hobby project, that is OK, but note that such libraries are already available.
Note that the new keyword is not part of C17 (formally ISO/IEC 9899:2018) or any other C standard; new is C++ only. In Visual Studio Code, the "cStandard": "c17" and "cppStandard": "c++17" entries in the c_cpp_properties.json file only configure which standards IntelliSense assumes for C and C++ files respectively; they do not add C++ features to C.
Is there a way to convert a Delphi .dcu file to an .obj file so that it can be linked using a compiler like GCC? I've not used Delphi for a couple of years but would like to use it for a project again if this is possible.
Delphi can output .obj files, but they are in a 32-bit variant of Intel OMF. GCC, on the other hand, works with ELF (Linux, most Unixes), COFF (on Windows) or Mach-O (Mac).
But that alone is not enough. It's hard to write much code without using the runtime library, and the implementation of the runtime library will be dependent on low-level details of the compiler and linker architecture, for things like correct order of initialization.
Moreover, there's more to compatibility than just the object file format; code on Linux, in particular, needs to be position-independent, which means it can't use absolute values to reference global symbols, but rather must index all its global data from a register or relative to the instruction pointer, so that the code can be relocated in memory without rewriting references.
DCU files are a serialization of the Delphi symbol tables and of the code generated for each proc, and are thus highly dependent on the implementation details of the compiler, which change from one version to the next.
All this is to say that it's unlikely that you'd be able to get much Delphi (dcc32) code linking into a GNU environment, unless you restricted yourself to the absolute minimum of non-managed data types (no strings, no interfaces) and procedural code (no classes, no initialization section, no data that needs initialization, etc.)
(This is an answer to various FPC remarks, but I need more room than a comment allows.)
For a good understanding, you have to know that a Delphi .dcu translates to two different FPC files: a .ppu file with the mentioned symtable stuff, which includes non-linkable code like inline functions and generic definitions, and a .o which is mingw-compatible (COFF) on Windows. Cygwin is mingw-compatible too at the linking level (but the runtime is different and scary). Anyway, mingw32/64 is our reference gcc on Windows.
The PPU has a similar version problem as Delphi's DCU, probably for the same reasons. The ppu format is different in nearly every major release (so 2.0, 2.2, 2.4), and changes typically 2-3 times a year in the trunk.
So while FPC on Windows uses its own assemblers and linkers, the .o files it generates are still compatible with mingw32. In general FPC's output is very gcc-compatible, and it is often possible to link in gcc static libs directly, allowing e.g. mysql and postgres link libraries to be linked into apps with a suitable license (like e.g. GPL). On 64-bit they should be compatible too, but this is probably less tested than win32.
The textmode IDE even links in the entire GDB debugger in library form. GDB is one of the main reasons for gcc compatibility on Windows.
While Barry's points about the runtime in general hold for FPC too, it might be slightly easier to work around this. It might only require calling certain functions to initialize the FPC RTL from your startup code, and similarly for the finalization. Compile a minimal FPC program with -al and look at the resulting assembler (in the .s file; most notably initializeunits and finalizeunits). Moreover, the RTL is more flexible and probably more easily cut down to a minimum.
Of course, as soon as you also require exceptions to work across gcc<->fpc boundaries, you are out of luck. FPC does not use SEH, or any scheme compatible with anything else ATM. (Contrary to Delphi, which uses SEH, which at least in theory should give you an advantage there, Barry?) OTOH, gcc might use its own libunwind instead of SEH.
Note that the default calling convention of FPC on x86 is the Delphi-compatible register convention, so you might need to insert proper cdecl modifiers (which should be gcc-compatible), or you can even set it for entire units at a time using {$calling cdecl}.
On *nix this is bog standard (e.g. apache modules), I don't know many people that do this on win32 though.
About compatibility: FPC can compile packages like Indy, TeeChart, Zeos, ICS, Synapse, VST and reams more with little or no modifications. The dialect levels of released versions are a mix of D7 and up, with the focus on D7. The dialect level is slowly creeping toward the D2006 level in trunk versions (with for..in, class abstract, etc.).
Yes. Have a look at the Project Options dialog box.
As far as I am aware, Delphi only supports the OMF object file format. You may want to try an object format converter such as Agner Fog's.
Since the DCU format is proprietary and has a tendency of changing from one version of Delphi to the next, there's probably no reliable way to convert a DCU to an OBJ. Your best bet is to build them in OBJ format in the first place, as per Andreas's answer.