Where in the GCC source code does it compile to the different assembly languages?

Where in the GCC source code does it compile to the different assembly languages? - c

Where is the code in the GCC source code that actually constructs the assembly for the different architectures?
Wondering how many different assembly languages it compiles to, and how it actually does this (by taking a look at the source code).
Is it in the gcc repo somewhere, or in another repo? I have started to dig around but haven't found anything.
https://github.com/gcc-mirror/gcc
For example, here is some of the assembly generating code in V8:
https://github.com/v8/v8-git-mirror/tree/master/src/x64
Is there anything equivalent for GCC?
I am wondering because it's a mystery how GCC does this, and it would be a great way to learn how compilers are actually implemented down to the assembly level.

The .md (machine description) files of GCC source contain stuff to generate assembly. GCC contains several specialized C/C++ code generators (and some of them translates the .md files into code emitting assembly).
GCC is a very complex program. The documentation of GCC MELT (an obsolete project) contains several interesting links and slides, notably refering to the Indian GCC Resource Center
Most of the optimizations in GCC happens in the middle-end (which is mostly independent of source language or target system), notably with many passes working on the Gimple representations.
The GCC repo is an SVN repository.
See also this answer, notably the pictures inside it.

The actual source code for GCC is most accessible from here:
https://gcc.gnu.org/svn.html
The software is accessible via SVN (subversion), a source code control system. This would be installed on many versions of Linux/UNIX, but if not on your platform, you can install the svn kit and then fetch the source using the following command:
svn checkout svn://gcc.gnu.org/svn/gcc/trunk SomeLocalDir
GCC is complex and would take significant experience to understand the nature of how the application actually compiles to different architectures.
In a nutshell, GCC has three major components - front-end, middle and back-end processing. The front-end processor has the component of the language parsing to understand the syntax of languages (like C, C++, Objective-C, etc). The front-end deconstructs the code to a portable construct which is then passed to the back-end for compilation to the target environment.
The middle part performs code analysis and optimisation, attempting to prioritise the code to generate the best possible output at the end of the full process. Technically, optimisation can occur at any part of the process as patterns are discovered during analysis.
The back-end processor compiles the code to a tree-style output format (not actually final executable code). Based on what the expected output is designed to be, the "pseudo-code" is optimised for using registers, bit-sizes, endian-ness, and so on. The final code is then generated during the assembly phase, which converts the back-end code into machine executable instructions.
It's important to note that the compiler has many options to deal with output formats so you can create output to many classes of architecture, usually out of the box. For cross-compiling and target compiler options, try checking out this link:
https://gcc.gnu.org/install/configure.html

Related

Why specify the target architecture to the linker?

I've been working on using the Meson build system for an embedded project. Since I'm working on an embedded platform, I've written a custom linker script and also an invocation for the linker. I didn't have any problems until I tried to link in newlib to my project, when I started to have link issues. Just before I got it working, the last error was undefined reference to main which I knew was clearly in the project.
Out of happenstance, I tried adding -mcpu=cortex-m4 to my linker invocation (I am using gcc to link, I am told this is quite typical instead of directly calling ld). It worked! Now, my only question is "why"?
Perhaps I am missing something about how the linking process actually works, but considering I am just producing an ELF file, I didn't think it would be important to specify the CPU architecture to the linker. Is this a newlib thing, or has gcc just been doing magic behind the scenes for me that I haven't seen before?
For reference, here's my project (it's not complete)

In general, you should always link via the compiler driver (link forms of the gcc command), not via direct invocation of ld. If you're developing for bare metal on a particular exact target, it's possible to determine the set of linker arguments you need and use ld directly, but there's a lot that the compiler driver takes care of for you, and it's usually better to let it. (If you don't have a single fixed target, there are unlimited combinations of possibilities and no way you can reproduce all present and future ones someone may care about.)
You can still pass whatever options you like to the linker, e.g. custom linker scripts, via -Wl,... option forms.
As for why the specific target architecture ISA level could matter to linking, linking is not a dumb process of just sticking together binary chunks. Linking can involve patching up (relocations) or even generating (thunks for distant jump targets, etc.) code, in which case the linker may need to care what particular ISA level/variant it's targeting.

Such linker options ensure that the appropriate standard library and start-up code is linked when these are defaulted rather then explicitly specified or overridden.
The one ARM toolchain supports a variety of ARM architecture variants and options; they may be big or little-endian, have various instruction sets - ARM, Thumb. Thumb-2, ARM64 etc, and various extensions such a SIMD or DSP units. The linker requires the architecture information to select the correct library to link for both performance and binary compatibility.

embedded c code and unit tests without cross compile

i am starting to learn unit testing. I use unity and it works well with mingw in eclipse on windows. I use different configurations for debug, release and tests. This works well with the cdt-plugin.
But my goal is to unit test my embedded code for an stm. So i use arm-gcc with the arm-gcc eclipse plugin. I planned to have a configuration for compiling the debug and release code for the target and a configuration using mingw to compile and excute the tests on the pc (just the hardware independet parts).
With the eclipse plugin, i cannot compile code that is not using the arm-gcc.
Is there a way, to have one project with configurations and support for the embedded target and the pc?
Thanks

As noted above you need a makefile pointing at two different targets with different compiler options depending on the target.
You will need to ensure portability in your code.
I have accomplished this most often using CMake and outlining different compiler paths and linker flags for unit tests versus the target. This way I can also easily link in any unit test libraries while keeping them external to my target. In the end CMake produces a Makefile but I'm not spending time worrying about make syntax which while I can read often seems like voodoo.
Doing this entirely within a single Eclipse project is possible. You need to configure your project for multiple targets with different compilers used for each and will require some coaxing to get eclipse to behave.
If you're goal is to do it entirely within Eclipse I suggest reading this as a primer.
If you want to go the other route, here is a CMake primer.

Short answer: Makefile.
But I guess NEON assemblies are a bigger issue.
Using intrinsics instead is at least open to the possibility to link to a simulator library, and there are indeed a lot of such libraries written in standard C that allow code with intrinsics to be portable.
However the poor performance of GCC Neon intrinsics forces a lot of people to sacrifice portability for performance.
If your code unfortunately contains assembly, you won't be able to even compile the code before translating assemblies back to standard C.

How can I compile ANSI C99-based MEX code delivered with Linux makefiles under Win64 MATLAB?

It seems I've got a real problem here due to my lack of any knowledge about Linux systems:
I have downloaded some open source code, which
is written in C
uses complex.h, so I assume it is ANSI C99
comes with makefiles designed for compilation under Linux systems
provides interfaces to IDL, MATLAB, Python etc.
I am indeed familiar about compiling C/MEX files under Windows-based MATLAB environments, but in this case I don't even know where to start. The project is distributed in several folders and consists of dozens of source and header files. And, to begin with, the Visual Studio 2010 compiler I've used to compile MEX files until now does not comply with the C99 standard, i.e. it does not recognize the complex.h header.
Any help towards getting this project compiled would be highly appreciated. In particular, I have the following questions:
1) Is there any possibility to automatically extract compilation information from the MEX files and transfer it to Windows reality?
2) Is there any free compiler being able to compile C99 stuff, which is also easy to embed in MATLAB?

I have done this (moved in-house legacy code inc. mex files to Win64). I can't recommend the experience.
You will have to recompile, no way around it.
Supported compilers for mex depend on your MATLAB version
This File Exchange entry for using Pelles C may be a starting point (if it works with your version of MATLAB).
I am guessing that there is a main makefile which then works through the makefiles in the subdirectories - have a read through the instructions for compiling under Linux, it will give you some idea of what's going on and may also discuss what to do if you want to change compiler. Once you've found a compatible compiler, the next stage is to understand what the makefiles are doing and edit them accordingly (change paths, compiler, compiler flags, etc.)
Then, from memory (it was a while ago), you get to enjoy a magical mystery tour through increasingly obscure compiler errors. Document everything because if you do get it working, you won't be in a mood to do this twice.

MATLAB R2016b on Windows now supports the MinGW compiler. I'm successfully using this to compile code written primarily for Linux/gcc. I installed this from the Add-On menu in MATLAB (search MinGW).
For my case, I'm building with the legacy code tool. The only thing I needed to do differently than normal was to tell the compiler to support c99 via a compiler flag. This does the trick:
legacy_code('compile', def, {'CFLAGS=-std=c99'})
I had trouble getting the flag command just right (I had some extra quotes that apparently broke things), and asked The MathWorks, so credit is due to their support team for this.
If you are using mex, I would expect to do something very similar.
I would guess that the makefiles are irrelevant for your application; you will need to tell the mex or legacy_code function about all of the files necessary to build the whole application or link against pre-built libraries (which it sounds like you don't have).
I hope this helps!

How to cross compile C code for an ia188em chip

I inherited an old project that uses an Innovasic ia188em processor (previously AM188 from AMD). I will likely need to modify the code, and so will need to recompile. Unfortunately, I'm not sure which compiler was used previously (it compiled into a .hex file), and searching through the source code (and in particular the header files) doesn't seem to indicate it either.
I did see one program that could work, but I was wondering if anyone knew of any free programs that might do this. I saw some forums where people said they thought either an old Borland compiler or Bruce's C Compiler may work with 80188 chips (which I assume my chip falls under?), but nothing concrete. I failed to compile with Borland C++ 5 when I tried, though I admit I probably didn't have it set up correctly.
This is for an embedded board (i.e. no OS). I don't program too often, so my compiler knowledge is limited. I mostly just write simple C programs and compile with gcc under linux. Any help is appreciated.
Updated 10/8: I apologize, I was looking at both this code, and the PC side code that talks to the embedded board, and got mixed up. The code for the ia188em (embedded board) is actually C (not C++). Updated title to reflect that. I'm not sure if it makes a huge difference or not.

You'll need a 16 bit "real mode" x86 compiler. If your compiler is a DOS targeted compiler, you will need some means of generating a raw binary rather than than MS-DOS load module (.exe), this may be possible through linker options or may require a non-DOS linker.
Any build scripts or makefiles included with the project code might help you identifier the toolchain used, but the likelihood is that it is no longer available, and you'll need to source "antique software".

When I used to do this sort of thing (1985 -> 1990) I used the intel toolchain, now long obsolete and no longer available from intel. The tools required were
iC-86 - The compiler
link-86 - the linker
loc-86 - the image locater.
There is some information on these tools at a very old site here.
Another method that was used at the time was to process the .exe file produced by a Microsoft standard real mode PC compiler (MS-Pascal was the language used on that project) into an absolutely located image that could be blown into EPROM. The tool used for the conversion was proprietary to the company so I have no idea whether there is an equivalent available

C object file compatibility between computers

First I want to state for the record that this question is related to school/homework.
Let’s say computers CP1 and CP2 both share the same operating system and machine language. If a C program is compiled on CP1, in order to move it to CP2, is it necessary to transfer the source code and recompile on CP2, or simply transfer the object files.
My gut answer is that the object files should suffice. The C code is translated into assembly by the compiler and assembled into machine code by the assembler. Because the architecture shares the same machine code and operating system, I don't see a problem.
But the more I think about it, the more confused I’m starting to get.
My questions are:
a) Since its referring to object files and not executables, I’m assuming there has been no linking. Would there be any problems that surface when linking on CP2?
b) Would it matter if the code used C11 standard on CP1 but the only compiler on CP2 was C99? I'm assuming this is irrelevant once the code has been compiled/assembled.
c) The question doesn't specify shared/dynamic linked libraries. So this would only really work if the program had no dependencies on .dll/.so/ .dylib files, or else these would be required on CP2 as well.
I feel like there are so many gotchas, and considering how vague the question is I now feel that it would be safer to simply recompile.
Halp!

The answer is, it depends. When you compile a C program and move the object files to link on a different computer, it should work. But because of factors such as endianness or name mangling, your program might not work as intended, and even might crash when you try to run it.
C11 is not supported by a C99 compiler, but it does not matter if the source has been compiled and assembled.
As long as the source is compiled with the libraries on one machine, you don't need the libraries to link or run the file(s) on the other computer (static libraries only, dynamic libraries will have to be on the computer you run the application on). This said, you should make the program independent so you don't run into the same problems as before where the program doesn't work as intended or crashes.
You could get a compiler that supports EABI so you don't run into these problems. Compilers that support the EABI create object code that is compatible with code generated by other such compilers, thus allowing developers to link libraries generated with one compiler with object code generated with a different compiler.
I have tried to do this before, but not a whole lot, and not recently. Therefore, my information may not be 100% accurate.

a) I've already heard the term "object files" being used to refer to linked binaries - even though it's kinda inaccurate. So maybe they mean "binaries". I'd say linking on a different machine could be problematic if it has a different compiler -
unless object file formats are standardized, which I'm not sure about.
b) Using different standards or even compilers doesn't matter for binary code - if it's linked statically. If it relies on functions from a dynamic lib, there could be problems. Which answers c) as well: Yes, this will be a problem. The program won't start if it doesn't have all required dynamic libs in the correct version. Depends on linking mode (static vs. dynamic), again.

Q: Let’s say computers CP1 and CP2 both share the same operating system and machine language.
A: Then you can run the same .exe's on both computers
Q: If a C program is compiled on CP1, in order to move it to CP2, is it necessary to transfer the source code
A: No. You only need the source code if you want to recompile. You only need to recompile if it's a different, incompatible CPU and/or OS.
"Object files" are generally not needed at all for program execution:
http://en.wikipedia.org/wiki/Object_files
An object file is a file containing relocatable format machine code
that is usually not directly executable. Object files are produced by
an assembler, compiler, or other language translator, and used as
input to the linker.
An "executable program" might need one or more "shared libraries" (aka .dll's). In which case the same restrictions apply: the shared libraries, if not already resident, must be copied along with the .exe, and must also be compatible with the CPU and OS.
Finally, "scripts" do not need to be recompiled. You may copy the script freely from computer to computer. But each computer must have an "interpreter" to run the script: a Perl script needs a Perl interpreter, a Python script a python interpreter, and so on.