Link-time optimization (LTO) (a.k.a. unity build) is included in GCC 4.5 or later and other compilers have similar optimization passes. Doesn't this make certain code patterns much more viable than before?
For example, for maximum performance a "module" of C code often needs to expose its guts. Does LTO make this obsolete? What code patterns are now viable that were not before?
I believe that LTO is simply an optimization, but not necessarily one that obviates the need for documentation of implemenation ("exposing the guts") of any module. Whole languages have been written to that effect; I do not think C will have that need removed from it soon, or perhaps ever.
From the description of the LTO feature in gcc:
Link Time Optimization (LTO) gives GCC the capability of dumping its
internal representation (GIMPLE) to disk, so that all the different
compilation units that make up a single executable can be optimized as
a single module. This expands the scope of inter-procedural
optimizations to encompass the whole program (or, rather, everything
that is visible at link time).
From the announcement of LTO's inclusion into gcc:
The result should, in principle, execute faster but our IPA cost
models are still not tweaked for LTO. We've seen speedups as well as
slowdowns in benchmarks (see the LTO testers at
http://gcc.opensuse.org/).
Related
According to most benchmarks, Intel's Clear Linux is way faster than other distributions, mostly thanks to a GCC feature called Function Multi-Versioning. Right now the method they use is to compile the code, analyze which function contains vectorized loops, then patch the code with FMV attributes and compile it again.
How feasible will it be for GCC to do it automatically? For example, by passing -mmultiarch=sandybridge,skylake (or a similar -m option listing CPU extensions like AVX and AVX2).
Right now I'm interested in two usage scenarios:
Use this option for our large math-heavy program for delivering releases to our customers. I don't want to pollute the code with non-standard attributes and I don't want to modify the third-party libraries we use.
The other Linux distributions will be able to do this easily, without patching the code as Intel does. This should give all Linux users massive performance gains.
No, but it doesn't matter. There's very, very little code that will actually benefit from this; for the most part by doing it globally you'll just (without special effort to sort matching versions in pages together) make your system much more memory-constrained and slower due to the huge increase in code size. Most actual loads aren't even CPU-bound; they're syscall-overhead-bound, GPU-bound, IO-bound, etc. And many of the modern ones that are CPU-bound aren't running precompiled code but JIT'd code (i.e. everything running in a browser, whether that's your real browser or the outdated and unpatched fork of Chrome in every Electron app).
Let's say someone wants to statically compile a given language using LLVM, what would be the biggest differences (advantages and disadvantages) to translate it first to C and then use CLang instead of dealing with a direct IR translation.
The obvious answer I guess would be that by using a front-end that knows the source language, it is easier to come up with an optimized IR represention with than expecting CLang to perform well with the generated C.
I am missing something here ?
Advantages of using a generic C backend:
You can use any C compiler (not just Clang)
Easier to debug an intermediate code if it's in such a high level language
Depending on your source language semantics, it might be easier to translate it via C (but not necessarily)
And disadvantages are:
If your language is compiled incrementally (e.g., no clearly separated modules, or complex macro system, or whatever else), compiling via LLVM IR in a single module with immediate JIT-compilation makes more sense than generating hundreds of tiny C modules. In other words, C is enforcing separate compilation.
If your source language semantics is too far from C, compiling it straight into a lower level can be easier.
Not all the LLVM functionality is directly accessible from C. E.g., intrinsics, alternative calling conventions, debug metadata for a higher level language.
Clang is big, excluding it will improve your memory footprint
Clang is not easy to maintain, it depends on presence and exact locations of the headers, depends on some parts of gcc, etc. Without it, bare LLVM can be used on its own and dependencies may be kept self-contained.
Optimisations in most cases are not an issue. Clang is generating an extremely non-optimal LLVM IR, deliberately. LLVM should care for all the optimisations, not the frontends. Unless, of course, you can do some high level optimisations, but then they won't depend on your backend choice.
By pure chance I stumbled over an article mentioning you can "enable" ASLR with -pie -fPIE (or, rather, make your application ASLR-aware). -fstack-protector is also commonly recommended (though I rarely see explanations how and against which kinds of attacks it protects).
Is there a list of useful options and explanations how they increase the security?
...
And how useful are such measures anyway, when your application uses about 30 libraries that use none of those? ;)
Hardened Gentoo uses these flags:
CFLAGS="-fPIE -fstack-protector-all -D_FORTIFY_SOURCE=2"
LDFLAGS="-Wl,-z,now -Wl,-z,relro"
I saw about 5-10% performance drop in comparison to optimized Gentoo linux (incl. PaX/SElinux and other measures, not just CFLAGS) in default phoronix benchmark suite.
As for your final question:
And how useful are such measures anyway, when your application uses about 30 libraries that use none of those? ;)
PIE is only necessary for the main program to be able to be loaded at a random address. ASLR always works for shared libraries, so the benefit of PIE is the same whether you're using one shared library or 100.
Stack protector will only benefit the code that's compiled with stack protector, so using it just in your main program will not help if your libraries are full of vulnerabilities.
In any case, I would encourage you not to consider these options part of your application, but instead part of the whole system integration. If you're using 30+ libraries (probably most of which are junk when it comes to code quality and security) in a program that will be interfacing with untrusted, potentially-malicious data, it would be a good idea to build your whole system with stack protector and other security hardening options.
Do keep in mind, however, that the highest levels of _FORTIFY_SOURCE and perhaps some other new security options break valid things that legitimate, correct programs may need to do, and thus you may need to analyze whether it's safe to use them. One known-dangerous thing that one of the options does (I forget which) is making it so the %n specifier to printf does not work, at least in certain cases. If an application is using %n to get an offset into a generated string and needs to use that offset to later write in it, and the value isn't filled in, that's a potential vulnerability in itself...
The Hardening page on the Debian wiki explains at least the most commons ones which are usable on Linux. Missing from your list is at least -D_FORTIFY_SOURCE=2, -Wformat, -Wformat-security, and for the dynamic loader the relro and now features.
I'm currently using MSVC for C++ but as I'm switching to C to write a very performance-intensive program (interpreter) I have to search for a fitting C compiler.
I've looked at some binaries produced by Turbo-C and even if its old they seem pretty straigthforward and optimized.
Now I don't know what the best compiler for building an interpreter is, but maybe you can help me.
I've considered GCC but as I don't know much about it, I can't be really sure.
99.9% a programs performance depends on the code you write and the language you choose.
you can safely ignore the performance of the the compiler.
Stick to MSVC...and dont waste time :)
If I were you, I would take the approach of worrying less about the compiler and worrying more about your own code. Write the code for the interpreter in a reasonable way. Then, profile it, and optimize spots based on how much time they take. That is more likely to produce a performance benefit than using a particular compiler.
If you want a lightweight program, it is not the compiler you need to worry about so much as the code you write and the libraries you use. Most compilers will produce similar results from the same source code.
For example, using C++ with MFC, a basic windows application used to start off at about 900kB and grow rapidly. Linking with the dynamic MFC dlls would get you down to a few hundred kB. But by dropping MFC totally - using Win32 APIs directly - and using a minimal C runtime it was relatively easy to implement the same thing in an .exe of about 25kB or less (IIRC - it's been a long time since I did this).
So ditch the libraries and get back to proper low level C (or even C++ if you don't use too many "clever" features), and you can easily write very compact applications.
edit
I've just realised I was confused by the question title into talking about lightweight applications as opposed to concentrating on performance, which appears to be the real thrust of the question. If you want performance, then there is no specific need to use C, or move to a painful development environment - just write good, high performance code. Fundamentally this is about using the correct designs and algorithms and then profiling and optimising the resulting code to eliminate bottlenecks and inefficiencies. Note that these days you may achieve a much bugger bang for your buck by switching to a multithreaded approach than just concentrating on raw code optimisation - make sure you utilise the hardware well.
You can use GCC, through MingW, Eclipse CDT, or one of the other Windows ports. You can optimize for executable size, speed of resulting executable, or speed of compilation.
C++ was designed to be backward compatible with C. So any C++ compiler should be able to compile pure C. You might want to tell it that it's C and not C++, so compiler doesn't do name mangling, etc. If the compiler is very good with C++, it should be equally good, or better with C, because C is much simpler.
I would suggest sticking with MSVC. It's a very decent system. Though if you are not convinced - compare your options. Build the same program with multiple compilers, look at the assembly they produce, measure the resulting executable performance, etc.
I'm using the standard gcc compiler in math software development with C-language. I don't know that much about compilers or compiler options, and I was just wondering, is it possible to make faster executables using another compiler or choosing better options? The default Makefile sets options -ffast-math and -O3 and I think both of them have some impact in the overall calculation time. My software is using memory quite extensively, so I imagine some options related to memory management might do the trick?
Any ideas?
Before experimenting with different compilers or random, arbitrary micro-optimisations, you really need to get a decent profiler and profile your code to find out exactly what the performance bottlenecks are. The actual picture may be very different from what you imagine it to be. Once you have a profile you can then start to consider what might be useful optimisations. E.g. changing compiler won't help you if you are limited by memory bandwidth.
Here are some tips about gcc performance:
do benchmarks with -Os, -O2 and -O3. Sometimes -O2 will be faster because it makes shorter code. Since you said that you use a lot of memory, try with -Os too and take measurements.
Also check out the -march=native option (it is considered safe to use, if you are making executable for computers with similar processors) on the client computer. Sometimes it can have considerable impact on performance. If you need to make a list of options gcc uses with native, here's how to do it:
Make a small C program called test.c, then
$ touch test.c
$ gcc -march=native -fverbose-asm -S test.c
$ cat test.s
credits for code goto Gentoo forums users.
It should print out a list of all optimizations gcc used. Please note that if you're using i7, gcc 4.5 will detect it as Atom, so you'll need to set -march and -mtune manually.
Also read this document, it will help you (still, in my experience on Gentoo, -march=native works better) http://gcc.gnu.org/onlinedocs/gcc/i386-and-x86_002d64-Options.html
You could try with new options in late 4.4 and early 4.5 versions such as -flto and -fwhole-program. These should help with performance, but when experimenting with them, my system was unstable. In any case, read this document too, it will help you understand some of GCC's optimization options http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
If you are running Linux on x86 then typically the Intel or PGI compilers will give you significantly faster performing executables.
The downsides are that there are more knobs to tune and that they come with a hefty price tag!
If you have specific hardware you can target your code for, the (hardware) company often releases paid-for compilers optimized for that hardware.
For example:
xlc for AIX
CC for Solaris
These compilers will generally produce better code optimization-wise.
As you say your program is memory heavy you could test to use a different malloc implementation than the one in standard library on your platform.
For example you could try the jemalloc (http://www.canonware.com/jemalloc/).
Keep in mind they most improvements to be had by changing compilers or settings will only get you proportional speedups where as adjusting algorithms you can sometimes get improvements in the O() of your program. Be sure to exhaust that before you put to much work into tweaking settings.