ARM NEON intrinsics vs hand assembly

https://web.archive.org/web/20170227190422/http://hilbert-space.de/?p=22
That site, which is quite dated, shows hand-written asm giving a much greater improvement than the intrinsics. I am wondering if this still holds true now, in 2012.
So has optimization of intrinsics improved in the GNU cross-compiler?

My experience is that the intrinsics haven't really been worth the trouble. It's too easy for the compiler to inject extra register unload/load steps between your intrinsics. The effort to get it to stop doing that is more complicated than just writing the stuff in raw NEON. I've seen this kind of stuff in pretty recent compilers (including clang 3.1).
At this level, I find you really need to control exactly what's happening. You can have all kinds of stalls if you do things in just barely the wrong order. Doing it in intrinsics feels like surgery with welder's gloves on. If the code is so performance critical that I need intrinsics at all, then intrinsics aren't good enough. Maybe others have different experiences here.
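For readers who haven't used them: NEON intrinsics are C functions from arm_neon.h that map (roughly) one-to-one onto NEON instructions, with register allocation left to the compiler. A minimal sketch (the function name and loop shape are just illustrative; vld1q_f32/vaddq_f32/vst1q_f32 are the real intrinsics):

    #include <arm_neon.h>

    /* Add two float arrays four lanes at a time.
       Assumes n is a multiple of 4; a real version would handle the tail. */
    void add_f32(float *dst, const float *a, const float *b, int n)
    {
        for (int i = 0; i < n; i += 4) {
            float32x4_t va = vld1q_f32(a + i);     /* load 4 floats */
            float32x4_t vb = vld1q_f32(b + i);
            vst1q_f32(dst + i, vaddq_f32(va, vb)); /* add, store 4 floats */
        }
    }

The complaint above is that, for code like this, older compilers would insert extra register load/unload steps or reorder the loads and adds into a stall-prone sequence, which handwritten assembly avoids.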

I've had to use NEON intrinsics in several projects for portability. The truth is that GCC doesn't generate good code from NEON intrinsics. This is not a weakness of using intrinsics, but of the GCC tools. The ARM compiler from Microsoft produces great code from NEON intrinsics and there is no need to use assembly language in that case. Portability and practicality will dictate which you should use. If you can handle writing assembly language then write asm. For my personal projects I prefer to write time-critical code in ASM so that I don't have to worry about a buggy/inferior compiler messing up my code.
Update: The Apple LLVM compiler falls in between GCC (worst) and Microsoft (best). It doesn't do great with instruction interleaving or optimal register usage, but at least it generates reasonable code (unlike GCC in some situations).
Update2: The Apple LLVM compiler for ARMv8 has been improved dramatically. It now does a great job generating ARMv8 code from C and intrinsics.

So this question is four years old, now, and still shows up in search results...
In 2016 things are much better.
A lot of simple code that I've transcribed from assembly to intrinsics is now optimised better by the compilers than by me, because I'm too lazy to do the pipeline work (for how many different pipelines now?), while the compilers just need me to pass the right -mtune=.
For complex code where register allocation can get tight, GCC and Clang can both still produce code slower than handwritten by a factor of two... or three(ish). It's mostly on register spills, so you should know from the structure of your code whether that's a risk.
But they both sometimes have disappointing accidents. I'd say that right now that's worth the risk (although I'm paid to take risk), and if you do get hit by something then file a bug. That way things will keep on getting better.
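To make "register allocation can get tight" concrete, here's an illustrative sketch (the function and the tile size are hypothetical, not from the original discussion): keep enough vector values live across a loop and the compiler has to start spilling.

    #include <arm_neon.h>

    /* A 4x4 single-precision matrix multiply: all four rows of B plus
       an accumulator stay live across the loop. This still fits in the
       16 q-registers of AArch32 NEON, but scale the tile up (8x8, more
       accumulators) and the compiler runs out of registers and spills
       to the stack inside the hot loop. */
    void mat4_mul(float32x4_t c[4], const float32x4_t a[4], const float32x4_t b[4])
    {
        for (int i = 0; i < 4; i++) {
            float32x4_t row = vmulq_n_f32(b[0], vgetq_lane_f32(a[i], 0));
            row = vmlaq_n_f32(row, b[1], vgetq_lane_f32(a[i], 1));
            row = vmlaq_n_f32(row, b[2], vgetq_lane_f32(a[i], 2));
            c[i] = vmlaq_n_f32(row, b[3], vgetq_lane_f32(a[i], 3));
        }
    }

Whether the compiler or you can plan the register layout better for the scaled-up version is exactly the factor-of-two risk described above.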

By now you even get auto-vectorization for plain C code, and the intrinsics are handled properly:
https://godbolt.org/z/AGHupq
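A sketch of the kind of plain C loop this applies to (illustrative, not the code behind the godbolt link): recent GCC and Clang turn it into NEON loads, multiplies and stores at -O3 without any intrinsics.

    /* Compile with e.g. gcc -O3 -mcpu=cortex-a72; the loop gets
       auto-vectorized to NEON with no source changes. */
    void scale(float *restrict dst, const float *restrict src, float k, int n)
    {
        for (int i = 0; i < n; i++)
            dst[i] = src[i] * k;
    }

The restrict qualifiers matter: they tell the compiler the arrays don't alias, which is often the difference between vectorized and scalar output.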

Related

Benchmarks for C compiler optimization

What are the standard benchmarks for comparing the optimizers of various C compilers?
I'm particularly interested in benchmarks for ARM (or those that can be ported to ARM).
https://en.wikipedia.org/wiki/SPECint is mostly written in C, and is the industry standard benchmark for real hardware, computer-architecture theoretical research (e.g. a larger ROB or some cache difference in a simulated CPU), and for compiler developers to test proposed patches that change code-gen.
The C parts of SPECfp (https://en.wikipedia.org/wiki/SPECfp) are also good choices. Or for a compiler back-end optimizer, the choice of front-end language isn't very significant. The Fortran programs are fine too.
Related: Tricks of a Spec master is a paper that covers the different benchmarks. Maybe originally from a conference.
In this lightning round talk, I will cover at a high level the performance characteristics of these benchmarks in terms of optimizations that GCC does. For example, some benchmarks are classic floating point applications and benefit from SIMD (single instruction multiple data) instructions, while other benchmarks don't.
Wikipedia is out of date. SPECint/fp 2017 was a long time coming, but it was released in 2017 and is a significant improvement over 2006: e.g. it drops some benchmarks that were trivialized by clever compiler optimizations like loop inversion. (Some compilers over the years added what is basically pattern recognition to optimize the hot loop in libquantum, but they can't always do that in general for other loops, even when it would be safe. Apparently that loop can also be easily auto-parallelized.)
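For reference, loop inversion is the transform that replaces a while loop with an if guarding a do-while, so each iteration executes one conditional branch at the bottom instead of a test at the top plus a jump back. A minimal sketch (variable names are illustrative):

    /* Before: test at the top, unconditional branch back. */
    while (i < n) {
        sum += a[i];
        i++;
    }

    /* After inversion: one guard up front, then a single
       conditional backward branch per iteration. */
    if (i < n) {
        do {
            sum += a[i];
            i++;
        } while (i < n);
    }

The libquantum case went well beyond this: compilers effectively recognized the specific benchmark loop and rewrote it wholesale.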
For testing a compiler, you might actually want code that aggressive optimization can find major simplifications in, so SPECcpu 2006 is a good choice. Just be aware of the issues with libquantum.
https://www.anandtech.com/show/10353/investigating-cavium-thunderx-48-arm-cores/12 describes gcc as a compiler that "does not try to 'break' benchmarks (libquantum...)". But compilers like ICC and SunCC that CPU vendors use / used for SPEC submissions for their own hardware (Intel x86 and Sun UltraSPARC and later x86) are as aggressive as possible on SPEC benchmarks.
SPEC result submissions are required to include compiler version and options used (and OS tuning options), so you can hopefully replicate them.

ARM Cortex toolchain speed optimization

There is an abundance of IDEs and toolchains for the Arm Cortex architecture for C/C++.
Lately, faced with a hard speed-optimization issue on my STM Cortex-M3, I started wondering if there are indeed performance differences (in terms of output code execution speed) between the different vendors, as some of them claim.
More specifically, between the GNU C compiler and commercial ones.
Has anyone done a comparison between different compilers in this sense?
Practically speaking, binaries generated by commercial IDEs are more optimized and smaller in code size than ones generated by GCC. The difference is there, but it is not that big, and may even shrink to almost nothing with a little optimization effort. I personally don't think you will find any clear benchmark of commercial toolchains vs GCC-based ones; speed and size really depend on so many factors.

Why are compilers written in C/C++ instead of CoffeeScript (JavaScript, Node.js)?

I am exposed to C because of embedded system programming, and I think it's a wonderful language in this field. However, why is it used to write compilers? If the reason why gcc is implemented in C/C++ is that there weren't many good languages at the time, there's no excuse for why clang is taking the same path (using C/C++).
Is it for performance reasons? Most interpreted languages are a bit slower compared with compiled languages, but I guess the difference is almost negligible in CoffeeScript (JavaScript), because of Node.js.
From the perspective of developers, I suppose it's much easier to write a compiler using high-level languages. Unfortunately, most of the compilers out there are written in C/C++. Is it just because of legacy code?
Response to comments:
Bootstrapping is just one way to illustrate that a language is powerful enough to write a compiler in. It shouldn't be the dominant reason for choosing the language a compiler is implemented in.
I agree with the guess given below that "most compiler developers would answer because most of the compiler-related tools (bison, yacc) emit C code". However, neither GCC nor Clang uses a generated parser; they implemented one themselves (a minimal sketch of that hand-written style follows below). This front-end work is independent of the target architecture, and should not be where C/C++'s strength lies.
There's more or less consensus that performance is one key factor. Indeed, even for GCC and Clang, building a reasonably sized C project (the Linux kernel) takes a lot of time. Is that because of the front-end or the back-end? I have to admit that I don't have much experience with compiler back-ends, as our compiler course finished at generating LLVM code.
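To show what "implemented one themselves" means, here is a minimal hand-written recursive-descent parser in the same style (vastly simplified; the grammar and names are illustrative, not GCC's or Clang's actual code):

    #include <ctype.h>
    #include <stdio.h>

    /* Grammar: expr := term (('+'|'-') term)*, term := number.
       One C function per grammar rule -- the hand-written style
       GCC and Clang use instead of yacc/bison-generated tables. */
    static const char *p;  /* cursor into the input string */

    static int term(void)
    {
        int v = 0;
        while (isdigit((unsigned char)*p))
            v = v * 10 + (*p++ - '0');
        return v;
    }

    static int expr(void)
    {
        int v = term();
        while (*p == '+' || *p == '-') {
            char op = *p++;
            int rhs = term();
            v = (op == '+') ? v + rhs : v - rhs;
        }
        return v;
    }

    int main(void)
    {
        p = "12+30-2";
        printf("%d\n", expr());  /* prints 40 */
        return 0;
    }

Hand-written parsers win on error messages and recovery, which is a big part of why production C/C++ front-ends moved away from generated ones.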
I am exposed to C because of embedded system programming, and I think it's a wonderful language in this field.
Yes. It's better than Java.
However, why is it used to write compilers?
This question can't be answered without asking the developers. I suspect that the majority of them will tell you that common compiler-writing software (yacc, flex, bison, etc.) produces C code.
If the reason for gcc is that there weren't many good languages, there's no excuse for clang.
GCC isn't a programming language, and neither is Clang. They're both implementations of the C programming language.
Is it for performance reasons?
Don't confuse implementation with specification. Speed is an attribute introduced by your compiler and your computer, not by the programming language. GCC happens to produce fairly efficient machine code, which might influence developers to use C as their primary programming language... but in ten years time, it could* be that node.js produces more efficient machine code than GCC. Don't forget, StackOverflow is forever.
* could, but most likely won't. See Ira Baxter's comment below for more info.
Most interpreted languages are a bit slower compared with compiled languages, but I guess the difference is almost negligible in CoffeeScript (JavaScript), because of Node.js.
Similarly, interpretation or compilation isn't the choice of the language, but of the implementation of the language. For example, GCC and Clang choose to compile C to machine code. Ch and CINT are two interpreters that translate C code directly to behaviour, rather than machine code. Java was once predominantly translated using interpretation, too, but is now predominantly compiled into JVM bytecode. Javascript seems to be phasing towards predominant compilation, too. Who knows? Maybe you'll see compilers written predominantly in Javascript in ten years time...
From the perspective of developers, I suppose it's much easier to write a compiler using high-level languages.
All of these programming languages are technically high level. They're mostly defined in terms of an abstract machine; they're certainly not low level.
Unfortunately, most of the compilers out there are written in C/C++.
I don't consider it unfortunate that C++ is used to write software; it's not a bad programming language.
Is it just because of legacy code?
I suppose legacy code might influence the decision of a programmer. In the end though, as I said, you'd have to ask the developers. They might just decide to use C or C++ because C or C++ is their favourite programming language... Why do you speak English?
Compilers are very complex software in general. The front-end part is pretty simple (parsing), but the back-end parts (scheduling, code generation, optimization, register allocation) involve NP-complete problems (of course, compilers try to approximate solutions to them). Thus, implementing in C helps compile times. C is also very good at bitwise operations and other low-level work, which is useful for writing a compiler.
Note that not all compilers are written in C, though. For example, the Haskell GHC compiler is written in Haskell using the bootstrapping technique.
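As an example of the bit-level work meant above (the instruction format here is made up for illustration, not any real ISA): a code generator spends a lot of time packing fields into fixed-width words, which C expresses directly.

    #include <stdint.h>

    /* Pack a hypothetical 32-bit instruction: 8-bit opcode,
       two 4-bit register numbers, 16-bit immediate. */
    static uint32_t encode_insn(uint8_t opcode, uint8_t rd, uint8_t rn, uint16_t imm16)
    {
        return ((uint32_t)opcode << 24)
             | ((uint32_t)(rd & 0xF) << 20)
             | ((uint32_t)(rn & 0xF) << 16)
             | imm16;
    }

Shifts and masks like these compile to a couple of machine instructions each; in a language without integer bit operations, this layer gets awkward and slow.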
Javascript is async, which doesn't suit compiler writing.
I see many reasons:
- There is no elegant way of handling bit-precise code in Javascript.
- You can't write binary files easily in Javascript, so the assembler part of the compiler would have to be in a more low-level language.
- Huge JS codebases are very heavy to load in memory (that's plain text, remember?).
- Writing optimizing routines for compilers is heavily CPU-intensive, which is not yet very compatible with Javascript.
- You wouldn't be able to compile your compiler with it (bootstrap), because you need a Javascript interpreter behind your compiler. The bootstrap phase wouldn't be "pure":
JS Compiler compiles NodeJS -> NodeJS runs your new Compiler -> new JS Compiler
gcc is implemented primarily in C, but that is not true of all compilers, including some that are quite standard. It is a common pattern for a compiler to be implemented in the language that it compiles. ghc is written largely in Haskell. Recent versions of guile feature a compiler implemented mostly in Scheme.
Nope, coffeescript et al. are still much slower than natively-compiled (and optimised) C code. Even if you take the subset of javascript that can be optimised (asm.js), it's still twice as slow as native C.
When people say node.js is just as fast as C, they mean it's just as fast as part of an overall system that does other things, like reading from disk or waiting for data off the network. In such systems the CPU is underused (especially with today's superfast CPUs), so the performance problem is not the raw processing capability of the language. Hence, a node.js server is exactly as fast as a C server if both are stuck waiting for a network call to return data. The type of system written in node.js does a lot of waiting on the network, which is why people use node.js. The type of system written in C does not suit being written in node.js.

Which free C compiler gives options for greater optimizations?

Can you please give me some comparison between C compilers especially with respect to optimization?
Actually there aren't many free compilers around. gcc is "the" free compiler and probably one of the best when it comes to optimisation, even when compared to proprietary compilers.
Some independent benchmarks are linked from here:
http://gcc.gnu.org/benchmarks/
I believe Intel allows you to use its ICC compilers under Linux for non-commercial development for free. ICC beats gcc and Visual Studio hands down when it comes to code generation for x86 and x86-64 (i.e. it typically generates faster code, and can do a decent job of auto-vectorization (SIMD) in some cases).
This is a hard question to answer since you did not tell us what platform you are using, neither hardware nor OS...
But joemoe is right, gcc tends to excel in this field.
(As a side note: on some platforms there are commercial compilers that are better, but since you gain so much more than just the compiler, gcc is hard to beat...)
The Windows SDK is a free download. It includes current versions of the Visual C++ compilers. These compilers do a very good job of optimisation.

C compiler producing lightweight executables

I'm currently using MSVC for C++, but as I'm switching to C to write a very performance-intensive program (an interpreter), I have to search for a fitting C compiler.
I've looked at some binaries produced by Turbo C and, even though it's old, they seem pretty straightforward and optimized.
Now I don't know what the best compiler for building an interpreter is, but maybe you can help me.
I've considered GCC but as I don't know much about it, I can't be really sure.
99.9% of a program's performance depends on the code you write and the language you choose.
You can safely ignore the performance of the compiler.
Stick to MSVC... and don't waste time :)
If I were you, I would take the approach of worrying less about the compiler and worrying more about your own code. Write the code for the interpreter in a reasonable way. Then, profile it, and optimize spots based on how much time they take. That is more likely to produce a performance benefit than using a particular compiler.
If you want a lightweight program, it is not the compiler you need to worry about so much as the code you write and the libraries you use. Most compilers will produce similar results from the same source code.
For example, using C++ with MFC, a basic windows application used to start off at about 900kB and grow rapidly. Linking with the dynamic MFC dlls would get you down to a few hundred kB. But by dropping MFC totally - using Win32 APIs directly - and using a minimal C runtime it was relatively easy to implement the same thing in an .exe of about 25kB or less (IIRC - it's been a long time since I did this).
So ditch the libraries and get back to proper low level C (or even C++ if you don't use too many "clever" features), and you can easily write very compact applications.
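As a sketch of the "Win32 APIs directly" approach (illustrative; exact sizes and linker details vary by toolchain and CRT choice):

    #include <windows.h>

    /* A complete Win32 program with no MFC and almost no C runtime
       dependency. Built with a minimal CRT (or a custom entry point
       and /NODEFAULTLIB), this kind of program links into an
       executable of a few kB rather than hundreds. */
    int WINAPI WinMain(HINSTANCE hInst, HINSTANCE hPrev, LPSTR cmdLine, int nShow)
    {
        MessageBoxA(NULL, "Hello from a tiny executable", "Demo", MB_OK);
        return 0;
    }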
edit
I've just realised I was confused by the question title into talking about lightweight applications as opposed to concentrating on performance, which appears to be the real thrust of the question. If you want performance, then there is no specific need to use C, or to move to a painful development environment - just write good, high performance code. Fundamentally this is about using the correct designs and algorithms and then profiling and optimising the resulting code to eliminate bottlenecks and inefficiencies. Note that these days you may achieve a much bigger bang for your buck by switching to a multithreaded approach than by just concentrating on raw code optimisation - make sure you utilise the hardware well.
You can use GCC through MinGW, Eclipse CDT, or one of the other Windows ports. You can optimize for executable size, speed of the resulting executable, or speed of compilation.
C++ was designed to be largely backward compatible with C, so any C++ compiler should be able to compile pure C. You might want to tell it that the code is C and not C++, so the compiler doesn't do name mangling, etc. If the compiler is very good with C++, it should be equally good, or better, with C, because C is much simpler.
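The usual way to handle the name-mangling point in headers shared between C and C++ (a standard idiom, shown here with a made-up function name):

    /* mylib.h */
    #ifdef __cplusplus
    extern "C" {      /* give these declarations C linkage under C++ */
    #endif

    int my_function(int x);  /* no C++ name mangling applied */

    #ifdef __cplusplus
    }
    #endif

Alternatively, most compilers let you force the language per file, e.g. gcc -x c or MSVC's /TC switch.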
I would suggest sticking with MSVC. It's a very decent system. Though if you are not convinced - compare your options. Build the same program with multiple compilers, look at the assembly they produce, measure the resulting executable performance, etc.
