Is C inefficient compared to Assembly? [duplicate] - c

This question already has answers here:
When is assembly faster than C? [closed]
(40 answers)
Closed 1 year ago.
This is purely a theory question, so, given an "infinite" time to make a trivial program, and an advanced knowledge of C and Assembly, is it really better to do something in Assembly? is "performance" lost when compiling C into Assembly (to machine code)?
By performance I mean, do modern C compilers do a bad job at certain tasks that programming directly in Assembly speeds up?

Modern C can do a better job than assembly in many cases, because keeping track of which operations can overlap and which will block others is so complex it can only be reasonably be tracked by a computer.

C is not inefficient compared to anything. C is a language, and we don't describe languages in terms of efficiency. We compare programs in terms of efficiency. C doesn't write programs; programmers write programs.
Assembly gives you immense flexibility when comparing with C, and that is at the cost of time programming. If you are a guru C programmer and a guru Assembly programmer, then chances are you might be able to squeeze some more juice with Assembly for writing any given program, but the price for that is virtually certain to be prohibitive.
Most of us aren't gurus in either of these languages. For most of us, giving the responsibility of performance tuning to a C compiler is a double win: you get the wisdom of a number of Assembly gurus, the people who wrote the C compiler, along with an immense amount of time in your hands to further correct and enhance your C program. You also get portability as a bonus.

This question seems to stem from the misconception that higher performance is automatically better. There is too much to be gained from a higher level perspective to make assembly better in the general case. Even if performance is your primary concern, compilers usually do a better job creating efficient assembly than you could write yourself. They have a much broader "understanding" of all of your source code than you could possibly hold in your head. Many optimizations can be had from NOT using well-structured assembly.
Obviously there are exceptions. If you need to access hardware directly, including special processing features of CPUs (e.g. SSE), then assembly is the way to go. However, in that case, you're probably better off using a library that addresses your general problem more directly (e.g. numerics packages).
But you should only worry about things like this if you have a concrete, specific need for the increased performance and you can show that your assembly actually IS faster. Concrete specific needs include: noticed and measure performance problems, embedded systems where performance is a fundamental design concern, etc.

Unless you are an assembly expert and(/or) taking advantage of advanced opcodes not utilized by the compiler, the C compiler will likely win.
Try it for fun ;-)
More realistic solutions are often to let the C compiler do it's bit, then profile and, if needed, tweak specific sections -- many compilers can dump some sort of low-level IL (or even "assembly").

Use C for most tasks, and write inline assembly code for specific ones (for example, to take advantage of SSE, MME, ...)

It depends. C compilers for Intel do a pretty good job nowadays. I wasn't so impressed by compilers for ARM - I could easly write an assembly version of an inner loop that performed twice as fast.
You typically don't need assembly on x86 machines. If you want to gain direct access to SSE instructions, look into compiler intrinsics!

Ignoring how much time it would take to write the code, and assuming you have all the knowledge that is required to do any task most efficiently in both situations, assembly code will, by definition, always be able to either meet or outperform the code generated by a C compiler, because the C compiler has to create the assembly code to do the same task and it cannot optimize everything; and anything the C compiler writes, you could also write (in theory), and unlike the compiler, you can sometimes take a shortcut because you know more about the situation than can be expressed in C code.
However, that doesn't mean they do a bad job and that the code is too slow; just that it's slower than it could be. It may not be by more than a few microseconds, but it can still be slower.
What you have to remember is that some optimizations performed by a compiler are very complex: agressive optimization tends to lead to very unreadable assembly code, and it becomes harder to reason about the code as a result if you were to do them manually. That's why you'd normally write it in C (or some other language) first, then profile it to find problem areas, and then go on to hand-optimize that piece of code until it reaches an acceptable speed - because the cost of writing everything in assembly is much higher, while often providing little or no benefit.

Actually, C might be faster than assembly in many cases, since compilers apply optimizations to your code. Even so, the performance difference (if any) is negligible.
I would focus more on readability & maintainability of the code base, as well as whether what you are trying to do is supported in C. In many cases, assembly will allow you to do more low-level things that C simply cannot do. For example, with assembly you can take advantage of MMX or SSE instructions directly.
So in the end, focus on what you want to accomplish. Remember - assembly language code is terrible to maintain. Use it only when you have no other choice.

No, compilers do not do a bad job at all. The amount of optimization that can be squeezed out by using assembly is insignificant for most programs.
That amount depends on how you define 'modern C compiler'. A brand new compiler (For a chip that has just reached market) may have a large number of inefficiencies that will get ironed out over time. Just compile some simple programs (the string.h functions, for example), and analyze what each line of code does. You may be surprised at some of the wasteful things an untested C compiler does, and recognize the error with a simple read-through of the code. A mature, well-tested, thoroughly optimized compiler (Think x86) will do a great job of generating assembly, though a new one will still do a decent job.
In no case can C do a better job than assembly. You could just benchmark the two, and if your assembly was slower, compile with -S and submit the resulting assembly, and you're guaranteed a tie. C is compiled to assembly, which has a 1:1 correlation with the bytecode. The computer can't do anything that assembly can't do, assuming that the complete instruction set is published.
In some cases, C is not expressive enough to be fully optimized. A programmer may know something about the nature of the data that simply cannot be expressed in C in such a way that the compiler can take advantage of this knowledge. Certainly, C is expressive and close to the metal, and is very good for optimization, but complete optimization is not always possible.
A compiler can't define 'performance' like a human can. I understand that you said trivial programs, but even in the simplest (useful) algorithms, there will be a tradeoff between size and speed. The compiler can't do this at a more fine grained scale than the -Os/-O[1-3] flags, but a human can know what 'best' means in the context of the purpose of a program.
Some architecture-dependent assembly instructions can't be expressed in C. This is where ASM() statements come in. Sometimes, these are not for optimization at all, but simply because there is no way to express in C that this line must use, say, the atomic test-and-set operation, or that we want to issue an SVC interrupt with the encoded parameter X.
The above points notwithstanding, C is orders of magnitude more efficient to program in and to master. If performance is important, analysis of the assembly will be necessary, and optimizations will probably be found, but the tradeoff in developer time and effort is rarely worth the effort for complex programs on a PC. For very simple programs which must be as fast as absolutely possible (like an RTOS), or which have severe memory constraints (like an ATTiny with 1KB of Flash (non-writable) memory and 64Bytes of RAM), assembly may be the only way to go.

Given an infinite time and an extremely deep understanding on how a modern CPU works you can actually write the "perfect" program (i.e. the best performance possible on that machine), but you will have to consider, for any instruction in your program, how CPU behaves in that context, pipelining and caching related optimizations, and many many other things.
A compiler is built to generate the best assembly code possible. You will rarely understand a modern complier generated assembly code because it tends to be really extreme.
At times compliers fail in this task because they can't always foresee what's happening.
Generally they do a great job but they sometimes fail...
Resuming... knowing C and Assembly is absolutely not enough to do a better job than a compiler in 99.99% cases, and considered that programming something in C can be 10000 times faster than programming the same assembly program a nicer way to spend some time is optimizing what the compiler did wrong in the remaining 0.01%, not reinventing the wheel.

This depends on the compiler you use? This is no property of C or any language. Theoretically it's possible to load a compiler with such a sophisticated AI that you can compile prolog to more efficient machine language than GCC can do with C.
This depends 100% on the compiler and 0% on C.
What does matter is that C is written as a language for which it is easy to write an optimizing compiler from C -> assembly, and with assembly this means the instructions of a Von Neumann machine. It depends on the target, some languages like prolog will probably be easier to map on hypothetical 'reduction machines'.
But, given that assembly is your target language for your C compiler (you can technically compile C to brainfuck or to Haskell, there is no theoretical difference) then:
It is possible to write the optimally fast program in that assembly itself (duh)
It is possible to write a C compiler which in every instant shall produce the most optimal assembly. That is to say, there exists a function from every C program to the most optimal way to get the same I/O in assembly, and this function is computable, albeit perhaps not deterministically.
This is also possible with every other programming language in the world.

Related

Role of pointers in C and Fortran in determining program speed

My understanding is that Fortran (pre-90) is extremely fast in part because it does not allow pointer aliasing (and therefore allows better compiler optimization). However, I also know that pointers in C-family languages allow programmers to write extremely fast code.
I don't understand why the two languages are fast for opposite reasons. Can anyone shed some light on what's going on?
Thanks in advance.
Discussing about languages speed, and specifically optimization efficiency, is really misleading.
Let's start saying that each language has been created to simplify some specific aspects of programming scenario, and for this very reason the first goal of a programmer is the right language choice in function of the application being written.
This is the very first point, after that consider that nowadays the compilers programming use standardized tools and well defined flows, which starts from the conversion of the source to an intermediate representation on which will act the rest of the compilation chain included the optimizer. The latter happen to be the same for almost all languages, so it is realistic to expect the same result.
Practical examples can be seen taking a look to the mos diffused compilers families as GCC https://gcc.gnu.org/, LLVM https://llvm.org/, or even the .net https://en.wikipedia.org/wiki/.NET_Framework.
So the starting point, that can make a difference, is how the language is translated in the intermediary form, allowing for a better presentation to the optimization stage of the compiler chain. This depends not only from the quality of the translation phase but also from the abstraction level of the language.
We normally would think that lower the abstraction of the language, i.e. machine assembler, the better the optimization. That's absolutely wrong! Unless you are an excellent assembler programmer nothing can be done to a very bad assembler code writing.
On the contrary, precisely the high abstraction level consent to the compiler to translate the code in the more efficient way and present it to the optimizer in the most workable form.
Fortran and C are on very different levels of abstraction, the first tight enough to imply standard, and for this reason well known and pre-optimized code, the second a wide spread language, that can touch very low levels, using pointers and even inline assembler, or high level when used without abuse of any side effect.
Anyway the last C99-C11 standards have introduced many more language qualifiers that permits alignment also on some well known deficiencies like opaque pointers (https://en.wikipedia.org/wiki/Opaque_pointer) using the restrict qualifier. And in current compilers also the vectorization, usage of streaming instruction available on modern CPUs (SIMD, SSE2, etc), is vastly diffused. I.e. for the X86-64 platform the Intel C/C++ compiler is the most efficient compiler/optimizator.
Then what we have to expect for the future? With compiler's technology advancing we should expect an asymptotic nulling of any difference.
For further reading you can found also an excellent answer on Computer science stack exchange: https://scicomp.stackexchange.com/questions/203/what-makes-fortran-fast

Which is faster *in practice*: decent C code, or decent hand-written assembler? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
A long, long time ago, in a galaxy far away, I used to write programs in Delphi, and then if I needed something to happen really fast, I'd write those routines in hand-written assembler. It produced much faster code than the compiler did.
But is this true in practice any more? Obviously hand-written assembler will always be at least as fast in principle as compiled high-level code. But CPUs have moved on a long way since those dark times. Now, if you were trying to optimize your assembler, you'd have to take into account ordering of instructions so they could be pipelined or run concurrently, the effect of branch prediction, and a million other things; and I suspect it's impossible to hold them all in human RAM simultaneously.
So does this mean that a decent (but not superhuman) programmer will these days produce faster code by writing C than by writing hand-written assembler, at least when coding for a modern CPU?
One other possibility that occurs to me. Does the optimization happen before the high-level language is turned into assembler, or afterwards? If it's afterwards... might it be faster to produce hand-written assembler, and then put it through the compiler's optimization process?
The question arose recently when I was writing some code for a programming challenge, where the essence of it was to produce a routine that should run as fast as possible on a Raspberry Pi. I would have been allowed to write it in assembler; but my guess was that carefully written C would be faster, even though a Pi processor isn't that sophisticated in 2014 terms.
To make the question more concrete and specific:
Suppose you're wanting to write blazingly fast (integer) number-crunching code to run on a Raspberry Pi. You've written some very nice C code that runs as a tight loop to solve the problem. Is it worth hand-crafting it in assembler to speed it up, or will that in practice give you something less efficient?
To me it looks both answers given so far are correct. The answer depends, among other things, on the particular CPU architecture we're talking about. The more complex architecture is, the harder it is to write efficient ASM code by hand.
On one end of the spectrum are CISC cores such as x86. They have multiple execution units, long pipelines, variable instruction latencies per instruction, etc. In many cases ASM code that looks "clean" or "optimal" to a human isn't in fact optimal for a CPU and can be improved by using instructions or techniques from dark corners of processor manuals. Compilers "know" about this and can produce decently optimised code. True, in many cases emitted code can be improved by a skilful human, but with the right compiler and optimisations settings code is often already very good. Also, with C code at hand you won't need to re-optimise it manually for every new CPU generation (yes, optimisations often depend on particular CPU family, not only on instruction set), so writing in C is a way of "future-proofing" your code.
On another end of the spectrum are simple RISC cores, such as 8051 (or others simple 8-bit controllers). They have much simpler scheduling semantics and smaller instruction sets. Compilers still do a decent job of optimisation here, but it's also much simpler to write a decent ASM code by hand (or fix performance issues in emitted code).
Hand-written assembler is still faster than decent C code. If you knew how to write assembler you wouldn't believe what cruft some compilers generate. I have seen insane stuff like loading a value from memory and instantly writing it back unmodified (as recent as two years ago, I generally do not look at assembler output anymore). Here is an even more recent rant by Torvalds on a similar issue in gcc lkml.org.
However, even though hand-written assembler is still faster, it generally does not pay off. At the maximum, you'll want to write some very performance critical short routines in assembler. The rest is better left in C for portability.
In practice decent C code compiled with an optimizing compiler is faster than assembler code, in particular once you need more than a few dozen of source code lines.
Of course you need a good, recent, optimizing compiler. Cross-compiling with a recent GCC tuned for your particular hardware (and software) system is welcome. So use options like -O2 -mtune=native (at least on x86)
The point is that recent processors need, even for a "simple" instruction set, sophisticated instruction scheduling and register allocation, and compilers are quite good on that. For a few hundreds of lines, you won't have the patience to code the assembler code better than a good optimizing compiler could emit it.
Of course, there might be exceptions (you need to benchmark). The most cost-effective way to add some assembler code is probably to use a few asm instructions inside some C function. GCC has an extended asm facility quite good for that.

Why aren't programs written in Assembly more often? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
It seems to be a mainstream opinion that assembly programming takes longer and is more difficult to program in than a higher level language such as C. Therefore it seems to be recommend or assumed that it is better to write in a higher level language for these reasons and for the reason of better portability.
Recently I've been writing in x86 assembly and it has dawned on me that perhaps these reasons are not really true, except perhaps portability. Perhaps it is more of a matter of familiarity and knowing how to write assembly well. I also noticed that programming in assembly is quite different than programming in an HLL. Perhaps a good and experienced assembly programmer could write programs just as easily and as quickly as an experienced C programmer writing in C.
Perhaps it is because assembly programming is quite different than HLLs, and so requires different thinking, methods and ways, which makes it seem very awkward to program in for the unfamiliar, and so gives it its bad name for writing programs in.
If portability isn't an issue, then really, what would C have over a good assembler such as NASM?
Edit:
Just to point out. When you are writing in assembly, you don't have to write just in instruction codes. You can use macros and procedures and your own conventions to make various abstractions to make programs more modular, more maintainable and easier to read. This is where being familiar with how to write good assembly comes in.
Hellо, I am a compiler.
I just scanned thousands of lines of code while you were reading this sentence. I browsed through millions of possibilities of optimizing a single line of yours using hundreds of different optimization techniques based on a vast amount of academic research that you would spend years getting at. I won't feel any embarrassment, not even a slight ick, when I convert a three-line loop to thousands of instructions just to make it faster. I have no shame to go to great lengths of optimization or to do the dirtiest tricks. And if you don't want me to, maybe for a day or two, I'll behave and do it the way you like. I can transform the methods I'm using whenever you want, without even changing a single line of your code. I can even show you how your code would look in assembly, on different processor architectures and different operating systems and in different assembly conventions if you'd like. Yes, all in seconds. Because, you know, I can; and you know, you can't.
P.S. Oh, by the way you weren't using half of the code you wrote. I did you a favor and threw it away.
ASM has poor legibility and isn't really maintainable compared to higher-level languages.
Also, there are many fewer ASM developers than for other more popular languages, such as C.
Furthermore, if you use a higher-level language and new ASM instructions become available (SSE for example), you just need to update your compiler and your old code can easily make use of the new instructions.
What if the next CPU has twice as many registers?
The converse of this question would be: What functionality do compilers provide?
I doubt you can/want to/should optimize your ASM better than gcc -O3 can.
I've written shedloads of assembler for the 6502, Z80, 6809 and 8086 chips. I stopped doing so as soon as C compilers became available for the platforms I was addressing, and immediately became at least 10x more productive. Most good programmers use the tools they use for rational reasons.
I love programming in assembly language, but it takes more code to do the same thing as in a high-level languge, and there is a direct correlation between lines of code and bugs. (This was explained decades ago in The Mythical Man-Month.)
It's possible to think of C as 'high level assembly', but get a few steps above that and you're in a different world. In C# you don't think twice about writing this:
foreach (string s in listOfStrings) { /* do stuff */ }
This would be dozens, maybe hundreds of lines of code in assembly, each programmer implementing it would take a different approach, and the next person coming along would have to figure it out. So if you believe (as many do) that programs are written primarily for other people to read, assembly is less readable than the typical HLL.
Edit: I accumulated a personal library of code used for common tasks, and macros for implementing C-like control structures. But I hit the wall in the 90s, when GUIs became the norm. Too much time was being spent on things that were routine.
The last task I had where ASM was essential was a few years ago, writing code to combat malware. No user interface, so it was all the fun parts without the bloat.
In addition to other people's answers of readability, maintainability, shorter code and therefore fewer bugs, and being much easier, I'll add an additional reason:
program speed.
Yes, in assembly you can hand tune your code to make use of every last cycle and make it as fast as is physically possible. However who has the time? If you write a not-completely-stupid C program, the compiler will do a really good job of optimizing for you. Probably making at least 95% of the optimizations you'd do by hand, without you having to worry about keeping track of any of it. There's definitely a 90/10 kind of rule here, where that last 5% of optimizations will end up taking up 95% of your time. So why bother?
If an average production program has say 100k lines of code, and each line is about 8-12 assembler instructions, that would be 1 million of assembler instructions.
Even if you could write all this by hand at a decent speed (remember, its 8 times more code that you have to write), what happens if you want to change some of the functionality? Understanding something you wrote a few weeks ago out of those 1 million instructions is a nightmare! There's no modules, no classes, no object-oriented design, no frameworks, no nothing. And the amount of similar looking code you have to write for even the simplest things is daunting at best.
Besides, you can't optimize your code nearly as well as a high level language. Where C for example performs an insane number of optimizations because you describe your intent, not only your code, in assembler you only write code, the assembler can't really perform any note-worthy optimizations on your code. What you write is what you get, and trust me, you can't reliably optimize 1 million instructions that you patch and patch as you write it.
Well I have been writing a lot of assembly "in the old days", and I can assure you that I am much more productive when I write programs in a high level language.
A reasonable level of assembler competence is a useful skill, especially if you work at any sort of system level or embedded programming, not so much because you have to write that much assembler, but because sometimes it's important to understand what the box is really doing. If you don't have a low-level understanding of assembler concepts and issues, this can be very difficult.
However, as for actually writing much code in assembler, there are several reasons it's not much done.
There's simply no (almost) need. Except for something like the very early system initialization and perhaps a few assembler fragments hidden in C functions or macros, all very low-level code that might once have been written in assembler can be written in C or C++ with no difficulty.
Code in higher-level languages (even C and C++) condenses functionality into far fewer lines, and there is considerable research showing that the number of bugs correlates with the number of lines of source code. Ie, the same problem, solved in assembler and C, will have more bugs in assembler simply because its longer. The same argument motivates the move to higher level languages such as Perl, Python, etc.
Writing in assembler, you have to deal with every single aspect of the problem, from detailed memory layout, instruction selection, algorithm choices, stack management, etc. Higher level languages take all this away from you, which is why are so much denser in terms of LOC.
Essentially, all of the above are related to the level of abstraction available to you in assembler versus C or some other language. Assembler forces you to make all of your own abstractions, and to maintain them through your own self-discipline, where any mid-level language like C, and especially higher level languages, provide you with abstractions out of the box, as well as the ability to create new ones relatively easily.
As a developer who spends most of his time in the embedded programming world, I would argue that assembly is far from a dead/obsolete language. There is a certain close-to-the-metal level of coding (for example, in drivers) that sometimes cannot be expressed as accurately or efficiently in a higher-level language. We write nearly all of our hardware interface routines in assembler.
That being said, this assembly code is wrapped such that it can be called from C code and is treated like a library. We don't write the entire program in assembly for many reasons. First and foremost is portability; our code base is used on several products that use different architectures and we want to maximize the amount of code that can be shared between them. Second is developer familiarity. Simply put, schools don't teach assembly like they used to, and our developers are far more productive in C than in assembly. Also, we have a wide variety of "extras" (things like libraries, debuggers, static analysis tools, etc) available for our C code that aren't available for assembly language code. Even if we wanted to write a pure-assembly program, we would not be able to because several critical hardware libraries are only available as C libs. In one sense, it's a chicken/egg problem. People are driven away from assembly because there aren't as many libraries and development/debug tools available for it, but the libs/tools don't exist because not enough people use assembly to warrant the effort creating them.
In the end, there is a time and a place for just about any language. People use what they are most familiar and productive with. There will probably always be a place in a programmer's repertoire for assembly, but most programmers will find that they can write code in a higher-level language that is almost as efficient in far less time.
When you are writing in assembly, you don't have to write just in instruction codes. You can use macros and procedures and your own conventions to make various abstractions to make programs more modular, more maintainable and easier to read.
So what you're basically saying is, that with skilled use of a sophisticated assembler, you can make your ASM code closer and closer to C (or anyway another low-ish-level language of your own invention), until eventually you are just as productive as a C programmer.
Does that answer your question? ;-)
I don't say this idly: I have programmed using exactly such an assembler and system. Even better, the assembler could target a virtual processor, and a separate translator compiled the output of the assembler for a target platform. Much as happens with LLVM's IF, but in its early forms pre-dating it by about 10 years. So there was portability, plus the ability to write routines for a specific target asssembler where required for efficiency.
Writing using that assembler was about as productive as C, and with by comparison with GCC-3 (which was around by the time I was involved) the assembler/translator produced code that was roughly as fast and usually smaller. Size was really important, and the company had few programmers and was willing to teach new hires a new language before they could do anything useful. And we had the back-up that people who didn't know the assembler (e.g. customers) could write C and compile it for the same virtual processor, using the same calling convention and so on, so that it interfaced neatly. So it felt like a marginal win.
That was with multiple man-years of work in the bag developing the assembler technology, libraries, and so on. Admittedly much of which went into making it portable, if it had only ever been targeting one architecture then the all-singing all-dancing assembler would have been much easier.
In summary: you may not like C, but it doesn't mean that the effort of using C is greater than the effort of coming up with something better.
Assembly is not portable between different microprocessors.
The same reason we don't go to the bathroom outside anymore, or why we don't speak Latin or Aramaic.
Technology comes along and makes things easier and more accessible.
EDIT - to cease offending people, I've removed certain words.
Why? Simple.
Compare this :
for (var i = 1; i <= 100; i++)
{
if (i % 3 == 0)
Console.Write("Fizz");
if (i % 5 == 0)
Console.Write("Buzz");
if (i % 3 != 0 && i % 5 != 0)
Console.Write(i);
Console.WriteLine();
}
with
.locals init (
[0] int32 i)
L_0000: ldc.i4.1
L_0001: stloc.0
L_0002: br.s L_003b
L_0004: ldloc.0
L_0005: ldc.i4.3
L_0006: rem
L_0007: brtrue.s L_0013
L_0009: ldstr "Fizz"
L_000e: call void [mscorlib]System.Console::Write(string)
L_0013: ldloc.0
L_0014: ldc.i4.5
L_0015: rem
L_0016: brtrue.s L_0022
L_0018: ldstr "Buzz"
L_001d: call void [mscorlib]System.Console::Write(string)
L_0022: ldloc.0
L_0023: ldc.i4.3
L_0024: rem
L_0025: brfalse.s L_0032
L_0027: ldloc.0
L_0028: ldc.i4.5
L_0029: rem
L_002a: brfalse.s L_0032
L_002c: ldloc.0
L_002d: call void [mscorlib]System.Console::Write(int32)
L_0032: call void [mscorlib]System.Console::WriteLine()
L_0037: ldloc.0
L_0038: ldc.i4.1
L_0039: add
L_003a: stloc.0
L_003b: ldloc.0
L_003c: ldc.i4.s 100
L_003e: ble.s L_0004
L_0040: ret
They're identical feature-wise.
The second one isn't even assembler but .NET IL (Intermediary Language, similar to Java's bytecode). The second compilation transforms the IL into native code (i.e. almost assembler), making it even more cryptical.
I'd guess ASM on even x86(_64) makes sense in cases where you gain a lot by utilizing instructions that are difficult for a compiler to optimize for. x264 for example uses a lot of asm for its encoding, and the speed gains are huge.
I'm sure there are many reasons, but two quick reasons I can think of are
Assembly code is definitely harder to read (I'm positive its more time-consuming to write as well)
When you have a huge team of developers working on a product, it is helpful to have your code divided into logical blocks and protected by interfaces.
One of the early discoveries (you'll find it in Brooks' Mythical Man-Month, which is from experience in the 1960s) was that people were more or less as productive in one language as another, in debugged lines of code per day. This obviously isn't universally true, and can breaks when pushed too far, but it was generally true of the high-level languages of Brooks' time.
Therefore, the fastest way to get productivity would be to use languages where one individual line of code did more, and indeed this works, at least for languages of complexity like FORTRAN and COBOL, or to give a more modern example C.
Portability is always an issue -- if not now, at least eventually. The programming industry spends billions every year to port old software which, at the time it was written, had "obviously" no portability issue whatsoever.
There was a vicious cycle as assembly became less commonplace: as higher level languages matured, assembly language instruction sets were built less for programmer convenience and more for the convenience of compilers.
So now, realistically, it may be very hard to make the right decisions on, say, which registers you should use or which instructions are slightly more efficient. Compilers can use heuristics to figure out which tradeoffs are likely to have the best payoff. We can probably think through smaller problems and find local optimizations that might beat our now pretty sophisticated compilers, but odds are that in the average case, a good compiler will do a better job on the first try than a good programmer probably will. Eventually, like John Henry, we might beat the machine, but we might seriously burn ourselves out getting there.
Our problems are also now quite different. In 1986 I was trying to figure out how to get a little more speed out of small programs that involved putting a few hundred pixels on the screen; I wanted the animation to be less jerky. A fair case for assembly language. Now I'm trying to figure out how to represent abstractions around contract language and servicer policy for mortgages, and I'd rather read something that looks close to the language that the business folks speak. Unlike LISP macros, Assembly macros don't enforce much in the way of rules, so even though you might be able to get something reasonably close to a DSL in a good assembler, it'll be prone to all sorts of quirks that won't cause me problems if I wrote the same code in Ruby, Boo, Lisp, C# or even F#.
If your problems are easy to express in efficient assembly language, though, more power to you.
Ditto most of what others have said.
In the good old days before C was invented, when the only high level languages were things like COBOL and FORTRAN, there were lots of things that just weren't possible to do without resorting to assembler. It was the only way to get the full breadth of flexibility, to be able to access all the devices, etc. But then C was invented, and almost anything that was possible in assembly was possible in C. I have written very little assembly since then.
That said, I think it is a very useful exercise for new programmers to learn to write in assembler. Not because they would actually use it much, but because then you understand what is really happening inside the computer. I've seen lots of programming errors and inefficient code from programmers who clearly have no idea what's really happening with the bits and bytes and registers.
I've been programming in assembly now for about a month. I often write a piece of code in C and then compile it to assembly to assist me. Perhaps I am not utilizing the full optimizing power of the C compiler but it appears that my C asm source is including unnecessary operations. So I am beginning to see that the talk of a good C compiler outperforming a good assembly coder is not always true.
Anyways, my assembly programs are so fast. And the more I use assembly the less time it takes me to write out my code because it's really not that hard. Also the comment about assembly having poor legibility is not true. If you label your programs correctly and make comments when there is additional elaboration needed you should be all set. In fact in ways assembly is more clear to the programmer because they are seeing what is happening at the level of the processor. I don't know about other programmers but for me I like knowing what's happening, rather than things being in a sort of black box.
With that said the real advantage of compilers is that a compiler can understand patterns and relationships and then automatically code them in the appropriate locations in the source. One popular example are virtual functions in C++ which requires the compiler to optimally map function pointers. However a compiler is limited to doing what the maker of the compiler allows the compiler to do. This leads to programmers sometimes having to resort to doing bizarre things with their code , adding coding time, when they could have been done trivially with assembly.
Personally I think the marketplace heavily supports high level languages. If assembly language was the only language in existence today then their would be about 70% less people programming and who knows where our world would be, probably back in the 90's. Higher level languages appeal to a broader range of people. This allows a higher supply of programmers to build the needed infrastructure of our world. Developing nations like China and India benefit heavily from languages like Java. These countries will fast develop their IT infrastructure and people will become more interconnected. So my point is that high level languages are popular not because they produce superior code but because they help to meet demand in the world's marketplaces.
I'm learning assembly in comp org right now, and while it is interesting, it is also very inefficient to write in. You have to keep alot more details in your head to get things working, and its also slower to write the same things. For example, a simple 6 line for loop in C++ can equal 18 lines or more of assembly.
Personally, its alot of fun learning how things work down at the hardware level, and it gives me greater appreciation for how computing works.
What C has over a good macro assembler is the language C. Type checking. Loop constructs. Automatic stack management. (Nearly) automatic variable management. Dynamic memory techniques in assembler are a massive pain in the butt. Doing a linked list properly is just down right scary compared to C or better yet list foo.insert(). And debugging - well, there's no contest on what is easier to debug. HLLs win hands down there.
I've coded nearly half my career in assembler which makes it very easy for me to think in assmebler. it helps me to see what the C compiler is doing which again helps me write code that the C compiler can efficiently handle. A well thought out routine written in C can be written to output exactly what you want in assembler with a little work - and it's portable! I've already had to rewrite a few older asm routines back to C for cross platform reasons and it's no fun.
No, I'll stick with C and deal with the occasional slight slowdown in performance against the productivity time I gain with HLL.
I can only answer why I personally don't write programs in assembly more often, and the main reason is that it's more tedious to do. Also, I think that it is easier to get things subtly wrong without noticing immediately. E.g., you might change the way you use a register in one routine but forget to change this in one place. It'll assemble fine and you may not notice until much later.
That said, I do think there are still valid uses for assembly. For instance, I have a number of pretty optimised assembly routines for processing large amounts of data, using SIMD and following the paranoid "every bit is sacred"[quote V.Stob] approach. (But note that naive assembly implementations are often a lot worse than what a compiler would generate for you.)
C is a macro assembler! And it's the best one!
It can do nearly everything assembly can, it can be portable and in most of the rare cases where it can't do something you can still use embedded assembly code. This leaves only a small fraction of programs that you absolutely need to write in assembly and nothing but assembly.
And the higher level abstractions and the portability make it more worthwhile for most people to write system software in C. And although you might not need portability now if you invest a lot of time and money in writing some program you might not want to limit yourself in what you'll be able to use it for in the future.
People seem to forget that there is also the other direction.
Why are you writing in Assembler in the first place? Why not write the program in a truly low level language?
Instead of
mov eax, 0x123
add eax, 0x456
push eax
call printInt
you could just as well write
B823010000
0556040000
50
FF15.....
That has so many advantages, you know the exact size of your program, you can reuse the value of instructions as input for other instructions and you do not even need an assembler to write it, you can use any text editor...
And the reason you still prefer Assembler about this, is the reason other people prefer C...
Because it's always that way: time pass and good things pass away too :(
But when you write asm code it's totally different feeling than when you code high-level langs, though you know it's much less productive. It's like you're a painter: you are free to draw anything you like the way you like with absolutely no restrictions(well, only by CPU features)... That is why I love it. It's a pity this language goes away. But while somebody still remembers it and codes it, it will never die!
$$$
A company hires a developer to help turn code into $$$. The faster that useful code can be produced, the faster the company can turn that code into $$$.
Higher level languages are generally better at churning out larger volumes of useful code. This is not to say that assembly does not have its place, for there are times and places where nothing else will do.
The advantage of HLL's is even greater when you compare assembly to a higher level language than C, e.g. Java or Python or Ruby. For instance, these languages have garbage collection: no need to worry about when to free a chunk of memory, and no memory leaks or bugs due to freeing too early.
As others mentioned before, the reason for any tool to exist is how efficiently it can work. As HLLs can accomplish the same jobs as many lines of asm code I guess it's natural for assembly to be superseded by other languages. And for the close-to-hardware fiddling - there's inline assembly in C and other variants as per language.
Dr. Paul Carter in says in the PC Assembly Language
"...a better understanding of how
computers really work at a lower level
than in programming languages like
Pascal. By gaining a deeper
understanding of how computers work,
the reader can often be much more
productive developing software in
higher level languages such as C and
C++. Learning to program in assembly
language is an excellent way to
achieve this goal."
We've got introduction to assembly in my college courses. It'll help to clear concepts. However I doubt any of us would write 90% of code in assembly. How relevant is in-depth assembly knowledge today?
Flipping through these answers, I'd bet 9/10 of the responders have never worked with assembly.
This is an ages old question that comes up every so often and you get the same, mostly misinformed answers. If it weren't for portability, I'd still do everything in assembly myself. Even then, I code in C almost like I did in assembly.

Why is C so fast, and why aren't other languages as fast or faster? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 2 years ago.
Improve this question
In listening to the Stack Overflow podcast, the jab keeps coming up that "real programmers" write in C, and that C is so much faster because it's "close to the machine." Leaving the former assertion for another post, what is special about C that allows it to be faster than other languages?
Or put another way: what's to stop other languages from being able to compile down to binary that runs every bit as fast as C?
There isn't much that's special about C. That's one of the reasons why it's fast.
Newer languages which have support for garbage collection, dynamic typing and other facilities which make it easier for the programmer to write programs.
The catch is, there is additional processing overhead which will degrade the performance of the application. C doesn't have any of that, which means that there is no overhead, but that means that the programmer needs to be able to allocate memory and free them to prevent memory leaks, and must deal with static typing of variables.
That said, many languages and platforms, such as Java (with its Java Virtual Machine) and .NET (with its Common Language Runtime) have improved performance over the years with advents such as just-in-time compilation which produces native machine code from bytecode to achieve higher performance.
There is a trade-off the C designers have made. That's to say, they made the decision to put speed above safety. C won't
Check array index bounds
Check for uninitialized variable values
Check for memory leaks
Check for null pointer dereference
When you index into an array, in Java it takes some method call in the virtual machine, bound checking and other sanity checks. That is valid and absolutely fine, because it adds safety where it's due. But in C, even pretty trivial things are not put in safety. For example, C doesn't require memcpy to check whether the regions to copy overlap. It's not designed as a language to program a big business application.
But these design decisions are not bugs in the C language. They are by design, as it allows compilers and library writers to get every bit of performance out of the computer. Here is the spirit of C how the C Rationale document explains it:
C code can be non-portable. Although it strove to give programmers the opportunity to write truly portable programs, the Committee did not want to force programmers into writing portably, to preclude the use of C as a ``high-level assembler'': the ability to write machine-specific code is one of the strengths of C.
Keep the spirit of C. The Committee kept as a major goal to preserve the traditional spirit of C. There are many facets of the spirit of C, but the essence is a community sentiment of the underlying principles upon which the C language is based. Some of the facets of the spirit of C can be summarized in phrases like
Trust the programmer.
Don't prevent the programmer from doing what needs to be done.
Keep the language small and simple.
Provide only one way to do an operation.
Make it fast, even if it is not guaranteed to be portable.
The last proverb needs a little explanation. The potential for efficient code generation is one of the most important strengths of C. To help ensure that no code explosion occurs for what appears to be a very simple operation, many operations are defined to be how the target machine's hardware does it rather than by a general abstract rule. An example of this willingness to live with what the machine does can be seen in the rules that govern the widening of char objects for use in expressions: whether the values of char objects widen to signed or unsigned quantities typically depends on which byte operation is more efficient on the target machine.
If you spend a month to build something in C that runs in 0.05 seconds, and I spend a day writing the same thing in Java, and it runs in 0.10 seconds, then is C really faster?
But to answer your question, well-written C code will generally run faster than well-written code in other languages because part of writing C code "well" includes doing manual optimizations at a near-machine level.
Although compilers are very clever indeed, they are not yet able to creatively come up with code that competes with hand-massaged algorithms (assuming the "hands" belong to a good C programmer).
Edit:
A lot of comments are along the lines of "I write in C and I don't think about optimizations."
But to take a specific example from this post:
In Delphi I could write this:
function RemoveAllAFromB(a, b: string): string;
var
before, after :string;
begin
Result := b;
if 0 < Pos(a,b) then begin
before := Copy(b,1,Pos(a,b)-Length(a));
after := Copy(b,Pos(a,b)+Length(a),Length(b));
Result := before + after;
Result := RemoveAllAFromB(a,Result); //recursive
end;
end;
and in C I write this:
char *s1, *s2, *result; /* original strings and the result string */
int len1, len2; /* lengths of the strings */
for (i = 0; i < len1; i++) {
for (j = 0; j < len2; j++) {
if (s1[i] == s2[j]) {
break;
}
}
if (j == len2) { /* s1[i] is not found in s2 */
*result = s1[i];
result++; /* assuming your result array is long enough */
}
}
But how many optimizations are there in the C version? We make lots of decisions about implementation that I don't think about in the Delphi version. How is a string implemented? In Delphi I don't see it. In C, I've decided it will be a pointer to an array of ASCII integers, which we call chars. In C, we test for character existence one at a time. In Delphi, I use Pos.
And this is just a small example. In a large program, a C programmer has to make these kinds of low-level decisions with every few lines of code. It adds up to a hand-crafted, hand-optimized executable.
I didn't see it already, so I'll say it: C tends to be faster because almost everything else is written in C.
Java is built on C, Python is built on C (or Java, or .NET, etc.), Perl is, etc. The OS is written in C, the virtual machines are written in C, the compilers are written in C, the interpreters are written in C. Some things are still written in Assembly language, which tends to be even faster. More and more things are being written in something else, which is itself written in C.
Each statement that you write in other languages (not Assembly) is typically implemented underneath as several statements in C, which are compiled down to native machine code. Since those other languages tend to exist in order to obtain a higher level of abstraction than C, those extra statements required in C tend to be focused on adding safety, adding complexity, and providing error handling. Those are often good things, but they have a cost, and its names are speed and size.
Personally, I have written in literally dozens of languages spanning most of the available spectrum, and I personally have sought the magic that you hint at:
How can I have my cake and eat it, too? How can I play with high-level abstractions in my favorite language, then drop down to the nitty gritty of C for speed?
After a couple of years of research, my answer is Python (on C). You might want to give it a look. By the way, you can also drop down to Assembly from Python, too (with some minor help from a special library).
On the other hand, bad code can be written in any language. Therefore, C (or Assembly) code is not automatically faster. Likewise, some optimization tricks can bring portions of higher-level language code close to the performance level of raw C. But, for most applications, your program spends most of its time waiting on people or hardware, so the difference really does not matter.
Enjoy.
There are a lot of questions in there - mostly ones I am not qualified to answer. But for this last one:
what's to stop other languages from being able to compile down to binary that runs every bit as fast as C?
In a word, abstraction.
C is only one or two levels of abstraction away from machine language. Java and the .NET languages are at a minimum three levels of abstraction away from assembler. I'm not sure about Python and Ruby.
Typically, the more programmer toys (complex data types, etc.), the further you are from machine language and the more translation has to be done.
I'm off here and there, but that's the basic gist.
There are some good comments on this post with more details.
It is not so much that C is fast as that C's cost model is transparent. If a C program is slow, it is slow in an obvious way: by executing a lot of statements. Compared with the cost of operations in C, high-level operations on objects (especially reflection) or strings can have costs that are not obvious.
Two languages that generally compile to binaries which are just as fast as C are Standard ML (using the MLton compiler) and Objective Caml. If you check out the benchmarks game you'll find that for some benchmarks, like binary trees, the OCaml version is faster than C. (I didn't find any MLton entries.) But don't take the shootout too seriously; it is, as it says, a game, the the results often reflect how much effort people have put in tuning the code.
C is not always faster.
C is slower than, for example, Modern Fortran.
C is often slower than Java for some things (especially after the JIT compiler has had a go at your code).
C lets pointer aliasing happen, which means some good optimizations are not possible. Particularly when you have multiple execution units, this causes data fetch stalls. Ow.
The assumption that pointer arithmetic works really causes slow bloated performance on some CPU families (PIC particularly!) It used to suck the big one on segmented x86.
Basically, when you get a vector unit, or a parallelizing compiler, C stinks and modern Fortran runs faster.
C programmer tricks, like thunking (modifying the executable on the fly), cause CPU prefetch stalls.
Do you get the drift?
And our good friend, the x86, executes an instruction set that these days bears little relationship to the actual CPU architecture. Shadow registers, load-store optimizers, all in the CPU. So C is then close to the virtual metal. The real metal, Intel don't let you see. (Historically VLIW CPU's were a bit of a bust so, maybe that's no so bad.)
If you program in C on a high-performance DSP (maybe a TI DSP?), the compiler has to do some tricky stuff to unroll the C across the multiple parallel execution units. So in that case, C isn't close to the metal, but it is close to the compiler, which will do whole program optimization. Weird.
And finally, some CPUs (www.ajile.com) run Java bytecodes in hardware. C would a PITA to use on that CPU.
what's to stop other languages from
being able to compile down to binary
that runs every bit as fast as C?
Nothing. Modern languages like Java or .NET languages are oriented more toward programmer productivity rather than performance. Hardware is cheap nowadays. Also compilation to intermediate representation gives a lot of bonuses such as security, portability, etc. The .NET CLR can take advantage of different hardware. For example, you don't need to manually optimize/recompile program to use the SSE instructions set.
I guess you forgot that Assembly language is also a language :)
But seriously, C programs are faster only when the programmer knows what he's doing. You can easily write a C program that runs slower than programs written in other languages that do the same job.
The reason why C is faster is because it is designed in this way. It lets you do a lot of "lower level" stuff that helps the compiler to optimize the code. Or, shall we say, you the programmer are responsible for optimizing the code. But it's often quite tricky and error prone.
Other languages, like others already mentioned, focus more on productivity of the programmer. It is commonly believed that programmer time is much more expensive than machine time (even in the old days). So it makes a lot of sense to minimize the time programmers spend on writing and debugging programs instead of the running time of the programs. To do that, you will sacrifice a bit on what you can do to make the program faster because a lot of things are automated.
The main factors are that it's a statically-typed language and that's compiled to machine code. Also, since it's a low-level language, it generally doesn't do anything you don't tell it to.
These are some other factors that come to mind.
Variables are not automatically initialized
No bounds checking on arrays
Unchecked pointer manipulation
No integer overflow checking
Statically-typed variables
Function calls are static (unless you use function pointers)
Compiler writers have had lots of time to improve the optimizing code. Also, people program in C for the purpose of getting the best performance, so there's pressure to optimize the code.
Parts of the language specification are implementation-defined, so compilers are free to do things in the most optimal way
Most static-typed languages could be compiled just as fast or faster than C though, especially if they can make assumptions that C can't because of pointer aliasing, etc.
C++ is faster on average (as it was initially, largely a superset of C, though there are some differences). However, for specific benchmarks, there is often another language which is faster.
From The Computer Language Benchmarks Game:
fannjuch-redux was fastest in Scala
n-body and fasta were faster in Ada.
spectral-norm was fastest in Fortran.
reverse-complement, mandelbrot and pidigits were fastest in ATS.
regex-dna was fastest in JavaScript.
chameneou-redux was fastest is Java 7.
thread-ring was fastest in Haskell.
The rest of the benchmarks were fastest in C or C++.
For the most part, every C instruction corresponds to a very few assembler instructions. You are essentially writing higher level machine code, so you have control over almost everything the processor does. Many other compiled languages, such as C++, have a lot of simple looking instructions that can turn into much more code than you think it does (virtual functions, copy constructors, etc..) And interpreted languages like Java or Ruby have another layer of instructions that you never see - the Virtual Machine or Interpreter.
I know plenty of people have said it in a long winded way, but:
C is faster because it does less (for you).
Many of these answers give valid reasons for why C is, or is not, faster (either in general or in specific scenarios). It's undeniable that:
Many other languages provide automatic features that we take for granted. Bounds checking, run-time type checking, and automatic memory management, for example, don't come for free. There is at least some cost associated with these features, which we may not think about—or even realize—while writing code that uses these features.
The step from source to machine is often not as direct in other languages as it is in C.
OTOH, to say that compiled C code executes faster than other code written in other languages is a generalization that isn't always true. Counter-examples are easy to find (or contrive).
All of this notwithstanding, there is something else I have noticed that, I think, affects the comparative performance of C vs. many other languages more greatly than any other factor. To wit:
Other languages often make it easier to write code that executes more slowly. Often, it's even encouraged by the design philosophies of the language. Corollary: a C programmer is more likely to write code that doesn't perform unnecessary operations.
As an example, consider a simple Windows program in which a single main window is created. A C version would populate a WNDCLASS[EX] structure which would be passed to RegisterClass[Ex], then call CreateWindow[Ex] and enter a message loop. Highly simplified and abbreviated code follows:
WNDCLASS wc;
MSG msg;
wc.style = 0;
wc.lpfnWndProc = &WndProc;
wc.cbClsExtra = 0;
wc.cbWndExtra = 0;
wc.hInstance = hInstance;
wc.hIcon = NULL;
wc.hCursor = LoadCursor(NULL, IDC_ARROW);
wc.hbrBackground = (HBRUSH)(COLOR_BTNFACE + 1);
wc.lpszMenuName = NULL;
wc.lpszClassName = "MainWndCls";
RegisterClass(&wc);
CreateWindow("MainWndCls", "", WS_OVERLAPPEDWINDOW | WS_VISIBLE,
CW_USEDEFAULT, 0, CW_USEDEFAULT, 0, NULL, NULL, hInstance, NULL);
while(GetMessage(&msg, NULL, 0, 0)){
TranslateMessage(&msg);
DispatchMessage(&msg);
}
An equivalent program in C# could be just one line of code:
Application.Run(new Form());
This one line of code provides all of the functionality that nearly 20 lines of C code did, and adds some things we left out, such as error checking. The richer, fuller library (compared to those used in a typical C project) did a lot of work for us, freeing our time to write many more snippets of code that look short to us but involve many steps behind the scenes.
But a rich library enabling easy and quick code bloat isn't really my point. My point is more apparent when you start examining what actually happens when our little one-liner actually executes. For fun sometime, enable .NET source access in Visual Studio 2008 or higher, and step into the simple one-linef above. One of the fun little gems you'll come across is this comment in the getter for Control.CreateParams:
// In a typical control this is accessed ten times to create and show a control.
// It is a net memory savings, then, to maintain a copy on control.
//
if (createParams == null) {
createParams = new CreateParams();
}
Ten times. The information roughly equivalent to the sum of what's stored in a WNDCLASSEX structure and what's passed to CreateWindowEx is retrieved from the Control class ten times before it's stored in a WNDCLASSEX structure and passed on to RegisterClassEx and CreateWindowEx.
All in all, the number of instructions executed to perform this very basic task is 2–3 orders of magnitude more in C# than in C. Part of this is due to the use of a feature-rich library, which is necessarily generalized, versus our simple C code which does exactly what we need and nothing more. But part of it is due to the fact that the modularized, object-oriented nature of .NET framework, lends itself to a lot of repetition of execution that often is avoided by a procedural approach.
I'm not trying to pick on C# or the .NET framework. Nor am I saying that modularization, generalization, library/language features, OOP, etc. are bad things. I used to do most of my development in C, later in C++, and most lately in C#. Similarly, before C, I used mostly assembly. And with each step "higher" my language goes, I write better, more maintainable, more robust programs in less time. They do, however, tend to execute a little more slowly.
I don't think anyone has mentioned the fact that much more effort has been put into C compilers than any other compiler, with perhaps the exception of Java.
C is extremely optimizable for many of the reasons already stated - more than almost any other language. So if the same amount of effort is put into other language compilers, C will probably still come out on top.
I think there is at least one candidate language that, with effort, could be optimized better than C and thus we could see implementations that produce faster binaries. I'm thinking of Digital Mars' D, because the creator took care to build a language that could potentially be better optimized than C. There may be other languages that have this possibility. However, I cannot imagine that any language will have compilers more than just a few percent faster than the best C compilers. I would love to be wrong.
I think the real "low hanging fruit" will be in languages that are designed to be easy for humans to optimize. A skilled programmer can make any language go faster, but sometimes you have to do ridiculous things or use unnatural constructs to make this happen. Although it will always take effort, a good language should produce relatively fast code without having to obsess over exactly how the program is written.
It's also important (at least to me) that the worst case code tends to be fast. There are numerous "proofs" on the web that Java is as fast or faster than C, but that is based on cherry picking examples.
I'm not big fan of C, but I know that anything I write in C is going to run well. With Java, it will "probably" run within 15% of the speed, usually within 25%, but in some cases it can be far worse. Any cases where it's just as fast or within a couple of percent are usually due to most of the time being spent in the library code which is heavily optimized C code anyway.
This is actually a bit of a perpetuated falsehood. While it is true that C programs are frequently faster, this is not always the case, especially if the C programmer isn't very good at it.
One big glaring hole that people tend to forget about is when the program has to block for some sort of I/O, such as user input in any GUI program. In these cases, it doesn't really matter what language you use since you are limited by the rate at which data can come in rather than how fast you can process it. In this case, it doesn't matter much if you are using C, Java, C# or even Perl; you just cannot go any faster than the data can come in.
The other major thing is that using garbage collection (GC) and not using proper pointers allows the virtual machine to make a number of optimizations not available in other languages. For instance, the JVM is capable of moving objects around on the heap to defragment it. This makes future allocations much faster since the next index can simply be used rather than looking it up in a table. Modern JVMs also don't have to actually deallocate memory; instead, they just move the live objects around when they GC and the spent memory from the dead objects is recovered essentially for free.
This also brings up an interesting point about C and even more so in C++. There is something of a design philosophy of "If you don't need it, you don't pay for it." The problem is that if you do want it, you end up paying through the nose for it. For instance, the vtable implementation in Java tends to be a lot better than C++ implementations, so virtual function calls are a lot faster. On the other hand, you have no choice but to use virtual functions in Java and they still cost something, but in programs that use a lot of virtual functions, the reduced cost adds up.
It's not so much about the language as the tools and libraries. The available libraries and compilers for C are much older than for newer languages. You might think this would make them slower, but au contraire.
These libraries were written at a time when processing power and memory were at a premium. They had to be written very efficiently in order to work at all. Developers of C compilers have also had a long time to work in all sorts of clever optimizations for different processors. C's maturity and wide adoption makes for a signficant advantage over other languages of the same age. It also gives C a speed advantage over newer tools that don't emphasize raw performance as much as C had to.
Amazing to see the old "C/C++ must be faster than Java because Java is interpreted" myth is still alive and kicking. There are articles going back a few years, as well as more recent ones, that explain with concepts or measurements why this simply isn't always the case.
Current virtual machine implementations (and not just the JVM, by the way) can take advantage of information gathered during program execution to dynamically tune the code as it runs, using a variety of techniques:
rendering frequent methods to machine code,
inlining small methods,
adjustment of locking
and a variety of other adjustments based on knowing what the code is actually doing, and on the actual characteristics of the environment in which it's running.
The lack of abstraction is what makes C faster. If you write an output statement you know exactly what is happening. If you write an output statement in Java it is getting compiled to a class file which then gets run on a virtual machine, introducing a layer of abstraction.
The lack of object-oriented features as a part of the language also increases its speed do to less code being generated. If you use C as an object-oriented language, then you are doing all the coding for things such as classes, inheritance, etc. This means rather than make something generalized enough for everyone with the amount of code and the performance penalty that requires you only write what you need to get the job done.
The fastest running code would be carefully handcrafted machine code. Assembler will be almost as good. Both are very low level and it takes a lot of writing code to do things. C is a little above assembler. You still have the ability to control things at a very low level in the actual machine, but there is enough abstraction, make writing it faster and easier then assembler.
Other languages, such as C# and Java, are even more abstract. While Assembler and machine code are called low-level languages, C# and JAVA (and many others) are called high-level languages. C is sometimes called a midlevel language.
Don't take someone’s word for it; look at the disassembly for both C and your language-of-choice in any performance critical part of your code. I think you can just look in the disassembly window at runtime in Visual Studio to see disassembled .NET code. It should be possible, if tricky, for Java using WinDbg, though if you do it with .NET, many of the issues would be the same.
I don't like to write in C if I don't need to, but I think many of the claims made in these answers that tout the speed of languages other than C can be put aside by simply disassembling the same routine in C and in your higher level language of choice, especially if lots of data is involved as is common in performance critical applications. Fortran may be an exception in its area of expertise; I don't know. Is it higher level than C?
The first time I did compare JITed code with native code resolved any and all questions whether .NET code could run comparably to C code. The extra level of abstraction and all the safety checks come with a significant cost. The same costs would probably apply to Java, but don't take my word for it; try it on something where performance is critical. (Does anyone know enough about JITed Java to locate a compiled procedure in memory? It should certainly be possible.)
Setting aside advanced optimization techniques such as hot-spot optimization, pre-compiled meta-algorithms, and various forms of parallelism, the fundamental speed of a language correlates strongly with the implicit behind-the-scenes complexity required to support the operations that would commonly be specified within inner loops.
Perhaps the most obvious is validity checking on indirect memory references—such as checking pointers for null and checking indexes against array boundaries. Most high-level languages perform these checks implicitly, but C does not. However, this is not necessarily a fundamental limitation of these other languages—a sufficiently clever compiler may be capable of removing these checks from the inner loops of an algorithm through some form of loop-invariant code motion.
The more fundamental advantage of C (and to a similar extent the closely related C++) is a heavy reliance on stack-based memory allocation, which is inherently fast for allocation, deallocation, and access. In C (and C++) the primary call stack can be used for allocation of primitives, arrays, and aggregates (struct/class).
While C does offer the capability to dynamically allocate memory of arbitrary size and lifetime (using the so called 'heap'), doing so is avoided by default (the stack is used instead).
Tantalizingly, it is sometimes possible to replicate the C memory allocation strategy within the runtime environments of other programming languages. This has been demonstrated by asm.js, which allows code written in C or C++ to be translated into a subset of JavaScript and run safely in a web browser environment—with near-native speed.
As somewhat of an aside, another area where C and C++ outshine most other languages for speed is the ability to seamlessly integrate with native machine instruction sets. A notable example of this is the (compiler and platform dependent) availability of SIMD intrinsics which support the construction of custom algorithms that take advantage of the now nearly ubiquitous parallel processing hardware—while still utilizing the data allocation abstractions provided by the language (lower-level register allocation is managed by the compiler).
1) As others have said, C does less for you. No initializing variables, no array bounds checking, no memory management, etc. Those features in other languages cost memory and CPU cycles that C doesn't spend.
2) Answers saying that C is less abstracted and therefore faster are only half correct I think. Technically speaking, if you had a "sufficiently advanced compiler" for language X, then language X could approach or equal the speed of C. The difference with C is that since it maps so obviously (if you've taken an architecture course) and directly to assembly language that even a naive compiler can do a decent job. For something like Python, you need a very advanced compiler to predict the probable types of objects and generate machine code on the fly -- C's semantics are simple enough that a simple compiler can do well.
Back in the good ole days, there were just two types of languages: compiled and interpreted.
Compiled languages utilized a "compiler" to read the language syntax and convert it into identical assembly language code, which could than just directly on the CPU. Interpreted languages used a couple of different schemes, but essentially the language syntax was converted into an intermediate form, and then run in a "interpreter", an environment for executing the code.
Thus, in a sense, there was another "layer" -- the interpreter -- between the code and the machine. And, as always the case in a computer, more means more resources get used. Interpreters were slower, because they had to perform more operations.
More recently, we've seen more hybrid languages like Java, that employ both a compiler and an interpreter to make them work. It's complicated, but a JVM is faster, more sophisticated and way more optimized than the old interpreters, so it stands a much better change of performing (over time) closer to just straight compiled code. Of course, the newer compilers also have more fancy optimizing tricks so they tend to generate way better code than they used to as well. But most optimizations, most often (although not always) make some type of trade-off such that they are not always faster in all circumstances. Like everything else, nothing comes for free, so the optimizers must get their boast from somewhere (although often times it using compile-time CPU to save runtime CPU).
Getting back to C, it is a simple language, that can be compiled into fairly optimized assembly and then run directly on the target machine. In C, if you increment an integer, it's more than likely that it is only one assembler step in the CPU, in Java however, it could end up being a lot more than that (and could include a bit of garbage collection as well :-) C offers you an abstraction that is way closer to the machine (assembler is the closest), but you end up having to do way more work to get it going and it is not as protected, easy to use or error friendly. Most other languages give you a higher abstraction and take care of more of the underlying details for you, but in exchange for their advanced functionality they require more resources to run. As you generalize some solutions, you have to handle a broader range of computing, which often requires more resources.
I have found an answer on a link about why some languages are faster and some are slower, I hope this will clear more about why C or C++ is faster than others, There are some other languages also that is faster than C, but we can not use all of them. Some explanation -
One of the big reasons that Fortran remains important is because it's fast: number crunching routines written in Fortran tend to be quicker than equivalent routines written in most other languages. The languages that are competing with Fortran in this space—C and C++—are used because they're competitive with this performance.
This raises the question: why? What is it about C++ and Fortran that make them fast, and why do they outperform other popular languages, such as Java or Python?
Interpreting versus compiling
There are many ways to categorize and define programming languages, according to the style of programming they encourage and features they offer. When looking at performance, the biggest single distinction is between interpreted languages and compiled ones.
The divide is not hard; rather, there's a spectrum. At one end, we have traditional compiled languages, a group that includes Fortran, C, and C++. In these languages, there is a discrete compilation stage that translates the source code of a program into an executable form that the processor can use.
This compilation process has several steps. The source code is analyzed and parsed. Basic coding mistakes such as typos and spelling errors can be detected at this point. The parsed code is used to generate an in-memory representation, which too can be used to detect mistakes—this time, semantic mistakes, such as calling functions that don't exist, or trying to perform arithmetic operations on strings of text.
This in-memory representation is then used to drive a code generator, the part that produces executable code. Code optimization, to improve the performance of the generated code, is performed at various times within this process: high-level optimizations can be performed on the code representation, and lower-level optimizations are used on the output of the code generator.
Actually executing the code happens later. The entire compilation process is simply used to create something that can be executed.
At the opposite end, we have interpreters. The interpreters will include a parsing stage similar to that of the compiler, but this is then used to drive direct execution, with the program being run immediately.
The simplest interpreter has within it executable code corresponding to the various features the language supports—so it will have functions for adding numbers, joining strings, whatever else a given language has. As it parses the code, it will look up the corresponding function and execute it. Variables created in the program will be kept in some kind of lookup table that maps their names to their data.
The most extreme example of the interpreter style is something like a batch file or shell script. In these languages, the executable code is often not even built into the interpreter itself, but rather separate, standalone programs.
So why does this make a difference to performance? In general, each layer of indirection reduces performance. For example, the fastest way to add two numbers is to have both of those numbers in registers in the processor, and to use the processor's add instruction. That's what compiled programs can do; they can put variables into registers and take advantage of processor instructions. But in interpreted programs, that same addition might require two lookups in a table of variables to fetch the values to add, then calling a function to perform the addition. That function may very well use the same processor instruction as the compiled program uses to perform the actual addition, but all the extra work before the instruction can actually be used makes things slower.
If you want to know more please check the source.
Some C++ algorithms are faster than C, and some implementations of algorithms or design patterns in other languages can be faster than C.
When people say that C is fast, and then move on to talking about some other language, they are generally using C's performance as a benchmark.
Just step through the machine code in your IDE, and you'll see why it's faster (if it's faster). It leaves out a lot of hand-holding. Chances are your Cxx can also be told to leave it out too, in which case it should be about the same.
Compiler optimizations are overrated, as are almost all perceptions about language speed.
Optimization of generated code only makes a difference in hotspot code, that is, tight algorithms devoid of function calls (explicit or implicit). Anywhere else, it achieves very little.
With modern optimizing compilers, it's highly unlikely that a pure C program is going to be all that much faster than compiled .NET code, if at all. With the productivity enhancement that frameworks like .NET provide the developer, you can do things in a day that used to take weeks or months in regular C. Coupled with the cheap cost of hardware compared to a developer's salary, it's just way cheaper to write the stuff in a high-level language and throw hardware at any slowness.
The reason Jeff and Joel talk about C being the "real programmer" language is because there isn't any hand-holding in C. You must allocate your own memory, deallocate that memory, do your own bounds-checking, etc. There isn't any such thing as new object(); There isn't any garbage collection, classes, OOP, entity frameworks, LINQ, properties, attributes, fields, or anything like that.
You have to know things like pointer arithmetic and how to dereference a pointer. And, for that matter, know and understand what a pointer is. You have to know what a stack frame is and what the instruction pointer is. You have to know the memory model of the CPU architecture you're working on. There is a lot of implicit understanding of the architecture of a microcomputer (usually the microcomputer you're working on) when programming in C that simply is not present nor necessary when programming in something like C# or Java. All of that information has been off-loaded to the compiler (or VM) programmer.
It's the difference between automatic and manual. Higher-level languages are abstractions, thus automated. C/C++ are manually controlled and handled; even error checking code is sometimes a manual labor.
C and C++ are also compiled languages which means none of that run-everywhere business. These languages have to be fine-tuned for the hardware you work with, thus adding an extra layer of gotcha. Though this is slightly phasing out now as C/C++ compilers are becoming more common across all platforms. You can do cross compilations between platforms. It's still not a run everywhere situation, and you’re basically instructing compiler A to compile against compiler B the same code on a different architecture.
Bottom line, C languages are not meant to be easy to understand or reason. This is also why they’re referred to as systems languages. They came out before all this high-level abstraction nonsense. This is also why they are not used for front end web programming. They’re just not suited to the task; they’re meant to solve complex problems that can't be resolved with conventional language tooling.
This is why you get crazy stuff, like micro-architectures, drivers, quantum physics, AAA games, and operating systems. There are things C and C++ are just well suited for. Speed and number crunching being the chief areas.
C is fast because it is natively compiled, low-level language. But C is not the fastest. The Recursive Fibonacci Benchmark shows that Rust, Crystal, and Nim can be faster.

Why do you program in assembly? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
I have a question for all the hardcore low level hackers out there. I ran across this sentence in a blog. I don't really think the source matters (it's Haack if you really care) because it seems to be a common statement.
For example, many modern 3-D Games have their high performance core engine written in C++ and Assembly.
As far as the assembly goes - is the code written in assembly because you don't want a compiler emitting extra instructions or using excessive bytes, or are you using better algorithms that you can't express in C (or can't express without the compiler mussing them up)?
I completely get that it's important to understand the low-level stuff. I just want to understand the why program in assembly after you do understand it.
I think you're misreading this statement:
For example, many modern 3-D Games have their high performance core engine written in C++ and Assembly.
Games (and most programs these days) aren't "written in assembly" the same way they're "written in C++". That blog isn't saying that a significant fraction of the game is designed in assembly, or that a team of programmers sit around and develop in assembly as their primary language.
What this really means is that developers first write the game and get it working in C++. Then they profile it, figure out what the bottlenecks are, and if it's worthwhile they optimize the heck out of them in assembly. Or, if they're already experienced, they know which parts are going to be bottlenecks, and they've got optimized pieces sitting around from other games they've built.
The point of programming in assembly is the same as it always has been: speed. It would be ridiculous to write a lot of code in assembler, but there are some optimizations the compiler isn't aware of, and for a small enough window of code, a human is going to do better.
For example, for floating point, compilers tend to be pretty conservative and may not be aware of some of the more advanced features of your architecture. If you're willing to accept some error, you can usually do better than the compiler, and it's worth writing that little bit of code in assembly if you find that lots of time is spent on it.
Here are some more relevant examples:
Examples from Games
Article from Intel about optimizing a game engine using SSE intrinsics. The final code uses intrinsics (not inline assembler), so the amount of pure assembly is very small. But they look at the assembler output by the compiler to figure out exactly what to optimize.
Quake's fast inverse square root. Again, the routine doesn't have assembler in it, but you need to know something about architecture to do this kind of optimization. The authors know what operations are fast (multiply, shift) and which are slow (divide, sqrt). So they come up with a very tricky implementation of square root that avoids the slow operations entirely.
High-Performance Computing
Outside the domain of games, people in scientific computing frequently optimize the crap out of things to get them to run fast on the latest hardware. Think of this as games where you can't cheat on the physics.
A great recent example of this is Lattice Quantum Chromodynamics (Lattice QCD). This paper describes how the problem pretty much boils down to one very small computational kernel, which was optimized heavily for PowerPC 440's on an IBM Blue Gene/L. Each 440 has two FPUs, and they support some special ternary operations that are tricky for compilers to exploit. Without these optimizations, Lattice QCD would've run much slower, which is costly when your problem requires millions of CPU hours on expensive machines.
If you are wondering why this is important, check out the article in Science that came out of this work. Using Lattice QCD, these guys calculated the mass of a proton from first principles, and showed last year that 90% of the mass comes from strong force binding energy, and the rest from quarks. That's E=mc2 in action. Here's a summary.
For all of the above, the applications are not designed or written 100% in assembly -- not even close. But when people really need speed, they focus on writing the key parts of their code to fly on specific hardware.
I have not coded in assembly language for many years, but I can give several reasons that I frequently saw:
Not all compilers can make use of certain CPU optimizations and instruction set (e.g., the new instruction sets that Intel adds once in a while). Waiting for compiler writers to catch up means losing a competitive advantage.
Easier to match actual code to known CPU architecture and optimization. For example, things you know about the fetching mechanism, caching, etc. This is supposed to be transparent to the developer, but the fact is that it is not, that's why compiler writers can optimize.
Certain hardware level accesses are only possible/practical via assembly language (e.g., when writing device driver).
Formal reasoning is sometimes actually easier for the assembly language than for the high-level language since you already know what the final or almost final layout of the code is.
Programming certain 3D graphic cards (circa late 1990s) in the absence of APIs was often more practical and efficient in assembly language, and sometimes not possible in other languages. But again, this involved really expert-level games based on the accelerator architecture like manually moving data in and out in certain order.
I doubt many people use assembly language when a higher-level language would do, especially when that language is C. Hand-optimizing large amounts of general-purpose code is impractical.
There is one aspect of assembler programming which others have not mentioned - the feeling of satisfaction you get knowing that every single byte in an application is the result of your own effort, not the compiler's. I wouldn't for a second want to go back to writing whole apps in assembler as I used to do in the early 80s, but I do miss that feeling sometimes...
Usually, a layman's assembly is slower than C (due to C's optimization) but many games (I distinctly remember Doom) had to have specific sections of the game in Assembly so it would run smoothly on normal machines.
Here's the example to which I am referring.
I started professional programming in assembly language in my very first job (80's). For embedded systems the memory demands - RAM and EPROM - were low. You could write tight code that was easy on resources.
By the late 80's I had switched to C. The code was easier to write, debug and maintain. Very small snippets of code were written in assembler - for me it was when I was writing the context switching in an roll-your-own RTOS. (Something you shouldn't do anymore unless it is a "science project".)
You will see assembler snippets in some Linux kernel code. Most recently I've browsed it in spinlocks and other synchronization code. These pieces of code need to gain access to atomic test-and-set operations, manipulating caches, etc.
I think you would be hard pressed to out-optimize modern C compilers for most general programming.
I agree with #altCognito that your time is probably better spent thinking harder about the problem and doing things better. For some reason programmers often focus on micro-efficiencies and neglect the macro-efficiencies. Assembly language to improve performance is a micro-efficiency. Stepping back for a wider view of the system can expose the macro problems in a system. Solving the macro problems can often yield better performance gains.
Once the macro problems are solved then collapse to the micro level.
I guess micro problems are within the control of a single programmer and in a smaller domain. Altering behavior at the macro level requires communication with more people - a thing some programmers avoid. That whole cowboy vs the team thing.
"Yes". But, understand that for the most part the benefits of writing code in assembler are not worth the effort. The return received for writing it in assembly tends to be smaller than the simply focusing on thinking harder about the problem and spending your time thinking of a better way of doing thigns.
John Carmack and Michael Abrash who were largely responsible for writing Quake and all of the high performance code that went into IDs gaming engines go into this in length detail in this book.
I would also agree with Ólafur Waage that today, compilers are pretty smart and often employ many techniques which take advantage of hidden architectural boosts.
These days, for sequential codes at least, a decent compiler almost always beats even a highly seasoned assembly-language programmer. But for vector codes it's another story. Widely deployed compilers don't do such a great job exploiting the vector-parallel capabilities of the x86 SSE unit, for example. I'm a compiler writer, and exploiting SSE tops my list of reasons to go on your own instead of trusting the compiler.
SSE code works better in assembly than compiler intrinsics, at least in MSVC. (i.e. does not create extra copies of data )
I've three or four assembler routines (in about 20 MB source) in my sources at work. All of them are SSE(2), and are related to operations on (fairly large - think 2400x2048 and bigger) images.
For hobby, I work on a compiler, and there you have more assembler. Runtime libraries are quite often full of them, most of them have to do with stuff that defies the normal procedural regime (like helpers for exceptions etc.)
I don't have any assembler for my microcontroller. Most modern microcontrollers have so much peripheral hardware (interrupt controled counters, even entire quadrature encoders and serial building blocks) that using assembler to optimize the loops is often not needed anymore. With current flash prices, the same goes for code memory. Also there are often ranges of pin-compatible devices, so upscaling if you systematically run out of cpu power or flash space is often not a problem
Unless you really ship 100000 devices and programming assembler makes it possible to really make major savings by just fitting in a flash chip a category smaller. But I'm not in that category.
A lot of people think embedded is an excuse for assembler, but their controllers have more CPU power than the machines Unix was developed on. (Microchip coming
with 40 and 60 MIPS microcontrollers for under USD 10).
However a lot people are stuck with legacy, since changing microchip architecture is not easy. Also the HLL code is very architecture dependent (because it uses the hardware periphery, registers to control I/O, etc). So there are sometimes good reasons to keep maintaining a project in assembler (I was lucky to be able to setup affairs on a new architecture from scratch). But often people kid themselves that they really need the assembler.
I still like the answer a professor gave when we asked if we could use GOTO (but you could read that as ASSEMBLER too): "if you think it is worth writing a 3 page essay on why you need the feature, you can use it. Please submit the essay with your results. "
I've used that as a guiding principle for lowlevel features. Don't be too cramped to use it, but make sure you motivate it properly. Even throw up an artificial barrier or two (like the essay) to avoid convoluted reasoning as justification.
Some instructions/flags/control simply aren't there at the C level.
For example, checking for overflow on x86 is the simple overflow flag. This option is not available in C.
Defects tend to run per-line (statement, code point, etc.); while it's true that for most problems, assembly would use far more lines than higher level languages, there are occasionally cases where it's the best (most concise, fewest lines) map to the problem at hand. Most of these cases involve the usual suspects, such as drivers and bit-banging in embedded systems.
If you were around for all the Y2K remediation efforts, you could have made a lot of money if you knew Assembly. There's still plenty of legacy code around that was written in it, and that code occasionally needs maintenance.
Another reason could be when the available compiler just isn't good enough for an architecture and the amount of code needed in the program is not that long or complex as for the programmer to get lost in it. Try programming a microcontroller for an embedded system, usually assembly will be much easier.
Beside other mentioned things, all higher languages have certain limitations. Thats why some people choose to programm in ASM, to have full control over their code.
Others enjoy very small executables, in the range of 20-60KB, for instance check HiEditor, which is implemented by author of the HiEdit control, superb powerfull edit control for Windows with syntax highlighting and tabs in only ~50kb). In my collection I have more then 20 such gold controls from Excell like ssheets to html renders.
I think a lot of game developers would be surprised at this bit of information.
Most games I know of use as little assembly as at all possible. In some cases none at all, and at worst, one or two loops or functions.
That quote is over-generalized, and nowhere near as true as it was a decade ago.
But hey, mere facts shouldn't hinder a true hacker's crusade in favor of assembly. ;)
If you are programming a low end 8 bit microcontroller with 128 bytes of RAM and 4K of program memory you don't have much choice about using assembly. Sometimes though when using a more powerful microcontroller you need a certain action to take place at an exact time. Assembly language comes in useful then as you can count the instructions and so measure the clock cycles used by your code.
Games are pretty performance hungry and although in the meantime the optimizers are pretty good a "master programmer" is still able to squeeze out some more performance by hand coding the right parts in assembly.
Never ever start optimizing your program without profiling it first. After profiling should be able to identify bottlenecks and if finding better algorithms and the like don't cut it anymore you can try to hand code some stuff in assembly.
Aside from very small projects on very small CPUs, I would not set out to ever program an entire project in assembly. However, it is common to find that a performance bottleneck can be relieved with the strategic hand coding of some inner loops.
In some cases, all that is really required is to replace some language construct with an instruction that the optimizer cannot be expected to figure out how to use. A typical example is in DSP applications where vector operations and multiply-accumulate operations are difficult for an optimizer to discover, but easy to hand code.
For example certain models of the SH4 contain 4x4 matrix and 4 vector instructions. I saw a huge performance improvement in a color correction algorithm by replacing equivalent C operations on a 3x3 matrix with the appropriate instructions, at the tiny cost of enlarging the correction matrix to 4x4 to match the hardware assumption. That was achieved by writing no more than a dozen lines of assembly, and carrying matching adjustments to the related data types and storage into a handful of places in the surrounding C code.
It doesn't seem to be mentioned, so I thought I'd add it: in modern games development, I think at least some of the assembly being written isn't for the CPU at all. It's for the GPU, in the form of shader programs.
This might be needed for all sorts of reasons, sometimes simply because whatever higher-level shading language used doesn't allow the exact operation to be expressed in the exact number of instructions wanted, to fit some size-constraint, speed, or any combination. Just as usual with assembly-language programming, I guess.
Almost every medium-to-large game engine or library I've seen to date has some hand-optimized assembly versions available for matrix operations like 4x4 matrix concatenation. It seems that compilers inevitably miss some of the clever optimizations (reusing registers, unrolling loops in a maximally efficient way, taking advantage of machine-specific instructions, etc) when working with large matrices. These matrix manipulation functions are almost always "hotspots" on the profile, too.
I've also seen hand-coded assembly used a lot for custom dispatch -- things like FastDelegate, but compiler and machine specific.
Finally, if you have Interrupt Service Routines, asm can make all the difference in the world -- there are certain operations you just don't want occurring under interrupt, and you want your interrupt handlers to "get in and get out fast"... you know almost exactly what's going to happen in your ISR if it's in asm, and it encourages you to keep the bloody things short (which is good practice anyway).
I have only personally talked to one developer about his use of assembly.
He was working on the firmware that dealt with the controls for a portable mp3 player.
Doing the work in assembly had 2 purposes:
Speed: delays needed to be minimal.
Cost: by being minimal with the code, the hardware needed to run it could be slightly less powerful. When mass-producing millions of units, this can add up.
The only assembler coding I continue to do is for embedded hardware with scant resources. As leander mentions, assembly is still well suited to ISRs where the code needs to be fast and well understood.
A secondary reason for me is to keep my knowledge of assembly functional. Being able to examine and understand the steps which the CPU is taking to do my bidding just feels good.
Last time I wrote in assembler was when I could not convince the compiler to generate libc-free, position independent code.
Next time will probably be for the same reason.
Of course, I used to have other reasons.
A lot of people love to denigrate assembly language because they've never learned to code with it and have only vaguely encountered it and it has left them either aghast or somewhat intimidated. True talented programmers will understand that it is senseless to bash C or Assembly because they are complimentary. in fact the advantage of one is the disadvantage of the other. The organized syntaxic rules of C improves clarity but at the same gives up all the power assembly has from being free of any structural rules ! C code instruction are made to create non-blocking code which could be argued forces clarity of programming intent but this is a power loss. In C the compiler will not allow a jump inside an if/elseif/else/end. Or you are not allowed to write two for/end loops on diferent variables that overlap each other, you cannot write self modifying code (or cannot in an seamless easy way), etc.. conventional programmers are spooked by the above, and would have no idea how to even use the power of these approaches as they have been raised to follow conventional rules.
Here is the truth : Today we have machine with the computing power to do much more that the application we use them for but the human brain is too incapable to code them in a rule free coding environment (= assembly) and needs restrictive rules that greatly reduce the spectrum and simplifies coding.
I have myself written code that cannot be written in C code without becoming hugely inefficient because of the above mentionned limitations. And i have not yet talked about speed which most people think is the main reason for writting in assembly, well it is if you mind is limited to thinking in C then you are the slave of you compiler forever. I always thought chess players masters would be ideal assembly programmers while the C programmers just play "Dames".
No longer speed, but Control. Speed will sometimes come from control, but it is the only reason to code in assembly. Every other reason boils down to control (i.e. SSE and other hand optimization, device drivers and device dependent code, etc.).
If I am able to outperform GCC and Visual C++ 2008 (known also as Visual C++ 9.0) then people will be interested in interviewing me about how it is possible.
This is why for the moment I just read things in assembly and just write __asm int 3 when required.
I hope this help...
I've not written in assembly for a few years, but the two reasons I used to were:
The challenge of the thing! I went through a several-month period years
ago when I'd write everything in x86 assembly (the days of DOS and Windows
3.1). It basically taught me a chunk of low level operations, hardware I/O, etc.
For some things it kept size small (again DOS and Windows 3.1 when writing TSRs)
I keep looking at coding assembly again, and it's nothing more than the challenge and joy of the thing. I have no other reason to do so :-)
I once took over a DSP project which the previous programmer had written mostly in assembly code, except for the tone-detection logic which had been written in C, using floating-point (on a fixed-point DSP!). The tone detection logic ran at about 1/20 of real time.
I ended up rewriting almost everything from scratch. Almost everything was in C except for some small interrupt handlers and a few dozen lines of code related to interrupt handling and low-level frequency detection, which runs more than 100x as fast as the old code.
An important thing to bear in mind, I think, is that in many cases, there will be much greater opportunities for speed enhancement with small routines than large ones, especially if hand-written assembler can fit everything in registers but a compiler wouldn't quite manage. If a loop is large enough that it can't keep everything in registers anyway, there's far less opportunity for improvement.
The Dalvik VM that interprets the bytecode for Java applications on Android phones uses assembler for the dispatcher. This movie (about 31 minutes in, but its worth watching the whole movie!) explains how
"there are still cases where a human can do better than a compiler".
I don't, but I've made it a point to at least try, and try hard at some point in the furture (soon hopefully). It can't be a bad thing to get to know more of the low level stuff and how things work behind the scenes when I'm programming in a high level language. Unfortunately time is hard to come by with a full time job as a developer/consultant and a parent. But I will give at go in due time, that's for sure.

Resources