C: performance of assignments, binary operations, et cetera

C: performance of assignments, binary operations, et cetera - c

I've heard many things about performance in C; casting is slow compared to normal assignments, functional call is slow, binary operation are much faster than normal operations, et cetera...
I'm sure some of those things are specific to the architecture, and compiler optimization might make a huge difference, but I would like to see a chart to get a general idea what I should do and what I should avoid to write high-performance programs. Is there such a chart (or a website, a book, anything) ?

Basically, no. There is no such "tips and tricks" book from the syntax level, because there is no sure-fire guarantee that anything you stated is true (in fact, most of it is false).
In general, performance tuning should focus more on algorithms, followed by memory locality and cache optimizations. The best tools you will have are profilers (oprofile, valgrind, cachegrind, etc) followed by an understanding of machine architecture (instructions combinations which are suboptimal, alignment restrictions, memory hierarchy and size) and assembly language for your CPU (to catch less than optimal inner loop problems).
If you are interested in micro-optimizations on the Intel architecture (and all Intel compatible CPUs), this is a must read (PDF). There are more interesting guides on Agner's website.

It seems to me that you're very confused by all this. Let's address some of these myths that you've dragged up.
Casting is slow compared to normal assignments.
That really depends on what you're casting. Between different address types, no; casting is actually free there as you're just applying a different interpretation to the same value. Casting between different widths of numeric type can be a bit slower (and sometimes is done implicitly on assignment) but is still very fast.
Function calls are slow.
Not really. They're not free, but the cost is not high enough that you should avoid them unless you've got profiling data that says otherwise. Never optimize without a very good reason to do so and proof that it will help. (For the record I've been known to revert attempted optimizations that did not have the balance of performance gains I wanted.)
Binary operations are faster than normal operations.
What's a “normal operation”? FWIW, addition is a binary operation. So is multiplication. On modern hardware, they're both pretty fast. Let the compiler worry about that. It's far more important that you focus on describing what you're doing correctly.
Now, for things that really cost:
I/O.
Memory allocation.
Memory copies.
Deeply nested (or very long) loops.
Keep your eyes on those; they're where software usually gets slow. And always pick good algorithms and data structures.

Once upon a time there was a book named Efficient C. Somewhat later, there was a book named Efficient C/C++ Programming: Smaller, Faster, Better. More recently still, was one named Efficient C++.
All of those cover a lot of the kinds of things that seem to interest you. The first two appear to be out of print, and the third probably should be. To remain correct and meaningful, such a chart would probably have to be updated around once a month. Almost anything you think you along such lines is probably wrong to start with, and the little that is right will probably become wrong fairly soon anyway.
Just for example, you still routinely see recommendations that if you care about performance you should avoid floating point. At one time, this was even reasonable -- but nowadays, some CPUs actually do integer math by converting the integer to floating point, doing the math, then converting the result back to an integer! Using floating point throughout can improve speed.

Basically all the operations you mention are very, very fast. Unless you are doing them millions of times each second, don't worry too much about the minimal differences between the alternatives.
If you have a time critical part of your program that is running too slow, profile it to find out where exactly the time is spent and where it makes sense to optimize.

Where do you hear these things?? Of all the myths that "go viral" in this field, that is possibly the most amazing one I've heard.
C is as close as you can get to machine language and still be a machine-independent "high-level" language.
All the other answers are right.
I would only add, in real software (not little two-page programs) excess generality, over-abstraction, killing flies with bazookas, are the overwhelming cause of poor performance, even though every last programmer considers his/her solution "simple".

I've heard many things about performance in C...
Someone has given you some very strange ideas. I especially like the distinction between "binary" and "normal" operations. I thought that for a computer, binary was normal. Someone will have to explain this distinction to me.
I would like to see a chart to get a general idea what I should do and what I should avoid to write high-performance programs.
I provide you with a chart below. It assumes you have informed yourself about the C language at the level of Kernighan and Ritchie, which is the classic textbook on C and the only C book you will ever need (although others are useful).
Have you read Jon Bentley's book "Programming Pearls"? --no--> read it
|
| yes
V
Have you read Peter van der Linden's book
"Expert C Programming: Deep C Secrets"? --no--> read it
|
| yes
V
Have you learned how to use valgrind --tool=callgrind
and the kcachegrind visualizer? --no--> learn them
|
| yes
V
Congratulations! You are now equipped to write
reasonably efficient C programs.
Most of the topics in Bentley's book, especially algorithms, are worth pursuing in greater depth elsewhere. But this chart will be a painless and entertaining way to get started.

Related

Handing out addresses in a custom memory allocator w.r.t. cache conflicts

I've spent my afternoon reading up on processor caches after reading about the effect power of twos can have on cache conflicts. Now I wish to apply this new knowledge to my memory allocator for multi-threaded programs. However, I don't fully understand it yet.
I was under the impression that processors loved powers of two, so my allocator rounds requested sizes to their next power of two and then slices pages into multiples of this size and hands them out. When a page is full, it simply maps a new page and slices it up the same way. This leads to very similar and predictable offsets into pages.
To what extent should I adapt my allocator to avoid this issue? For example, should I try to randomize addresses slightly or am I screwed for using powers of two in the first place?
Thanks!

Until you have uncontrovertible proof that this is performance critical, just leave it be. The extra complication will most probably not be worth it.
Everybody should read (and understand!) Bentley's "Writing efficient programs" (sadly out of print now, his "Programming Pearls" contains a summary, and is well worth a read too).
Before embarking on a code-optimization bout, make sure it is worth it. If the performance is adequate, there are better uses of your time. Yes, you have to measure first.
Measure where the cost is being spent. Programmers are notoriously bad at guessing where the costs are
The most performance gains come from restating the problem (sometimes it is enough to solve a problem that is faster to solve), then overall organization of the system, next better algorithms/data structures; and at the very, very end detail optimizations like the one considered here.
Your friendly compiler, given a bit of prodding in the direction of "generate good code" will today generate much better code than an experienced assembly language programmer when given similar (full function scale) tasks. Most local source code reorganizations "for performance" are either moot (the compiler would have done so on its own) or deleterious (the compiler will recognize and rewrite the usual code sequences, unusual code can confuse it to do nothing or generate bad code).
Programmer time (writing, debugging, maintaining) is much more valuable than a few microseconds of computer time here and there, except for extremely unusual circumstances. Write the simplest code that does the job, rework only if experience shows it is worthwile.

Higher level languages with C functions

Maybe this has been asked before, but I couldn't find it. My question is simple: Does it make sense to write an application in higher level languages (Java, C#, Python) and time/performance-critical functions in C? Or at this point unless you do very low level OS/game/sensor programming it is all the same to have a full, say, Java application?

It makes sense if you a) notice a performance issue, AND b) use performance measurements to locate where the problem occurs, AND c) can't achieve the desired performance by modifying the existing code.
If any of these items don't apply, then it's probably premature optimization.

If you are fluent and productive in a higher level language such a Python and Lua, then by all means start writing in that language. Look for bottlenecks if and when they exist.

speed can be quite similar with things like C#.
What is tricky is latency. So if you want to write something which you know takes < 10ms then C is reasonably predictable (ignoring whatever variability your operating system might introduce).
Having said that for very tight long loops (image processing for example), things like C/C++ can offer some speed up. You can get quite reasonable performance out of C#, you do have to be careful how you program it though, but I have found in general, you can still squeeze more out of C/C++

Usually your preferred language will do whatever you need it to in acceptable time (er, blazing fast).
Sure, critical time/performance functions can be written in a "more optimal/suitable" language like C or assembly - but whether it will actually make things faster is another story. There are laws that govern how much actual/overall speed-up that you'll get, specifically Amdahs Law and (diminishing returns) .
To answer your question, it only makes sense to rewrite these critical functions in lower languages if there is good enough speed-up to warrant the extra work.

I suggest you read Cliff Click's excellent Java vs. C Performance....Again.. It outlines many points of comparison between Java and C++.
Predictably, the conclusion is that it depends, but it's a worthwhile read.

You can only really answer this on a case by case basis and without reference to what you are doing it's impossible to answer.
But maybe what you actually want here is some kind of sanity check to ensure that this approach isn't crazy to consider. I have worked on tools ranging from a very large graphics application (~ million lines) to relatively small physical simulation engines (~10000 lines) there were written just as you describe: Python on the outside for interface (both for API and for GUI), C/C++ on the inside for the heavy lifting. They all benefited from this division of responsibility.

This is done especially with scripting languages. Things that come to mind are games made in Python. Most of the time Python is too slow for some of the more number crunching aspects of games and they make this a C module for speed. Be sure that you actually need the speed though and that number crunching is your performance bottleneck and not a general algorithm issue. Doing a brute-force search over a list is going to be slow in both C and Python.

I'd say that it depends a lot on your application.
What sort of performance is important? short startup-time? high throughput? low latency? Is it important that response time is always predictable?
Is the application short lived or does it run for long periods of time?
Java can give you high throughput, but occasional short freezes while doing garbage collection. C# is probably similar.
Python, well the performance there will often lag behind the others for anything not written in C (some things ARE written in C, even if you didn't do it yourself).
So as others said. It depends.
But as always with performance: Measure first, optimize when you know you need to.

What specific examples are there of knowing C making you a better high level programmer?

I know about the existance of question such as this one and this one. Let me explain.
Afet reading Joel's article Back to Basics and seeing many similar questions on SO, I've begun to wonder what are specific examples of situations where knowing stuff like C can make you a better high level programmer.
What I want to know is if there are many examples of this. Many times, the answer to this question is something like "Knowing C gives you a better feel of what's happening under the covers" or "You need a solid foundation for your program", and these answers don't have much meaning. I want to understand the different specific ways in which you will benefit from knowing low level concepts,
Joel gave a couple of examples: Binary databases vs XML, and strings. But two examples don't really justify learning C and/or Assembly. So my question is this: What specific examples are there of knowing C making you a better high level programmer?

My experience with teaching students and working with people who only studied high-level languages is that they tend to think at a certain high level of abstraction, and they assume that "everything comes for free". They can become very competent programmers, but eventually they have to deal with some code that has performance issues and then it comes to bite them.
When you work a lot with C, you do think about memory allocation. You often think about memory layout (and cache locality if that's an issue). You understand how and why certain graphics operations just cost a lot. How efficient or inefficient certain socket behaviors are. How buffers work, etc. I feel that using the abstractions in a higher level language when you do know how it is implemented below the covers sometimes gives you "that extra secret sauce" when thinking about performance.
For example, Java has a garbage collector and you can't directly assign things to memory directly. And yet, you can make certain design choices (e.g., with custom data structures) that affect performance because of the same reasons this would be an issue in C.
Also, and more generally, I feel that it is important for a power programmer to not only know big-O notation (which most schools teach), but that in real-life applications the constant is also important (which schools try to ignore). My anecdotal experience is that people with skills in both language levels tend to have a better understanding of the constant, perhaps because of what I described above.
In addition, many higher level systems that I have seen interface with lower level libraries and infrastructures. For instance, some communications, databases or graphics libraries. Some drivers for certain devices, etc. If you are a power programmer, you may eventially have to venture out there and it helps to at least have an idea of what is going on.

Knowing low level stuff can help a lot.
To become a racing driver, you have to learn and understand the basic physics of how tyres grip the road. Anyone can learn to drive pretty fast, but you need a good understanding of the "low level" stuff (forces and friction, racing lines, fine throttle and brake control, etc) to get those last few percent of performance that will allow you to win the race.
For example, if you understand how the CPU architecture works in your computer, you can write code that works better with it (e.g. if you know you have a certain CPU cache size or a certain number of bytes in each CPU cache line, you can arrange your data structures and the way that you access them to make the best use of the cache - for example, processing many elements of an array in order is often faster than processing random elements, due to the CPU cache). If you have a multi-core computer, then understanding how low level techniques like threading work can gave huge benefits (just as not understanding the low level can lead to disaster in threading).
If you understand how Disk I/O and caching works, you can modify file operations to work well with it (e.g. if you read from one file and write to another, working on large batches of data in RAM can help reduce I/O contention between the reading and writing phases of your code, and vastly improve throughput)
If you understand how virtual functions work, you can design high-level code that uses virtual functions well. If used incorrectly they can severely hamper performance.
If you understand how drawing is handled, you can use clever tricks to improve drawing speed. e.g. You can draw a chessboard by alternately drawing 64 white and black squares. But it is often faster to draw 32 white sqares and then 32 black ones (because you only have to change the drawing colour twice instead of 64 times). But you can actually draw the whole board black, then XOR 4 stripes across the board and 4 stripes down the board in white, and this can be much faster still (2 colour changes, and only 9 rectangles to draw instead of 64). This chessboard trick teaches you a very important programming skill: Lateral thinking. By designing your algorithm well, you can often make a big difference to how well your program operates.

Understanding C, or for that matter, any low level programming language, gives you an opportunity to understand things like memory usage (i.e. why is it a bad thing to create several million heavy objects), how pointers/object references work, etc.
The problem is that as we've created ever increasing levels of abstraction, we find ourselves doing a lot of 'lego block' programming, without understanding how the legos actually function. And by having almost infinite resources, we start treating memory and resources like water, and tend to solve problems by throwing more iron at the situation.
While not limited to C, there's a tremendous benefit to working at a low level with much smaller, memory constrained systems like the Arduino or old-school 8-bit processors. It lets you experience close to the metal coding in a much more approachable package, and after spending time squeezing apps into 512K, you will find yourself applying these skills at a larger level within your day to day programming.
So the language itself is not important, but having a deeper appreciation for how all of the bits come together, and how to work effectively at a level closer to the hardware is a set of skills beneficial to any software developer.

For one, knowing C helps you understand how memory works in the OS and in other high level languages. When your C# or Java program balloons on memory usage, understanding that references (which are basically just pointers) take memory too, and understand how many of the data structures are implemented (which you get from making your own in C) helps you understand that your dictionary is reserving huge amounts of memory that aren't actually used.
For another, knowing C can help you understand how to make use of lower level operating system features. You don't need this often, but sometimes you may need memory mapped files, or to use marshalling in C#, and C will greatly help understand what you're doing when that happens.
I think C has also helped my understanding of network protocols, but I can't put my finger on specific examples. I was reading another SO question the other day where someone was complaining about how C's bit-fields are 'basically useless' and I was thinking how elegantly C bit fields represent low-level network protocols. High level languages dealing with structures of bits always end up a mess!

In general, the more you know, the better programmer you will be.
However, sometimes knowing another language, such as C, can make you do the wrong thing, because there might be an assumption that is not true in a higher-level language (such as Python, or PHP). For example, one might assume that finding the length of a list might be O(N) where N is the length of the list. However, this is probably not the case in many high-level language instances. In Python, for most list-like things the cost is O(1).
Knowing more about the specifics of a language will help, but knowing more in general might lead one to make incorrect assumptions.

Just "knowing" C would not make you better.
But, if you understand the whole thing, how native binaries work, how does CPU work with it, what are architecture limitations, you may write a code which is easier for CPU.
For example, how L1/L2 caches affect your work, and how should you write your code to have more hits in L1/L2 caches. When working with C/C++ and doing heavy optimizations, you will have to go down to that kind of things.

It isn't so much knowing C as it is that C is closer to the bare metal than many other languages. You need to be more aware of how to allocate/deallocate memory because you have to do it yourself. Doing it yourself helps you understand the implications of many decisions that you make.
To me any language is acceptable as long as you understand how the compiler/interpreter (basically) maps your code onto the machine. It's a bit easier to do in a language that exposes this directly, but you should be able to, with a bit of reading, figure out how memory is allocated and organized, what sort of indexing patterns are more optimal than others, what constructs are more efficient for particular applications, etc.
More important, I think, is a good understanding of operating systems, memory architectures, and algorithms. If you understand how your algorithm works, why it would be better to choose one algorithm or data structure over another (e.g., HashSet vs. List), and how your code maps onto the machine, it shouldn't matter what language you are using.

This is my experience of how I learnt and taught myself programming, specifically, understanding C, this is going back to early 1990's so may be a bit antique, but the passion and the drive is important:
Learn to understand the low level principles of the computer, such as EGA/VGA programming, here's a link to the Simtel archive on the C programmer's guide to the PC.
Understanding how TSR's work
Download the whole archive of Bob Stout's snippets which is a big collection of C code that does one thing only - study them and understand it, not alone that, the collection of snippets strives to be portable.
Browse at the International Obfuscated C Code Contest (IOCCC) online, and see how the C code can be abused and understand the intracies of the language. The worst code abuse is the winner! Download the archives and study them.
Like myself, I loved the infamous Ponzo's C Tutorial which helped me immensely, unfortunately, the archive is very hard to find. If anyone knows of where to obtain them, please leave a comment and I will amend this answer to include the link. There is another one that I can remember - Coronado's [Generic?] C Tutorial, again, my memory on this one is hazy...
Look at Dr. Dobb's journal and C User Journal here - I do not know if you can still get them in print but they were a classic, can remember the feeling of holding a printed copy in my hand and tearing off home to type in the code to see what happens!
Grab an ancient copy of Turbo C v2 which I believe you can get from borland.com and just play with 16bit C programming to get a feel and mess with the pointers...sure it is ancient and old but playing with pointers on it is fine.
Understand and learn Pointers, link here to the legacy Simtel.net - a crucial link to achieving C Guru'ship for want of a better word, also you will find a host of downloads pertaining to the C programming language - I remember actually ordering the Simtel CD Archive and looking for the C stuff...

A couple of things that you have to deal directly with in C that other languages abstract away from you include explicit memory management (malloc) and dealing directly with pointers.
My girlfriend is one semester from graduating MIT (where they mainly use Java, Scheme, and Python) with a Computer Science degree, and she is currently working at a company whose codebase is in C++. For the first few days she had a difficult time understanding all the pointers/references/etc.
On the other hand, I found moving from C++ to Java very easy, because I was never confused about pass-references-by-value vs pass-by-reference.
Similarly, in C/C++ it is much more apparent that primitives are just the compiler treating the same sets of bits in different ways, as opposed to a language like Python or Ruby where everything is an object with its own distinct properties.

A simple (not entirely realistic) example to illustrate some of the advice above. Consider the seemingly harmless
while(true)
for(Iterator iter = foo.iterator(); iter.hasNext();)
bar.doSomething( iter.next() )
or the even higher level
while(true)
for(Baz b: foo)
bar.doSomething(b)
A possible problem here is that each time round the while loop a new object (the iterator) is created. If all you care about is programmer convenience, then the latter is definitely better. But if the loop has to be efficient or the machine is resource constrained then you are pretty much at the mercy of the designers of your high level language.
For example, a typical complaint for doing high-performance Java is having execution stop while garbage (such as all those allocated Iterator objects) is reclaimed. Not very good if your software is charged with tracking incoming missiles, auto-piloting a passenger jet, or just not leaving the user wondering why the GUI has stopped responding.
One possible solution (still in the higher-level language) would be to weaken the convenience of the iterator to something like
Iterator iter = new Iterator();
while(true)
for(foo.initAlreadyAllocatedIterator(iter); iter.hasNext();)
bar.doSomething(iter.next())
But this would only make sense if you had some idea about memory allocation...otherwise it just looks like a nasty API. Convenience always costs somewhere, and knowing lower-level stuff can help you identify and mitigate those costs.

Why do you program in assembly? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
I have a question for all the hardcore low level hackers out there. I ran across this sentence in a blog. I don't really think the source matters (it's Haack if you really care) because it seems to be a common statement.
For example, many modern 3-D Games have their high performance core engine written in C++ and Assembly.
As far as the assembly goes - is the code written in assembly because you don't want a compiler emitting extra instructions or using excessive bytes, or are you using better algorithms that you can't express in C (or can't express without the compiler mussing them up)?
I completely get that it's important to understand the low-level stuff. I just want to understand the why program in assembly after you do understand it.

I think you're misreading this statement:
For example, many modern 3-D Games have their high performance core engine written in C++ and Assembly.
Games (and most programs these days) aren't "written in assembly" the same way they're "written in C++". That blog isn't saying that a significant fraction of the game is designed in assembly, or that a team of programmers sit around and develop in assembly as their primary language.
What this really means is that developers first write the game and get it working in C++. Then they profile it, figure out what the bottlenecks are, and if it's worthwhile they optimize the heck out of them in assembly. Or, if they're already experienced, they know which parts are going to be bottlenecks, and they've got optimized pieces sitting around from other games they've built.
The point of programming in assembly is the same as it always has been: speed. It would be ridiculous to write a lot of code in assembler, but there are some optimizations the compiler isn't aware of, and for a small enough window of code, a human is going to do better.
For example, for floating point, compilers tend to be pretty conservative and may not be aware of some of the more advanced features of your architecture. If you're willing to accept some error, you can usually do better than the compiler, and it's worth writing that little bit of code in assembly if you find that lots of time is spent on it.
Here are some more relevant examples:
Examples from Games
Article from Intel about optimizing a game engine using SSE intrinsics. The final code uses intrinsics (not inline assembler), so the amount of pure assembly is very small. But they look at the assembler output by the compiler to figure out exactly what to optimize.
Quake's fast inverse square root. Again, the routine doesn't have assembler in it, but you need to know something about architecture to do this kind of optimization. The authors know what operations are fast (multiply, shift) and which are slow (divide, sqrt). So they come up with a very tricky implementation of square root that avoids the slow operations entirely.
High-Performance Computing
Outside the domain of games, people in scientific computing frequently optimize the crap out of things to get them to run fast on the latest hardware. Think of this as games where you can't cheat on the physics.
A great recent example of this is Lattice Quantum Chromodynamics (Lattice QCD). This paper describes how the problem pretty much boils down to one very small computational kernel, which was optimized heavily for PowerPC 440's on an IBM Blue Gene/L. Each 440 has two FPUs, and they support some special ternary operations that are tricky for compilers to exploit. Without these optimizations, Lattice QCD would've run much slower, which is costly when your problem requires millions of CPU hours on expensive machines.
If you are wondering why this is important, check out the article in Science that came out of this work. Using Lattice QCD, these guys calculated the mass of a proton from first principles, and showed last year that 90% of the mass comes from strong force binding energy, and the rest from quarks. That's E=mc2 in action. Here's a summary.
For all of the above, the applications are not designed or written 100% in assembly -- not even close. But when people really need speed, they focus on writing the key parts of their code to fly on specific hardware.

I have not coded in assembly language for many years, but I can give several reasons that I frequently saw:
Not all compilers can make use of certain CPU optimizations and instruction set (e.g., the new instruction sets that Intel adds once in a while). Waiting for compiler writers to catch up means losing a competitive advantage.
Easier to match actual code to known CPU architecture and optimization. For example, things you know about the fetching mechanism, caching, etc. This is supposed to be transparent to the developer, but the fact is that it is not, that's why compiler writers can optimize.
Certain hardware level accesses are only possible/practical via assembly language (e.g., when writing device driver).
Formal reasoning is sometimes actually easier for the assembly language than for the high-level language since you already know what the final or almost final layout of the code is.
Programming certain 3D graphic cards (circa late 1990s) in the absence of APIs was often more practical and efficient in assembly language, and sometimes not possible in other languages. But again, this involved really expert-level games based on the accelerator architecture like manually moving data in and out in certain order.
I doubt many people use assembly language when a higher-level language would do, especially when that language is C. Hand-optimizing large amounts of general-purpose code is impractical.

There is one aspect of assembler programming which others have not mentioned - the feeling of satisfaction you get knowing that every single byte in an application is the result of your own effort, not the compiler's. I wouldn't for a second want to go back to writing whole apps in assembler as I used to do in the early 80s, but I do miss that feeling sometimes...

Usually, a layman's assembly is slower than C (due to C's optimization) but many games (I distinctly remember Doom) had to have specific sections of the game in Assembly so it would run smoothly on normal machines.
Here's the example to which I am referring.

I started professional programming in assembly language in my very first job (80's). For embedded systems the memory demands - RAM and EPROM - were low. You could write tight code that was easy on resources.
By the late 80's I had switched to C. The code was easier to write, debug and maintain. Very small snippets of code were written in assembler - for me it was when I was writing the context switching in an roll-your-own RTOS. (Something you shouldn't do anymore unless it is a "science project".)
You will see assembler snippets in some Linux kernel code. Most recently I've browsed it in spinlocks and other synchronization code. These pieces of code need to gain access to atomic test-and-set operations, manipulating caches, etc.
I think you would be hard pressed to out-optimize modern C compilers for most general programming.
I agree with #altCognito that your time is probably better spent thinking harder about the problem and doing things better. For some reason programmers often focus on micro-efficiencies and neglect the macro-efficiencies. Assembly language to improve performance is a micro-efficiency. Stepping back for a wider view of the system can expose the macro problems in a system. Solving the macro problems can often yield better performance gains.
Once the macro problems are solved then collapse to the micro level.
I guess micro problems are within the control of a single programmer and in a smaller domain. Altering behavior at the macro level requires communication with more people - a thing some programmers avoid. That whole cowboy vs the team thing.

"Yes". But, understand that for the most part the benefits of writing code in assembler are not worth the effort. The return received for writing it in assembly tends to be smaller than the simply focusing on thinking harder about the problem and spending your time thinking of a better way of doing thigns.
John Carmack and Michael Abrash who were largely responsible for writing Quake and all of the high performance code that went into IDs gaming engines go into this in length detail in this book.
I would also agree with Ólafur Waage that today, compilers are pretty smart and often employ many techniques which take advantage of hidden architectural boosts.

These days, for sequential codes at least, a decent compiler almost always beats even a highly seasoned assembly-language programmer. But for vector codes it's another story. Widely deployed compilers don't do such a great job exploiting the vector-parallel capabilities of the x86 SSE unit, for example. I'm a compiler writer, and exploiting SSE tops my list of reasons to go on your own instead of trusting the compiler.

SSE code works better in assembly than compiler intrinsics, at least in MSVC. (i.e. does not create extra copies of data )

I've three or four assembler routines (in about 20 MB source) in my sources at work. All of them are SSE(2), and are related to operations on (fairly large - think 2400x2048 and bigger) images.
For hobby, I work on a compiler, and there you have more assembler. Runtime libraries are quite often full of them, most of them have to do with stuff that defies the normal procedural regime (like helpers for exceptions etc.)
I don't have any assembler for my microcontroller. Most modern microcontrollers have so much peripheral hardware (interrupt controled counters, even entire quadrature encoders and serial building blocks) that using assembler to optimize the loops is often not needed anymore. With current flash prices, the same goes for code memory. Also there are often ranges of pin-compatible devices, so upscaling if you systematically run out of cpu power or flash space is often not a problem
Unless you really ship 100000 devices and programming assembler makes it possible to really make major savings by just fitting in a flash chip a category smaller. But I'm not in that category.
A lot of people think embedded is an excuse for assembler, but their controllers have more CPU power than the machines Unix was developed on. (Microchip coming
with 40 and 60 MIPS microcontrollers for under USD 10).
However a lot people are stuck with legacy, since changing microchip architecture is not easy. Also the HLL code is very architecture dependent (because it uses the hardware periphery, registers to control I/O, etc). So there are sometimes good reasons to keep maintaining a project in assembler (I was lucky to be able to setup affairs on a new architecture from scratch). But often people kid themselves that they really need the assembler.
I still like the answer a professor gave when we asked if we could use GOTO (but you could read that as ASSEMBLER too): "if you think it is worth writing a 3 page essay on why you need the feature, you can use it. Please submit the essay with your results. "
I've used that as a guiding principle for lowlevel features. Don't be too cramped to use it, but make sure you motivate it properly. Even throw up an artificial barrier or two (like the essay) to avoid convoluted reasoning as justification.

Some instructions/flags/control simply aren't there at the C level.
For example, checking for overflow on x86 is the simple overflow flag. This option is not available in C.

Defects tend to run per-line (statement, code point, etc.); while it's true that for most problems, assembly would use far more lines than higher level languages, there are occasionally cases where it's the best (most concise, fewest lines) map to the problem at hand. Most of these cases involve the usual suspects, such as drivers and bit-banging in embedded systems.

If you were around for all the Y2K remediation efforts, you could have made a lot of money if you knew Assembly. There's still plenty of legacy code around that was written in it, and that code occasionally needs maintenance.

Another reason could be when the available compiler just isn't good enough for an architecture and the amount of code needed in the program is not that long or complex as for the programmer to get lost in it. Try programming a microcontroller for an embedded system, usually assembly will be much easier.

Beside other mentioned things, all higher languages have certain limitations. Thats why some people choose to programm in ASM, to have full control over their code.
Others enjoy very small executables, in the range of 20-60KB, for instance check HiEditor, which is implemented by author of the HiEdit control, superb powerfull edit control for Windows with syntax highlighting and tabs in only ~50kb). In my collection I have more then 20 such gold controls from Excell like ssheets to html renders.

I think a lot of game developers would be surprised at this bit of information.
Most games I know of use as little assembly as at all possible. In some cases none at all, and at worst, one or two loops or functions.
That quote is over-generalized, and nowhere near as true as it was a decade ago.
But hey, mere facts shouldn't hinder a true hacker's crusade in favor of assembly. ;)

If you are programming a low end 8 bit microcontroller with 128 bytes of RAM and 4K of program memory you don't have much choice about using assembly. Sometimes though when using a more powerful microcontroller you need a certain action to take place at an exact time. Assembly language comes in useful then as you can count the instructions and so measure the clock cycles used by your code.

Games are pretty performance hungry and although in the meantime the optimizers are pretty good a "master programmer" is still able to squeeze out some more performance by hand coding the right parts in assembly.
Never ever start optimizing your program without profiling it first. After profiling should be able to identify bottlenecks and if finding better algorithms and the like don't cut it anymore you can try to hand code some stuff in assembly.

Aside from very small projects on very small CPUs, I would not set out to ever program an entire project in assembly. However, it is common to find that a performance bottleneck can be relieved with the strategic hand coding of some inner loops.
In some cases, all that is really required is to replace some language construct with an instruction that the optimizer cannot be expected to figure out how to use. A typical example is in DSP applications where vector operations and multiply-accumulate operations are difficult for an optimizer to discover, but easy to hand code.
For example certain models of the SH4 contain 4x4 matrix and 4 vector instructions. I saw a huge performance improvement in a color correction algorithm by replacing equivalent C operations on a 3x3 matrix with the appropriate instructions, at the tiny cost of enlarging the correction matrix to 4x4 to match the hardware assumption. That was achieved by writing no more than a dozen lines of assembly, and carrying matching adjustments to the related data types and storage into a handful of places in the surrounding C code.

It doesn't seem to be mentioned, so I thought I'd add it: in modern games development, I think at least some of the assembly being written isn't for the CPU at all. It's for the GPU, in the form of shader programs.
This might be needed for all sorts of reasons, sometimes simply because whatever higher-level shading language used doesn't allow the exact operation to be expressed in the exact number of instructions wanted, to fit some size-constraint, speed, or any combination. Just as usual with assembly-language programming, I guess.

Almost every medium-to-large game engine or library I've seen to date has some hand-optimized assembly versions available for matrix operations like 4x4 matrix concatenation. It seems that compilers inevitably miss some of the clever optimizations (reusing registers, unrolling loops in a maximally efficient way, taking advantage of machine-specific instructions, etc) when working with large matrices. These matrix manipulation functions are almost always "hotspots" on the profile, too.
I've also seen hand-coded assembly used a lot for custom dispatch -- things like FastDelegate, but compiler and machine specific.
Finally, if you have Interrupt Service Routines, asm can make all the difference in the world -- there are certain operations you just don't want occurring under interrupt, and you want your interrupt handlers to "get in and get out fast"... you know almost exactly what's going to happen in your ISR if it's in asm, and it encourages you to keep the bloody things short (which is good practice anyway).

I have only personally talked to one developer about his use of assembly.
He was working on the firmware that dealt with the controls for a portable mp3 player.
Doing the work in assembly had 2 purposes:
Speed: delays needed to be minimal.
Cost: by being minimal with the code, the hardware needed to run it could be slightly less powerful. When mass-producing millions of units, this can add up.

The only assembler coding I continue to do is for embedded hardware with scant resources. As leander mentions, assembly is still well suited to ISRs where the code needs to be fast and well understood.
A secondary reason for me is to keep my knowledge of assembly functional. Being able to examine and understand the steps which the CPU is taking to do my bidding just feels good.

Last time I wrote in assembler was when I could not convince the compiler to generate libc-free, position independent code.
Next time will probably be for the same reason.
Of course, I used to have other reasons.

A lot of people love to denigrate assembly language because they've never learned to code with it and have only vaguely encountered it and it has left them either aghast or somewhat intimidated. True talented programmers will understand that it is senseless to bash C or Assembly because they are complimentary. in fact the advantage of one is the disadvantage of the other. The organized syntaxic rules of C improves clarity but at the same gives up all the power assembly has from being free of any structural rules ! C code instruction are made to create non-blocking code which could be argued forces clarity of programming intent but this is a power loss. In C the compiler will not allow a jump inside an if/elseif/else/end. Or you are not allowed to write two for/end loops on diferent variables that overlap each other, you cannot write self modifying code (or cannot in an seamless easy way), etc.. conventional programmers are spooked by the above, and would have no idea how to even use the power of these approaches as they have been raised to follow conventional rules.
Here is the truth : Today we have machine with the computing power to do much more that the application we use them for but the human brain is too incapable to code them in a rule free coding environment (= assembly) and needs restrictive rules that greatly reduce the spectrum and simplifies coding.
I have myself written code that cannot be written in C code without becoming hugely inefficient because of the above mentionned limitations. And i have not yet talked about speed which most people think is the main reason for writting in assembly, well it is if you mind is limited to thinking in C then you are the slave of you compiler forever. I always thought chess players masters would be ideal assembly programmers while the C programmers just play "Dames".

No longer speed, but Control. Speed will sometimes come from control, but it is the only reason to code in assembly. Every other reason boils down to control (i.e. SSE and other hand optimization, device drivers and device dependent code, etc.).

If I am able to outperform GCC and Visual C++ 2008 (known also as Visual C++ 9.0) then people will be interested in interviewing me about how it is possible.
This is why for the moment I just read things in assembly and just write __asm int 3 when required.
I hope this help...

I've not written in assembly for a few years, but the two reasons I used to were:
The challenge of the thing! I went through a several-month period years
ago when I'd write everything in x86 assembly (the days of DOS and Windows
3.1). It basically taught me a chunk of low level operations, hardware I/O, etc.
For some things it kept size small (again DOS and Windows 3.1 when writing TSRs)
I keep looking at coding assembly again, and it's nothing more than the challenge and joy of the thing. I have no other reason to do so :-)

I once took over a DSP project which the previous programmer had written mostly in assembly code, except for the tone-detection logic which had been written in C, using floating-point (on a fixed-point DSP!). The tone detection logic ran at about 1/20 of real time.
I ended up rewriting almost everything from scratch. Almost everything was in C except for some small interrupt handlers and a few dozen lines of code related to interrupt handling and low-level frequency detection, which runs more than 100x as fast as the old code.
An important thing to bear in mind, I think, is that in many cases, there will be much greater opportunities for speed enhancement with small routines than large ones, especially if hand-written assembler can fit everything in registers but a compiler wouldn't quite manage. If a loop is large enough that it can't keep everything in registers anyway, there's far less opportunity for improvement.

The Dalvik VM that interprets the bytecode for Java applications on Android phones uses assembler for the dispatcher. This movie (about 31 minutes in, but its worth watching the whole movie!) explains how
"there are still cases where a human can do better than a compiler".

I don't, but I've made it a point to at least try, and try hard at some point in the furture (soon hopefully). It can't be a bad thing to get to know more of the low level stuff and how things work behind the scenes when I'm programming in a high level language. Unfortunately time is hard to come by with a full time job as a developer/consultant and a parent. But I will give at go in due time, that's for sure.

Key concepts to learn in Assembly

I am a firm believer in the idea that one of the most important things you get from learning a new language is not how to use a new language, but the knowledge of concepts that you get from it. I am not asking how important or useful you think Assembly is, nor do I care if I never use it in any of my real projects.
What I want to know is what concepts of Assembly do you think are most important for any general programmer to know? It doesn't have to be directly related to Assembly - it can also be something that you feel the typical programmer who spends all their time in higher-level languages would not understand or takes for granted, such as the CPU cache.

Register allocation and management
Assembly gives you a very good idea of how many variables (machine-word-sized integers) the CPU can juggle simultaneously. If you can break your loops down so that they involve only a few temporary variables, they'll all fit in registers. If not, your loop will run slowly as things get swapped out to memory.
This has really helped me with my C coding. I try to make all loops tight and simple, with as little spaghetti as possible.
x86 is dumb
Learning several assembly languages has made me realize how lame the x86 instruction set is. Variable-length instructions? Hard-to-predict timing? Non-orthogonal addressing modes? Ugh.
The world would be better if we all ran MIPS, I think, or even ARM or PowerPC :-) Or rather, if Intel/AMD took their semiconductor expertise and used it to make multi-core, ultra-fast, ultra-cheap MIPS processors instead of x86 processors with all of those redeeming qualities.

I think assembly language can teach you lots of little things, as well as a few big concepts.
I'll list a few things I can think of here, but there is no substitute for going and learning and using both x86 and a RISC instruction set.
You probably think that integer operations are fastest. If you want to find an integer square root of an integer (i.e. floor(sqrt(i))) it's best to use an integer-only approximation routine, right?
Nah. The math coprocessor (on x86 that is) has a fsqrt instruction. Converting to float, taking the square root, and converting to int again is faster than an all-integers algorithm.
Then there are things like accessing memory that you can follow, but not properly apprecatiate, until you've delved into assembly. Say you had a linked list, and the first element in the list contains a variable that you will need to access frequently. The list is reordered rarely. Well, each time you need to access that variable, you need to load the pointer to the first element in the list, then using that, load the variable (assuming you can't keep the address of the variable in a register between uses). If you instead stored the variable outside of the list, you only need a single load operation.
Of course saving a couple of cycles here and there is usually not important these days. But if you plan on writing code that needs to be fast, this kind of knowledge can be applied both with inline assembly and generally in other languages.
How about calling conventions? (Some assemblers take care of this for you - Real Programmers don't use those.) Does the caller or callee clean up the stack? Do you even use the stack? You can pass values in registers - but due to the funny x86 instruction set, it's better to pass certain things in certain registers. And which registers will be preserved? One thing C compilers can't really optimise by themselves is calls.
There are little tricks like PUSHing a return address and then JMPing into a procedure; when the procedure returns it will go to the PUSHed address. This departure from the usual way of thinking about function calls is another one of those "states of enlightenment". If you were ever to design a programming language with innovative features, you ought to know about funny things that the hardware is capable of.
A knowledge of assembly language teaches you architecture-specific things about computer security. How you might exploit buffer overflows, or break into kernel mode, and how to prevent such attacks.
Then there's the ubercoolness of self-modifying code, and as a related issue, mechanisms for things such as relocations and applying patches to code (this needs investigation of machine code as well).
But all these things need the right sort of mind. If you're the sort of person who can put
while(x--)
{
...
}
to good use once you learn what it does, but would find it difficult to work out what it does by yourself, then assembly language is probably a waste of your time.

It's good to know assembly language in order to gain a better appreciation for how the computer works "under the hood," and it helps when you are debugging something and all the debugger can give you is an assembly code listing, which at least gives you fighting chance of figuring out what the problem might be. However, trying to apply low-level knowledge to high-level programming languages, such as trying to take advantage of how the CPU caches instructions and then writing wonky high-level code to force the compiler to produce super-efficient machine code, is probably a sign that you are trying to micro-optimize. In most cases, it's usually better not to try to outsmart the compiler, unless you need the performance gain, in which case, you might as well write those bits in assembly anyway.
So, it's good to know assembly for the sake of better understanding of how things work, but the knowledge gained is not necessarily directly applicable to how you write code in high-level languages. On that note, however, I found that learning how function calls work at the assembly-code level (learning about the stack and related registers, learning about how parameters are passed on the stack, learning how automatic storage works, etc.) made it a lot easier to understand problems I had in higher-level code, such as "out of stack space" errors and "invalid calling convention" errors.

The most important concept is SIMD, and creative use of it. Proper use of SIMD can give enormous performance benefits in a massive variety of applications ranging from everything from string processing to video manipulation to matrix math. This is where you can get over 10x performance boosts over pure C code--this is why assembly is still useful beyond mere debugging.
Some examples from the project I work on (all numbers are clock cycle counts on a Core 2):
Inverse 8x8 H.264 DCT (frequency transform):
c: 1332
mmx: 187
sse2: 127
8x8 Chroma motion compensation (bilinear interpolation filter):
c: 639
mmx: 144
sse2: 110
ssse3: 79
4 16x16 Sum of Absolute Difference operations (motion search):
c: 3948
mmx: 278
sse2: 231
ssse3: 215
(yes, that's right--over 18x faster than C!)
Mean squared error of a 16x16 block:
c: 1013
mmx: 193
sse2: 131
Variance of a 16x16 block:
c: 783
mmx: 171
sse2: 106

Memory, registers, jumps, loops, shifts and the various operations one can perform in assembler. I don't miss the days of debugging my assembly language class programs - they were painful! - but it certainly gave me a good foundation.
We forget (or never knew, perhaps) that all this fancy-pants stuff that we use today (and that I love!) boils down to all this stuff in the end.
Now, we can certainly have a productive and lucrative career without knowing assembler, but I think these concepts are good to know.

I would say that learning recursion and loops in assembly has taught me alot. It made me understand the underlying concept of how the compiler/interpreter of the language i'm using pushes things onto a stack, and pops them off as it needs them. I also learned how to exploit the infamous stack overflow. (which is still surprisingly easy in C with some get- and put- commands).
Other than using asm in every-day situations, i don't think that i would use any of the concepts assembly taught me.

I would say that addressing modes are extremely important.
My alma mater took that to an extreme, and because x86 didn't have enough of them, we studied everything on a simulator of PDP11 that must have had at least 7 of them that I remember. In retrospect, that was a good choice.

timing
fast execution:
parallel processing
simple instructions
lookup tables
branch prediction, pipelining
fast to slow access to storage:
registers
cache, and various levels of cache
memory heap and stack
virtual memory
external I/O

Nowadays, x86 asm is not a direct line to the guts of the CPU, but more of an API. The assembler opcodes you write are themselves are compiled into a completely different instruction-set, rearranged, rewritten, fixed-up and generally mangled beyond recognition.
So it's not like learning assembler gives you a fundamental insight into what's going on inside the CPU. IMHO, more important than learning assembler is to get a good understanding of how the target CPU and the memory hierarchy works.
This series of articles covers the latter topic pretty thoroughly.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight