The limits of parallelism (job-interview question) - theory

Is it possible to solve a problem of O(n!) complexity within a reasonable time, given an infinite number of processing units and infinite space?
The typical example of an O(n!) problem is brute-force search: trying all permutations (ordered combinations).

It sure is. Consider the Traveling Salesman Problem in its strict NP form: given this list of costs for traveling from each point to each other point, can you put together a tour with cost less than K? With the new infinite-core CPU from Intel, you just assign one core to each possible permutation, have each core add up the costs of its tour (this is fast), and see if any core flags a success.
More generally, a problem in NP is a decision problem such that a potential solution can be verified in polynomial time (i.e., efficiently), and so (since the potential solutions are enumerable) any such problem can be efficiently solved with sufficiently many CPUs.
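To make the "one core per permutation" idea concrete, here is a minimal serial sketch of the work a single core would do, assuming a toy cost matrix and using the factorial number system to turn a core's ID into a distinct permutation (all names and sizes below are illustrative):

#define N 5                       /* number of cities in this toy example */

/* Decode permutation number id (0 <= id < N!) into perm[0..N-1]
 * using the factorial number system (Lehmer code). */
void decode_permutation(long id, int perm[N]) {
    int avail[N];
    for (int i = 0; i < N; i++) avail[i] = i;
    for (int remaining = N, i = 0; i < N; i++, remaining--) {
        long fact = 1;
        for (int k = 2; k < remaining; k++) fact *= k;   /* (remaining-1)! */
        int idx = (int)(id / fact);
        id %= fact;
        perm[i] = avail[idx];
        for (int k = idx; k < remaining - 1; k++)        /* drop avail[idx] */
            avail[k] = avail[k + 1];
    }
}

/* The work "core" number id would do: build its tour and test it against K. */
int core_checks_tour(long id, int cost[N][N], int K) {
    int perm[N];
    decode_permutation(id, perm);
    int total = 0;
    for (int i = 0; i < N; i++)
        total += cost[perm[i]][perm[(i + 1) % N]];       /* close the cycle */
    return total < K;                                    /* flag success */
}

A serial driver would simply loop id from 0 to N!-1 and call core_checks_tour; the thought experiment replaces that loop with one core per id.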

It sounds like what you're really asking is whether a problem of O(n!) complexity can be reduced to O(n^a) on a non-deterministic machine; in other words, whether every problem outside P is in NP. The answer to that is no: there are problems outside P that are not in NP, for example a bounded halting problem that asks whether a program halts within at most n! steps.

The problem would be distributing the work and collecting the results.
If all the CPUs can read the same piece of memory at once, and if each one has a unique CPU ID that it knows, then the ID can be used to select a permutation, and the distribution problem is solvable in constant time.
Gathering the results would be tricky, though. Each CPU could compare with its (numerical) neighbor, then that result could be compared with the result of the two closest neighbors, and so on; this is an O(log(n!)) process. By Stirling's approximation, log(n!) = Θ(n log n), so the gather step is actually only polynomial in n.
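A sketch of that pairwise gather, written as a serial simulation (flag and num_cpus are placeholder names; flag[i] stands for the success flag of "CPU" i):

/* Combine the flags pairwise: after ceil(log2(num_cpus)) rounds, flag[0]
 * holds the OR of every flag. In the thought experiment, all combines
 * within one round happen at once, so only the depth matters. */
void gather_or(int flag[], long num_cpus) {
    for (long stride = 1; stride < num_cpus; stride *= 2)
        for (long i = 0; i + stride < num_cpus; i += 2 * stride)
            flag[i] = flag[i] | flag[i + stride];
}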

No. n! grows even faster than the bounds associated with NP: even granting that unlimited parallelism could solve NP problems in polynomial time, which is usually considered a "reasonable" time complexity, an n! problem would still be super-polynomial on such a setup.

You mentioned search as a "typical" problem, but were you actually asked specifically about a search problem? If so, then yes, search is typically parallelizable, but as far as I can tell O(n!) in principle does not imply the degree of concurrency available, does it? You could have a completely serial O(n!) problem, which means infinite computers won't help. I once had an unusual O(n^4) problem that actually was completely serial.
So available concurrency is the first thing, and IMHO you should get points for bringing up Amdahl's law in an interview. The next potential pitfall is inter-processor communication and, more generally, the nature of the algorithm. Consider, for example, this list of application classes: http://view.eecs.berkeley.edu/wiki/Dwarf_Mine. FWIW, the O(n^4) code I mentioned earlier falls roughly into the FSM category.
Another somewhat related anecdote: I've heard an engineer from a supercomputer vendor claim that if 10% of their CPU time is spent in MPI libraries, they consider the parallelization a solid success (though that may have been limited to codes in the computational chemistry domain).

If the problem is one of checking permutations/answers to a problem of complexity O(n!), then of course you can do it efficiently with an infinite number of processors.
The reason is that you can easily distribute the atomic pieces of the problem (an atomic piece might, say, be one of the permutations to check) with only logarithmic overhead.
As a simple example, you could arrange the processors as a binary tree: you sit at the root and hand permutations of the problem (or whatever the smallest pieces of the problem are) down to the leaf processors to solve, and the whole problem is finished in O(log(n!)) time.
Remember that it's the delivery of the permutations to the processors that takes the time; each piece of the problem itself is solved essentially instantly.
Edit: Fixed my post according to the comments below.

Sometimes the correct answer is "How often does this come up in your code base?", but in this case there is a real answer.
The correct answer is no, because not all problems can be solved with perfect parallel processing. For example, a travelling-salesman-like problem must commit to one path before the second leg of the journey can be considered.
Assuming a fully connected matrix of cities, should you want to display all possible non-cyclic routes for our weary salesman, you're stuck with an O(n!) problem, which can be decomposed into an O(n)*O((n-1)!) problem. The issue is that you need to commit to one path (the O(n) side of the equation) before you can consider the remaining paths (the O((n-1)!) side of the equation).
Since some of the computations must be performed before other computations, there is no way to scatter the work perfectly in a single scatter/gather pass. That means the solution will be waiting on the results of calculations that must finish before the "next" step can be started. This is the key: the need for prior partial solutions provides a "bottleneck" in the ability to proceed with the computation.
Since we've shown we can make a number of these infinitely fast, infinitely numerous CPUs wait (even if they are only waiting on each other), we know the runtime cannot be O(1), and we only need to pick a very large N to guarantee an "unacceptable" run time.
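To see the O(n)*O((n-1)!) decomposition in code, here is a rough serial sketch (the problem size, cost table and best-cost accumulator are all illustrative names): each loop level must commit to one next city before the (n-1)!-sized subtree below it can be explored.

#define NCITIES 6                 /* toy problem size */

/* Enumerate every route extending path[0..depth-1]. Each level picks one of
 * the O(n) remaining cities, and only after that commitment can the
 * O((n-1)!) enumeration of the deeper levels proceed. */
void enumerate_routes(int path[], int used[], int depth,
                      int cost[NCITIES][NCITIES], int cost_so_far, int *best) {
    if (depth == NCITIES) {                        /* complete route */
        if (cost_so_far < *best) *best = cost_so_far;
        return;
    }
    for (int next = 0; next < NCITIES; next++) {
        if (used[next]) continue;
        used[next] = 1;
        path[depth] = next;
        int step = depth ? cost[path[depth - 1]][next] : 0;
        enumerate_routes(path, used, depth + 1, cost, cost_so_far + step, best);
        used[next] = 0;
    }
}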

This is like asking whether an infinite number of monkeys typing on a monkey-destruction-proof computer with a word processor could come up with all the works of Shakespeare, given an infinite amount of time. The realist would say no, since the conditions are not physically possible. The idealist would say yes; in theory it can happen. Since Software Engineering (Software Engineering, not Computer Science) focuses on real systems we can see and touch, the answer is no. If you doubt me, then go build it and prove me wrong! IMHO.

Disregarding the cost of setup (whatever that might be... assigning a range of values to a processing unit, for instance), then yes. In such a case, any problem size short of infinity could be solved in one concurrent iteration across an equal number of processing units.
Setup, however, is too significant to disregard.

Each sub-problem could be solved by one CPU, but who would deliver these jobs to the infinitely many CPUs? In general this task is centralized, so if we have infinitely many jobs to deliver to infinitely many CPUs, the delivery itself could take infinite time.

Related

What is a good measure to compare algorithms?

Well, I was reading an article about comparing two algorithms by first analyzing them.
My teacher taught me that you can analyze an algorithm by directly counting the number of steps it takes.
For example:
#include <stdio.h>

void printArray(int arr[], int n) {
    for (int i = 0; i < n; i++) {
        printf("%d\n", arr[i]);
    }
}
This will have complexity O(N), where N is the size of the array, since the body of the for loop executes N times.
while
void printMatrix(int m, int n, int arr[m][n]) {
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            printf("%d ", arr[i][j]);
        }
    }
}
This will have complexity O(M×N), which is O(N^2) when M = N, since the statement inside the inner loop executes M×N times.
Similarly, an algorithm that divides its input into two equal parts at each step is O(log N), and so on.
But according to that article:
the measures "execution time" and "number of statements" aren't good for analyzing an algorithm,
because:
execution time is system-dependent, and
the number of statements varies with the programming language used.
And it states that
the ideal solution is to express the running time of an algorithm as a function of the input size N, that is, f(n).
That confused me a little: how can you calculate running time if you consider execution time a poor measure?
Can the experts here please elaborate on this?
Thanks in advance.
When you say "complexity of O(N)", that is Big-O notation, which is the same thing as the "ideal solution" mentioned in your post: a way of expressing run time as a function of input size.
I think where you got confused is the phrase "express running time" - it doesn't mean express it as a numerical value (which is what execution time is); it means express it in Big-O notation. You just got tripped up on the terminology.
Execution time is indeed system-dependent, but it also depends on the number of instructions the algorithm executes.
Also, I do not understand how the number of steps could be irrelevant, given that algorithms are analyzed in a language-agnostic way, without paying any attention to the features and syntactic sugar that various languages provide.
The one measure of algorithm analysis I have always encountered since I started analyzing algorithms is the number of executed instructions, and I fail to see how this metric could be irrelevant.
At the same time, complexity classes are meant as an "order of magnitude" indication of how fast or slow an algorithm is. They depend on the number of executed instructions and are independent of the system the algorithm runs on, because by definition an elementary operation (such as the addition of two numbers) takes constant time, however large or small that "constant" is in practice, so the complexity class does not change. The constants inside the expression for the exact complexity function may indeed vary from system to system, but what actually matters for comparing algorithms is the complexity class, as only by comparing those can you find out how one algorithm behaves on increasingly large inputs (asymptotically) compared to another.
Big-O notation waves away constants (both fixed cost and constant multipliers). So any function that takes kn+c operations to complete is (by definition!) O(n), regardless of k and c. This is why it's often better to take real-world measurements (profiling) of your algorithms in action with real data, to see how fast they effectively are.
But execution time, obviously, varies depending on the data set -- if you're trying to come up with a general measure of performance that's not based on a specific usage scenario, then execution time is less valuable (unless you're comparing all algorithms under the same conditions, and even then it's not necessarily fair unless you model the majority of possible scenarios, and not just one).
Big-O notation becomes more valuable as you move to larger data sets. It gives you a rough idea of the performance of an algorithm, assuming reasonable values for k and c. If you have a million numbers you want to sort, then it's safe to say you want to stay away from any O(n^2) algorithm, and try to find a better O(n lg n) algorithm. If you're sorting three numbers, the theoretical complexity bound doesn't matter anymore, because the constants dominate the resources taken.
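As a toy illustration of how the constants k and c decide the winner at small n (the cost models below are made up for this example):

#include <math.h>
#include <stdio.h>

int main(void) {
    for (long n = 4; n <= 1000000; n *= 10) {
        double fancy = 50.0 * n * log2((double)n);  /* hypothetical 50*n*lg n routine */
        double naive = (double)n * n;               /* hypothetical n^2 routine */
        printf("n=%8ld  fancy=%14.0f  naive=%14.0f  -> %s wins\n",
               n, fancy, naive, fancy < naive ? "fancy" : "naive");
    }
    return 0;
}

For small n, the "worse" O(n^2) routine wins; only past the crossover does the asymptotically better one pay off.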
Note also that while the number of statements an algorithm takes to express varies wildly between programming languages, the number of constant-time steps that need to be executed does not (at the machine level for your target architecture, which is typically one where integer arithmetic and memory accesses take a fixed amount of time, or more precisely are bounded by a fixed amount of time). It is this bound on the maximum number of fixed-cost steps required by an algorithm that big-O measures; it has no direct relation to actual running time for a given input, yet it still describes roughly how much work must be done for a given data set as the size of the set grows.
In comparing algorithms, execution speed is important, as others have mentioned, but other factors like memory space are crucial too.
Memory space also uses order of complexity notation.
Code could sort an array in place using a bubble sort, needing only a handful of extra memory, O(1). Other methods, though faster, may need O(log N) memory.
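For example, a textbook in-place bubble sort needs only a couple of scratch variables regardless of input size, i.e. O(1) auxiliary memory (sketched here):

/* Sorts arr[0..n-1] in place; the only extra storage is a few scalars,
 * so auxiliary space is O(1) even though time is O(n^2). */
void bubble_sort(int arr[], int n) {
    for (int pass = 0; pass < n - 1; pass++) {
        int swapped = 0;
        for (int i = 0; i < n - 1 - pass; i++) {
            if (arr[i] > arr[i + 1]) {
                int tmp = arr[i];
                arr[i] = arr[i + 1];
                arr[i + 1] = tmp;
                swapped = 1;
            }
        }
        if (!swapped) break;                 /* already sorted, stop early */
    }
}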
Other, more esoteric measures include code complexity, such as cyclomatic complexity, and readability.
Traditionally, computer science measures algorithm efficiency (speed) by the number of comparisons or sometimes data accesses, using "Big O notation". This is so because the number of comparisons (and/or data accesses) is a good mathematical model for describing the efficiency of certain algorithms, searching and sorting ones in particular, where O(log n) is considered the fastest possible for comparison-based searching.
This theoretical model has always had several flaws, though. It assumes that comparisons (and/or data accesses) are what takes time, and that the time for performing things like function calls and branching/looping is negligible. This is of course nonsense in the real world.
In the real world, a recursive binary search algorithm might for example be extremely slow compared to a quick & dirty linear search implemented with a plain for loop, because on the given system, the function call overhead is what takes the most time, not the comparisons.
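For illustration, the two variants being contrasted might look like this (assuming a sorted int array; whether the recursive version is actually slower depends entirely on the compiler and the machine):

/* Plain linear scan: O(n) comparisons, no function-call overhead. */
int linear_search(const int a[], int n, int key) {
    for (int i = 0; i < n; i++)
        if (a[i] == key) return i;
    return -1;
}

/* Recursive binary search: O(log n) comparisons, one call per level. */
int binary_search(const int a[], int lo, int hi, int key) {  /* hi exclusive */
    if (lo >= hi) return -1;
    int mid = lo + (hi - lo) / 2;
    if (a[mid] == key) return mid;
    if (a[mid] < key)  return binary_search(a, mid + 1, hi, key);
    return binary_search(a, lo, mid, key);
}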
There are a whole lot of things that affect performance, and as CPUs evolve, more such things are invented. Nowadays, you might have to consider things like data alignment, instruction pipelining, branch prediction, data cache memory, multiple CPU cores and so on. All these technologies make traditional algorithm theory rather irrelevant.
To write the most effective code possible, you need to have a specific system in mind and you need in-depth knowledge about said system. Fortunately, compilers have evolved a lot too, so a lot of the in-depth system knowledge can be left to the person who implements a compiler port for the specific system.
Generally, I think many programmers today spend far too much time pondering program speed and coming up with "clever things" to get better performance. Back in the days when CPUs were slow and compilers were terrible, such things were very important. But today, a good, modern programmer focuses on making the code bug-free, readable, maintainable, reusable, secure, portable, etc. It doesn't matter how fast your program is if it is a buggy mess of unreadable crap. So deal with performance when the need arises.

Nussinov Parallel

I am a biologist and I am trying to learn programming. But when I was trying to learn about the pthread library, something seemed odd: the result was slower than the sequential version.
In fact I am still reading the Tanenbaum book, but my main focus is to learn the basics of computing the secondary structure of RNAs. So I found the explanation of the Nussinov algorithm in a book and did indeed implement it. But when I tried to make a parallel version, I believe I might be missing the whole point, as this is my first contact with parallel implementations.
My questions are:
1. How should I implement a data-parallel version of this algorithm?
2. Why is my parallel implementation slightly slower than the sequential one?
The code is available on: https://gist.github.com/drenge/6395472 (each file is a different version parallel/sequential)
There are two ways to make a parallel version of an algorithm/program.
You study the algorithm and write the serial program. Afterwards, you start profiling the program to see where you can obtain speed gains. Those are the places where parallelism might come in handy (might, not will). I call this method the "desperate man's tool". This method is useful (!), but most of the time the method below can provide better performance gains. This way of optimising only takes programming and user experience into account.
You take the algorithm and try to figure out another algorithm that permits parallel handling of the problem. Are there independent calculations or steps in the algorithm? Are there parts of the algorithm that can be done before other parts completely finish? This could be called "the theoretical approach". Keep in mind that every thread has its overhead, and you don't want the overhead to be bigger than the gain you hope to obtain.
In fact, a combination of both is the best way to go (if parallelism is really necessary): first concentrate on method 2 (optimise the algorithm so that it stays scientifically correct but can be handled with multiple threads). Then look at the critical thread (found while profiling) and start optimising that thread.
As Kerrek SB already said: parallel programming is a very complex topic, with lots of possible pitfalls. And at the end of the road you should ask yourself: is it worth the effort? After all, losing weeks of study and programming time to gain some minutes is not worth your while.
On the other hand, if your program will run thousands of times, frustrating users with long waiting times or a lack of responsiveness, then maybe it would be useful to make a more performant version after all. But again: can't you reach the same goal by optimising the sequential version without the parallel clutter? Lots of algorithms are of order O(exp(x)) or worse and can be reduced to O(x) or even O(log(x)).
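Applied to the Nussinov recurrence itself (the second approach), the usual observation is that all cells for intervals of the same length depend only on strictly shorter intervals, so each anti-diagonal of the table can be filled in parallel. A rough OpenMP sketch, assuming a zero-initialised score table M (maximising the number of base pairs) and a sequence s of length len; compile with -fopenmp:

static int max2(int a, int b) { return a > b ? a : b; }

static int can_pair(char b1, char b2) {          /* Watson-Crick + wobble pairs */
    return (b1 == 'A' && b2 == 'U') || (b1 == 'U' && b2 == 'A') ||
           (b1 == 'G' && b2 == 'C') || (b1 == 'C' && b2 == 'G') ||
           (b1 == 'G' && b2 == 'U') || (b1 == 'U' && b2 == 'G');
}

void nussinov_fill(int **M, const char *s, int len) {
    for (int d = 1; d < len; d++) {              /* interval length: serial */
        #pragma omp parallel for schedule(dynamic)
        for (int i = 0; i + d < len; i++) {      /* cells on one diagonal */
            int j = i + d;
            int best = max2(M[i + 1][j], M[i][j - 1]);        /* i or j unpaired */
            int inner = (d >= 2) ? M[i + 1][j - 1] : 0;
            best = max2(best, inner + can_pair(s[i], s[j]));  /* i pairs with j */
            for (int k = i + 1; k < j; k++)                   /* bifurcation */
                best = max2(best, M[i][k] + M[k + 1][j]);
            M[i][j] = best;
        }
    }
}

The outer loop over the interval length stays serial; whether the parallel inner loop pays off depends on len and on thread overhead, which is exactly the trade-off described above.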
Kind regards,
PB

Randomness in Artificial Intelligence & Machine Learning

This question came to my mind while working on two projects in AI and ML. What if I'm building a model (e.g. a classification neural network, k-NN, etc.) and this model uses some function that involves randomness? If I don't fix the seed, then I'm going to get different accuracy results every time I run the algorithm on the same training data. However, if I do fix it, then some other setting might give better results.
Is averaging a set of accuracies enough to say that the accuracy of this model is xx % ?
I'm not sure If this is the right place to ask such a question/open such a discussion.
Simple answer, yes, you randomize it and use statistics to show the accuracy. However, it's not sufficient to just average a handful of runs. You need, at a minimum, some notion of the variability as well. It's important to know whether "70%" accurate means "70% accurate for each of 100 runs" or "100% accurate once and 40% accurate once".
If you're just trying to play around a bit and convince yourself that some algorithm works, then you can just run it 30 or so times and look at the mean and standard deviation and call it a day. If you're going to convince anyone else that it works, you need to look into how to do more formal hypothesis testing.
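A minimal sketch of that bookkeeping (the accuracy values are placeholders; the sample standard deviation uses the n-1 denominator):

#include <math.h>
#include <stdio.h>

int main(void) {
    /* Hypothetical accuracies from 5 runs with different random seeds. */
    double acc[] = { 0.71, 0.68, 0.74, 0.70, 0.69 };
    int runs = sizeof acc / sizeof acc[0];

    double mean = 0.0;
    for (int i = 0; i < runs; i++) mean += acc[i];
    mean /= runs;

    double var = 0.0;
    for (int i = 0; i < runs; i++) var += (acc[i] - mean) * (acc[i] - mean);
    var /= (runs - 1);                              /* sample variance */

    printf("accuracy: %.3f +/- %.3f over %d runs\n", mean, sqrt(var), runs);
    return 0;
}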
There are models which are naturally dependent on randomness (e.g., random forests) and models which only use randomness as part of exploring the space (e.g., initialisation of values for neural networks), but actually have a well-defined, deterministic, objective function.
For the first case, you will want to use multiple seeds and report average accuracy, std. deviation, and the minimum you obtained. It is often good if you have a way to reproduce this, so just use multiple fixed seeds.
For the second case, you can always tell, just on the training data, which run is best (although it might not actually be the one that gives you the best test accuracy!). Thus, if you have the time, it is good to do, say, 10 runs and then evaluate the one with the best training error (or validation error; just never use the test set for this decision). You can go a level up and repeat this whole multi-run procedure several times to get a standard deviation too. However, if you find that this makes a significant difference, it probably means you weren't trying enough initialisations or that you are not using the right model for your data.
Stochastic techniques are typically used to search very large solution spaces where exhaustive search is not feasible. So it's almost inevitable that you will be trying to iterate over a large number of sample points with as even a distribution as possible. As mentioned elsewhere, basic statistical techniques will help you determine when your sample is big enough to be representative of the space as a whole.
To test accuracy, it is a good idea to set aside a portion of your input patterns and avoid training against those patterns (assuming you are learning from a data set). Then you can use the set to test whether your algorithm is learning the underlying pattern correctly, or whether it's simply memorizing the examples.
Another thing to think about is the randomness of your random number generator. Standard random number generators (such as rand from <stdlib.h>) may not make the grade in many cases so look around for a more robust algorithm.
I'll generalize from my reading of your question.
I suppose accuracy should always be the average accuracy of multiple runs, together with the standard deviation. If you consider the accuracies you get using different seeds for the random generator, aren't you actually covering a greater range of inputs (which should be a good thing)? But you have to take the standard deviation into account when stating the accuracy. Or did I get your question totally wrong?
I believe cross-validation may give you what you ask about: an averaged, and therefore more reliable, estimate of classification performance. It contains no randomness, except in permuting the data set initially. The variation comes from choosing different train/test splits.

General question about solving problems with parallelisation

I have a general question about programming parallel algorithms in C. Let's assume that our task is to implement some matrix algorithm with MPI and/or OpenMP. There are situations, like false sharing in OpenMP, or in MPI where the communication grows with the matrix dimension (columns cyclically distributed among processes), which cause problems. Would it be a good and common approach to resolve these situations by, for example, transposing the matrix, because this would reduce the necessary communication or even avoid the false-sharing problem? Afterwards you would undo the transposition. Of course, this assumes it would lead to a much better speed-up.
I don't think that this would be very cunning, and more of a lazy way to do it. But I'm curious to read some opinions about this.
Let's start with the first question first: can it make sense to transpose? The answer is, it depends, and you can estimate whether it will improve things or not.
The transposition/retransposition will impose a one-time memory-bandwidth cost of 2*(going through memory the fast way) + 2*(going through memory the slow way), where those memory operations are literally memory operations in the multicore case, or network communications in the distributed-memory case. You're going to be reading the matrix the fast way and putting it into memory the slow way. (You can make this, essentially, 4*(going through memory the fast way) by reading the matrix one cache-sized block at a time, transposing in cache, and writing out in order.)
Whether or not that's a win or not depends on how many times you'll be accessing the array. If you would have been hitting the entire non-transposed array 4 times with memory access in the "wrong" direction, then you will clearly win by doing the two transposes. If you'd only be going through the non-transposed array once in the wrong direction, then you almost certainly won't win by doing the transposition.
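For reference, the cache-blocked transpose described above might look roughly like this (the block size B is a tuning parameter; 64 here is purely a placeholder):

#define B 64   /* block edge; tune so a BxB tile fits comfortably in cache */

/* Transpose the n x n matrix in into out, one BxB tile at a time, so each
 * tile is read and written while it is still resident in cache. */
void blocked_transpose(const double *in, double *out, int n) {
    for (int ii = 0; ii < n; ii += B)
        for (int jj = 0; jj < n; jj += B)
            for (int i = ii; i < ii + B && i < n; i++)
                for (int j = jj; j < jj + B && j < n; j++)
                    out[(long)j * n + i] = in[(long)i * n + j];
}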
As to the larger question, @AlexandreC is absolutely right here -- trying to implement your own linear algebra routines is madness. Take a look at, e.g., How To Write Fast Numerical Code, figure 3; there can be factors of 40 in performance between naive and highly tuned (say) GEMM operations. These things are highly memory-bandwidth limited, and in parallel that means network limited. By far the best option is to use existing tools.
For multicore linear algebra, existing libraries include
ATLAS
PLASMA
FLAME
For MPI implementations, there are
BLACS
ScaLAPACK
or complete solver environments like
PETSc
Trilinos
I don't know that you'd throw the transpose away the second that you completed the operation, but yes this is a valid mechanism to increase parallelism.
I am not an expert; I've only read a little bit about this topic, and even that was for SIMD architectures, so please take my opinion lightly... but I think the usual mechanism is to lay your structures out in memory to match the machine (so you'd transpose a large matrix to line up better with your vectors and increase the dependency distance in your loops), and then you also build an indexing structure of pointers around that so that you can quickly access individual elements in the transpose differently. This gets more difficult to do as your input changes more dynamically.
I don't think that this would be very cunning, and more of a lazy way to do it.
Lazy solutions are usually better than "cunning" ones, because they tend to be more simple and straightforward. They're therefore easier to implement, document, understand and maintain. Indeed, laziness is arguably one of the greatest virtues a programmer can have. As long as the program produces correct results at acceptable speeds, nobody should care how elegantly you solved the problem (including you).

Did you apply computational complexity theory in real life?

I'm taking a course in computational complexity and have so far had an impression that it won't be of much help to a developer.
I might be wrong but if you have gone down this path before, could you please provide an example of how the complexity theory helped you in your work? Tons of thanks.
O(1): Plain code without loops. Just flows through. Lookups in a lookup table are O(1), too.
O(log(n)): efficiently optimized algorithms. Example: binary tree algorithms and binary search. Usually doesn't hurt. You're lucky if you have such an algorithm at hand.
O(n): a single loop over data. Hurts for very large n.
O(n*log(n)): an algorithm that does some sort of divide and conquer strategy. Hurts for large n. Typical example: merge sort
O(n*n): a nested loop of some sort. Hurts even with small n. Common with naive matrix calculations. You want to avoid this sort of algorithm if you can.
O(n^x) for x>2: a wicked construction with multiple nested loops. Hurts even for very small n.
O(x^n), O(n!) and worse: freaky (and often recursive) algorithms you don't want to have in production code except in very controlled cases, for very small n, and only if there really is no better alternative. Computation time may explode with n = n+1.
Moving your algorithm down from a higher complexity class can make it fly. Think of the Fourier transform, whose naive O(n*n) algorithm was unusable with 1960s hardware except in rare cases. Then Cooley and Tukey made some clever complexity reductions by reusing already-calculated values. That led to the widespread introduction of the FFT into signal processing. And in the end it's also why Steve Jobs made a fortune with the iPod.
Simple example: Naive C programmers write this sort of loop:
for (int cnt = 0; cnt < strlen(s); cnt++) {
    /* some code */
}
That's an O(n*n) algorithm because of the implementation of strlen(). Nesting loops leads to multiplication of complexities inside the big-O. O(n) inside O(n) gives O(n*n). O(n^3) inside O(n) gives O(n^4). In the example, precalculating the string length will immediately turn the loop into O(n). Joel has also written about this.
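Hoisting the length computation out of the loop turns it back into O(n):

size_t len = strlen(s);            /* compute the length once: O(n) */
for (size_t cnt = 0; cnt < len; cnt++) {
    /* some code */
}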
Yet the complexity class is not everything. You have to keep an eye on the size of n. Reworking an O(n*log(n)) algorithm to O(n) won't help if the number of (now linear) instructions grows massively due to the reworking. And if n is small anyway, optimizing won't give much bang either.
While it is true that one can get really far in software development without the slightest understanding of algorithmic complexity, I find I use my knowledge of complexity all the time, though at this point often without realizing it. The two things that learning about complexity gives you as a software developer are a way to compare non-similar algorithms that do the same thing (sorting algorithms are the classic example, but most people don't actually write their own sorts), and, more usefully, a way to quickly describe an algorithm.
For example, consider SQL. SQL is used every day by a very large number of programmers. If you were to see the following query, your understanding of the query is very different if you've studied complexity.
SELECT User.*, COUNT(*) AS OrderCount FROM User JOIN Order ON User.UserId = Order.UserId GROUP BY User.UserId
If you have studied complexity, then you would understand if someone said it was O(n^2) for a certain DBMS. Without complexity theory, the person would have to explain about table scans and such. If we add an index to the Order table
CREATE INDEX ORDER_USERID ON Order(UserId)
Then the above query might be O(n log n), which would make a huge difference for a large DB, but for a small one, it is nothing at all.
One might argue that complexity theory is not needed to understand how databases work, and they would be correct, but complexity theory gives a language for thinking about and talking about algorithms working on data.
For most types of programming work, the theory and the proofs may not be useful in themselves, but what they try to give you is the intuition to be able to say immediately, "this algorithm is O(n^2), so we can't run it on these one million data points". Even in the most elementary processing of large amounts of data you'll run into this.
Thinking quickly in terms of complexity theory has been important to me in business data processing, GIS, graphics programming and understanding algorithms in general. It's one of the most useful lessons you can take from CS studies compared to what you'd generally self-study otherwise.
Computers are not smart; they will do whatever you instruct them to do. Compilers can optimize code a bit for you, but they can't optimize algorithms. The human brain works differently, and that is why you need to understand Big O. Consider calculating Fibonacci numbers. We all know F(n) = F(n-1) + F(n-2), and starting with 1, 1 you can easily calculate the following numbers without much effort, in linear time. But if you tell a computer to calculate it with that formula (recursively), the work won't be linear (at least, in imperative languages). Somehow our brain optimized the algorithm, but the compiler can't do this. So you have to work on the algorithm to make it better.
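The contrast in code (the naive recursion re-solves the same subproblems exponentially many times; the loop keeps just the last two values):

/* Naive recursion: exponentially many calls, because the two branches overlap. */
unsigned long fib_recursive(int n) {
    return n < 2 ? (unsigned long)n : fib_recursive(n - 1) + fib_recursive(n - 2);
}

/* Linear-time version: carry the last two values forward, as a human would. */
unsigned long fib_iterative(int n) {
    unsigned long a = 0, b = 1;                 /* F(0), F(1) */
    for (int i = 0; i < n; i++) {
        unsigned long next = a + b;
        a = b;
        b = next;
    }
    return a;                                   /* F(n) */
}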
And then you need training: to spot the brain's optimizations that look so obvious, to see when code might be inefficient, to know patterns for bad and good algorithms (in terms of computational complexity), and so on. Basically, those courses serve several purposes:
understand execution patterns and data structures and what effect they have on the time your program needs to finish;
train your mind to spot potential problems in an algorithm that could make it inefficient on large data sets, and to understand the results of profiling;
learn well-known ways to improve algorithms by reducing their computational complexity;
prepare yourself to pass an interview at a cool company :)
It's extremely important. If you don't understand how to estimate and figure out how long your algorithms will take to run, then you will end up writing some pretty slow code. I think about computational complexity all the time when writing algorithms. It's something that should always be on your mind when programming.
This is especially true in many cases because while your app may work fine on your desktop computer with a small test data set, it's important to understand how quickly your app will respond once you go live with it, and there are hundreds of thousands of people using it.
Yes, I frequently use Big-O notation, or rather, I use the thought processes behind it, not the notation itself. Largely because so few developers in the organization(s) I frequent understand it. I don't mean to be disrespectful to those people, but in my experience, knowledge of this stuff is one of those things that "sorts the men from the boys".
I wonder if this is one of those questions that can only receive "yes" answers? It strikes me that the set of people that understand computational complexity is roughly equivalent to the set of people that think it's important. So, anyone that might answer no perhaps doesn't understand the question and therefore would skip on to the next question rather than pause to respond. Just a thought ;-)
There are times when you will face problems that require thinking about complexity. There are many real-world problems that require manipulating large sets of data...
Examples are:
Maps applications... like Google Maps - how would you process the worldwide road-line data and draw it? And you need to draw it fast!
Logistics applications... think of the travelling salesman on steroids
Data mining... every big enterprise requires it; how would you mine a database containing 100 tables and 10m+ rows and come up with useful results before the trends become outdated?
Taking a course in computational complexity will help you in analyzing and choosing/creating algorithms that are efficient for such scenarios.
Believe me, something as simple as reducing a coefficient, say from T(3n) down to T(2n), can make a HUGE difference when "n" is measured in days if not months.
There's lots of good advice here, and I'm sure most programmers have used their complexity knowledge once in a while.
However I should say understanding computational complexity is of extreme importance in the field of Games! Yes you heard it, that "useless" stuff is the kind of stuff game programming lives on.
I'd bet very few professionals probably care about the Big-O as much as game programmers.
I use complexity calculations regularly, largely because I work in the geospatial domain with very large datasets, e.g. processes involving millions and occasionally billions of cartesian coordinates. Once you start hitting multi-dimensional problems, complexity can be a real issue, as greedy algorithms that would be O(n) in one dimension suddenly hop to O(n^3) in three dimensions and it doesn't take much data to create a serious bottleneck. As I mentioned in a similar post, you also see big O notation becoming cumbersome when you start dealing with groups of complex objects of varying size. The order of complexity can also be very data dependent, with typical cases performing much better than general cases for well designed ad hoc algorithms.
It is also worth testing your algorithms under a profiler to see if what you have designed is what you have achieved. I find most bottlenecks are resolved much better with algorithm tweaking than improved processor speed for all the obvious reasons.
For more reading on general algorithms and their complexities, I found Sedgewick's work both informative and accessible. For spatial algorithms, O'Rourke's book on computational geometry is excellent.
Even in your normal life, away from a computer, you can apply concepts of complexity and parallel processing. This will allow you to be more efficient. Cache coherency. That sort of thing.
Yes, my knowledge of sorting algorithms came in handy one day when I had to sort a stack of student exams. I used merge sort (but not quicksort or heapsort). When programming, I just employ whatever sorting routine the library offers. (I haven't had to sort a really large amount of data yet.)
I do use complexity theory in programming all the time, mostly in deciding which data structures to use, but also in deciding whether or when to sort things, and for many other decisions.
'yes' and 'no'
yes) I frequently use big O-notation when developing and implementing algorithms.
E.g. when you have to handle 10^3 items and the complexity of the first algorithm is O(n log(n)) while that of the second is O(n^3), you can simply say that the first algorithm is almost real-time while the second requires considerable computation.
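The back-of-the-envelope numbers behind that statement (rough operation counts only):

#include <math.h>
#include <stdio.h>

int main(void) {
    double n = 1e3;
    printf("n*log2(n) ~= %.0f operations\n", n * log2(n));   /* about 10^4 */
    printf("n^3       ~= %.0f operations\n", n * n * n);     /* about 10^9 */
    return 0;
}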
Sometimes knowledge of NP complexity classes can be useful, too. It can help you realize that you can stop trying to invent an efficient algorithm when some NP-complete problem can be reduced to the problem you are thinking about.
no) What I have described above is only a small part of complexity theory. As a result, it is difficult to say that I use it; I use only a tiny part of it.
I should admit that there are many software development projects that don't involve algorithm development or any sophisticated use of algorithms. In such cases complexity theory is useless. Ordinary users of algorithms frequently operate with words like 'fast' and 'slow', 'x seconds', etc.
@Martin: Can you please elaborate on the thought processes behind it?
It might not be as explicit as sitting down and working out the Big-O notation for a solution, but it creates an awareness of the problem, and that steers you towards a more efficient answer and away from problematic approaches you might otherwise take, e.g. O(n*n) versus something faster, such as searching for words stored in a list versus stored in a trie (a contrived example).
I find that it makes a difference with what data structures I'll choose to use, and how I'll work on large numbers of records.
A good example is when your boss tells you to write some program and you can demonstrate, using computational complexity theory, that what your boss is asking for is not possible.

Resources