From what I'm reading, building the constant-1 and constant-0 operations in a quantum computer involves building something like this, where two qubits are used. Why do we need two?
The bottom qubit in both examples is not used at all, so it has no impact on the operation. Both operations seemingly only work if the top qubit's initial value is 0, so surely all this is saying is that the operation either flips a 0 or leaves it alone. In that case, what is the second qubit needed for? Wouldn't a set-to-0 function set the input to 0 whatever it is, without needing one of its inputs to be predetermined?
Granted, the 'output' qubit is for output, but its value still needs to be predetermined going into the operation?
Update: I've posted this on the quantum computing stack exchange with links to a couple of blogs/video where you can see the below being brought up.
I am making an expectimax AI, and the branching factor of this game is unpredictable, ranging from 6 to 20. I'm currently exploring the game tree for 1 second every turn, then making sure the whole game tree is explored to the same depth, but occasionally this results in a very large slowdown if the branching factor for a particular turn jumps up radically. Is it OK if I cut off exploration when parts of the game tree are not explored as deeply? Will this affect the mathematical properties of expectimax at all?
Short answer: I'm pretty sure you lose the mathematical guarantees, but the extent to which this affects your program's performance will probably depend on the game and your board evaluation function.
Here's an abstract scenario to give you some intuition for where having different branch lengths might create the most problems: say that, for player 1, the best move is something that takes a few turns to set up. Let's say this setup is not something that your board evaluation function can pick up on. In this case, regardless of what player 2 does in the meantime, there will be a point a few moves in the future where the score of the board will swing in the direction that favors player 1. If one branch gets far enough to see that move and another doesn't, it will look like the first is a worse option for player 2, despite the fact that the same thing will happen on the other branch. If the move that player 2 made in the first branch was actually better than the move it made in the second branch, this would lead to a suboptimal choice.
On the other hand, a perfect board evaluator would make this impossible (it would recognize player 1 setting up their move). There are also games in which setting up moves in advance like this is not possible. But the existence of this case is a red flag.
Fundamentally, branches that didn't get evaluated as far have greater uncertainty in their estimations of how good a move is. This will sometimes result in them being chosen when they shouldn't be and other times it will result in them not being chosen when they should be. As a result, I would strongly suspect that you lose mathematical guarantees by doing this. That said, the practical impact of this problem on performance may or may not be substantial.
There might be some way around this if you incorporate the current turn number into the board evaluation function and adjust for it accordingly. Minimally, that would allow you to explicitly account for the added uncertainty in shorter branches.
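To make the uneven-depth concern concrete, here is a minimal sketch of depth-limited expectimax on a toy tree. The `Node` class, its `heuristic` field, and the two example moves are all hypothetical and chosen only for illustration; they are not taken from the asker's game.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    kind: str                  # "max", "chance", or "leaf"
    heuristic: float = 0.0     # static evaluation used when the search is cut off
    # (probability, child) pairs; the probability is ignored at "max" nodes
    children: List[Tuple[float, "Node"]] = field(default_factory=list)


def expectimax(node: Node, depth: int) -> float:
    """Value of `node` when the search may look `depth` plies further down."""
    if node.kind == "leaf" or depth == 0 or not node.children:
        return node.heuristic
    values = [(p, expectimax(child, depth - 1)) for p, child in node.children]
    if node.kind == "max":
        return max(v for _, v in values)
    return sum(p * v for p, v in values)  # chance node: expected value


# Both moves run into the same swing of -10 one ply later, which the static
# evaluator cannot see. Move A is genuinely better (true value -9 vs -10).
move_a = Node("chance", heuristic=1.0, children=[(1.0, Node("leaf", -9.0))])
move_b = Node("chance", heuristic=0.0, children=[(1.0, Node("leaf", -10.0))])

uniform = {"A": expectimax(move_a, 1), "B": expectimax(move_b, 1)}
uneven = {"A": expectimax(move_a, 1), "B": expectimax(move_b, 0)}

print("uniform depth picks", max(uniform, key=uniform.get))
print("uneven depth picks", max(uneven, key=uneven.get))
```

With a uniform cutoff the search prefers A. When only A is searched deep enough to see the swing, B's optimistic shallow estimate wins instead, which is the failure mode described above.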
From what I've read so far they seem very similar.
Differential evolution uses floating point numbers instead, and the solutions are called vectors? I'm not quite sure what that means.
If someone could provide an overview with a little bit about the advantages and disadvantages of both.
Well, both genetic algorithms and differential evolution are examples of evolutionary computation.
Genetic algorithms keep pretty closely to the metaphor of genetic reproduction. Even the language is mostly the same-- both talk of chromosomes, both talk of genes, the genes are distinct alphabets, both talk of crossover, and the crossover is fairly close to a low-level understanding of genetic reproduction, etc.
Differential evolution is in the same style, but the correspondences are not as exact. The first big change is that DE is using actual real numbers (in the strict mathematical sense-- they're implemented as floats, or doubles, or whatever, but in theory they're ranging over the field of reals.) As a result, the ideas of mutation and crossover are substantially different. The mutation operator is modified so far that it's hard for me to even see why it's called mutation, as such, except that it serves the same purpose of breaking things out of local minima.
On the plus side, there are a handful of results showing DEs are often more effective and/or more efficient than genetic algorithms. And when working in numerical optimization, it's nice to be able to represent things as actual real numbers instead of having to work your way around to a chromosomal kind of representation, first. (Note: I've read about them, but I've not messed extensively with them so I can't really comment from first hand knowledge.)
On the negative side, I don't think there's been any proof of convergence for DEs, yet.
Differential evolution is actually a specific subset of the broader space of genetic algorithms, with the following restrictions:
The genotype is some form of real-valued vector
The mutation / crossover operations make use of the difference between two or more vectors in the population to create a new vector (typically by adding some random proportion of the difference to one of the existing vectors, plus a small amount of random noise)
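As a rough illustration of the difference-vector idea, here is a sketch of one common variant, usually written DE/rand/1/bin. The parameter values (`F`, `CR`, population size, generation count) and the `sphere` test function are assumptions picked for the example. This variant uses a fixed scale factor `F` and adds no extra noise, so it is simpler than the randomized proportion described above.

```python
import random


def differential_evolution(f, bounds, pop_size=20, F=0.8, CR=0.9,
                           generations=200, seed=0):
    """Minimize f over a box given as a list of (low, high) pairs."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct population members, none of them the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one coordinate is mutated
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < CR:
                    # "Mutation": base vector plus a scaled difference vector.
                    v = pop[a][d] + F * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    v = min(max(v, lo), hi)  # clip back into the box
                else:
                    v = pop[i][d]  # "crossover": keep the target's coordinate
                trial.append(v)
            s = f(trial)
            if s <= scores[i]:  # greedy selection against the target
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def sphere(x):
    return sum(v * v for v in x)


best, score = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
print(best, score)
```

Note that each candidate is a plain list of floats, which is all "vector" means here: there is no bit-string encoding step.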
DE performs well for certain situations because the vectors can be considered to form a "cloud" that explores the high value areas of the solution space quite effectively. It's pretty closely related to particle swarm optimization in some senses.
It still has the usual GA problem of getting stuck in local minima however.
Is it possible to solve a problem of O(n!) complexity within a reasonable time given infinite number of processing units and infinite space?
The typical example of O(n!) problem is brute-force search: trying all permutations (ordered combinations).
It sure is. Consider the Traveling Salesman Problem in its strict NP form: given this list of costs for traveling from each point to each other point, can you put together a tour with cost less than K? With the new infinite-core CPU from Intel, you just assign one core to each possible permutation, add up the costs (this is fast), and see if any core flags a success.
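A small sketch of that decision procedure, written sequentially: each loop iteration stands in for the work one core would do, and no iteration depends on another. The function names and the 4-city cost matrix are made up for the example. The start city is fixed at 0, so there are (n-1)! tours to check rather than n!.

```python
from itertools import permutations


def tour_cost(cost, tour):
    """Cost of visiting the cities in `tour` in order and returning to the start."""
    n = len(tour)
    return sum(cost[tour[i]][tour[(i + 1) % n]] for i in range(n))


def has_tour_below(cost, k):
    """True if some tour costs strictly less than k."""
    n = len(cost)
    # Every permutation is checked independently: this is the per-core job.
    return any(tour_cost(cost, (0,) + rest) < k
               for rest in permutations(range(1, n)))


cost = [
    [0, 1, 4, 3],
    [1, 0, 2, 5],
    [4, 2, 0, 1],
    [3, 5, 1, 0],
]
print(has_tour_below(cost, 8))  # the cheapest tour here costs 7
print(has_tour_below(cost, 7))
```

Checking one tour takes O(n) additions, which is the "this is fast" part; the only cross-core step is combining the individual yes/no flags.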
More generally, a problem in NP is a decision problem such that a potential solution can be verified in polynomial time (i.e., efficiently), and so (since the potential solutions are enumerable) any such problem can be efficiently solved with sufficiently many CPUs.
It sounds like what you're really asking is whether a problem of O(n!) complexity can be reduced to O(n^a) on a non-deterministic machine; in other words, whether Not-P = NP. The answer to that question is no, there are some Not-P problems that are not NP. For example, a limited halting problem (that asks if a program halts in at most n! steps).
The problem would be distributing the work and collecting the results.
If all the CPUs can read the same piece of memory at once, and if each one has a unique CPU-ID that is known to it, then the ID may be used to select a permutation, and the distribution problem is solvable in constant time.
Gathering the results would be tricky, though. Each CPU could compare with its (numerical) neighbor, and then that result compared to the result of the two closest neighbors, etc. This will be a O(log(n!)) process. I don't know for sure, but I suspect that O(log(n!)) is hyperpolynomial, so I don't think that's a solution.
No, N! is even higher than NP. Even assuming unlimited parallelism could solve an NP problem in polynomial time, which is usually considered a "reasonable" time complexity, an N! problem would still be higher than polynomial on such a setup.
You mentioned search as a "typical" problem, but were you actually asked specifically about a search problem? If so, then yes, search is typically parallelizable, but as far as I can tell O(n!) in principle does not imply the degree of concurrency available, does it? You could have a completely serial O(n!) problem, which means infinite computers won't help. I once had an unusual O(n^4) problem that actually was completely serial.
So, available concurrency is the first thing, and IMHO you should get points for bringing up Amdahl's law in an interview. Next potential pitfall is inter-processor communication, and in general the nature of the algorithm. Consider, for example, this list of application classes: http://view.eecs.berkeley.edu/wiki/Dwarf_Mine. FWIW the O(n^4) code I mentioned earlier sort of falls into the FSM category.
Another somewhat related anecdote: I've heard an engineer from a supercomputer vendor claim that if 10% of their CPU time were being spent in MPI libraries, they consider the parallelization a solid success (though that may have just been limited to codes in the computational chemistry domain).
If the problem is one of checking permutations/answers to a problem of complexity O(n!), then of course you can do it efficiently with an infinite number of processors.
The reason is that you can easily distribute atomic pieces of the problem (an atomic piece of the problem might, say, be one of the permutations to check) with logarithmic efficiency.
As a simple example, you could set up the processors as a 'binary tree', so to speak. You could be at the root, and have the processors deliver permutations of the problem (or whatever the smallest pieces of the problem might be) to the leaf processors to solve, and you'd end up solving the problem in log(n!) time.
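For a sense of scale, the depth of a binary tree with n! leaves is about log2(n!). The sketch below computes that value with `math.lgamma` (to avoid building huge integers) and prints it next to n·log2(n), which bounds it from above since n! ≤ n^n; by Stirling's approximation the two grow at the same rate. The helper name `log2_factorial` is made up for the example.

```python
import math


def log2_factorial(n: int) -> float:
    """log2(n!) computed as lgamma(n + 1) / ln(2)."""
    return math.lgamma(n + 1) / math.log(2)


for n in (5, 10, 20, 50, 100):
    depth = log2_factorial(n)
    bound = n * math.log2(n)
    print(f"n={n:4d}  log2(n!)={depth:10.1f}  n*log2(n)={bound:10.1f}")
```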
Remember it's the delivery of the permutations to the processors that takes a long time. Each part of the problem itself will actually be solved instantly.
Edit: Fixed my post according to the comments below.
Sometimes the correct answer is, "How many times does this come up with your code base?" but in this case, there is a real answer.
The correct answer is no, because not all problems can be solved using perfect parallel processing. For example, a travelling salesman-like problem must commit to one path for the second leg of the journey to be considered.
Assuming a fully connected matrix of cities, should you want to display all possible non-cyclic routes for our weary salesman, you're stuck with a O(n!) problem, which can be decomposed to an O(n)*O((n-1)!) problem. The issue is that you need to commit to one path (on the O(n) side of the equation) before you can consider the remaining paths (on the O((n-1)!) side of the equation).
Since some of the computations must be performed prior to other computations, there is no way to scatter the results perfectly in a single scatter / gather pass. That means the solution will be waiting on the results of calculations which must come before the "next" step can be started. This is the key, as the need for prior partial solutions provides a "bottleneck" in the ability to proceed with the computation.
Since we've proven we can make a number of these infinitely fast, infinitely numerous, CPUs wait (even if they are waiting on themselves), we know that the runtime cannot be O(1), and we only need to pick a very large N to guarantee an "unacceptable" run time.
This is like asking if an infinite number of monkeys typing on a monkey-destruction-proof computer with a word processor can come up with all the works of Shakespeare, given an infinite amount of time. The realist would say no, since the conditions are not physically possible. The idealist will say yes; in theory it can happen. Since Software Engineering (Software Engineering, not Computer Science) focuses on real systems we can see and touch, the answer is no. If you doubt me, then go build it and prove me wrong! IMHO.
Disregarding the cost of setup (whatever that might be...assigning a range of values to a processing unit, for instance), then yes. In such a case, any value less than infinity could be solved in one concurrent iteration across an equal number of processing units.
Setup, however, is something significant to disregard.
Each problem could be solved by one CPU, but who would deliver these jobs to all the infinite CPUs? In general, this task is centralized, so if we have infinite jobs to deliver to infinite CPUs, it could take infinite time to do so.
Does a general proof exist for the equivalence of two (deterministic) finite state machines that always takes finite time? That is, given two FSMs, can you prove that given the same inputs they will always produce the same outputs without actually needing to execute the FSMs (which could be non-terminating?). If such a proof does exist, what is the time complexity?
There is a proof, though I don't know it. Look for Sipser's textbook on the subject, that's where I know of it from.
Scrounging my memory: basically, there is a unique minimal DFA for a given DFA, and there is a minimization algorithm that always terminates. Minimize both A and B, and see if they have the same minimal DFA. I don't know the complexity of the minimization, though it's not too bad (I think it's polynomial). Graph isomorphism is pretty hard to compute, but because there's a special starting node, it may hopefully be somewhat easier. You may not even require graph isomorphism, to be honest.
But no, you don't ever need to actually run the DFAs, just analyze their structure.
Suppose you have two FSMs with O(n) states. Then you can make an FSM of size O(n^2) that recognizes only the symmetric difference of their accept languages. (Make an FSM that has states that correspond to a pair of states, one from each FSM. Then on each step, update each part of the pair simultaneously. A state in the new FSM is an accept state iff exactly one of the pair was an accept state.) Now minimize this FSM and see if it is the same as the trivial one-state FSM that rejects everything. Minimizing an FSM with m states takes time O(m log m), so overall you can do everything in time O(n^2 log n).
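Here is a sketch of the pair-state construction. Instead of minimizing the product machine, it swaps in a plain breadth-first search for a reachable pair where exactly one component accepts, which answers the same yes/no question. The tuple representation `(start, accepting, transitions)` and the example machines are assumptions for illustration, and both DFAs are assumed to be complete (a transition for every state and symbol).

```python
from collections import deque


def dfas_equivalent(dfa_a, dfa_b, alphabet):
    """Each DFA is (start, accepting_set, delta) with delta[state][symbol] -> state."""
    start_a, acc_a, delta_a = dfa_a
    start_b, acc_b, delta_b = dfa_b
    seen = {(start_a, start_b)}
    queue = deque(seen)
    while queue:
        p, q = queue.popleft()
        # Accept state of the symmetric-difference machine: exactly one accepts.
        if (p in acc_a) != (q in acc_b):
            return False
        for symbol in alphabet:
            nxt = (delta_a[p][symbol], delta_b[q][symbol])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True  # no distinguishing pair is reachable


# "Even number of 1s", once with two states and once with four (count mod 4).
flip = {0: {"0": 0, "1": 1}, 1: {"0": 1, "1": 0}}
even = (0, {0}, flip)
odd = (0, {1}, flip)
even4 = ("a", {"a", "c"}, {
    "a": {"0": "a", "1": "b"},
    "b": {"0": "b", "1": "c"},
    "c": {"0": "c", "1": "d"},
    "d": {"0": "d", "1": "a"},
})

print(dfas_equivalent(even, even4, "01"))
print(dfas_equivalent(even, odd, "01"))
```

The search visits at most one pair per combination of states, so it always terminates without running either machine on an actual input.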
@Agor correctly states that Sipser is a good reference for these sorts of things. The key point of my answer is that you can do this in polynomial time, even with a small exponent.