Why does removing the uncomputation step at the last iteration of Grover's algorithm worsen results? - quantum-computing

When messing around with qiskit.org's article on Sudoku solving with Grover's algorithm, I noticed that removing the uncomputation step in the last iteration of the algorithm reduced the likelihood of getting valid solutions from running the program. This seems unintuitive, as the uncomputation step had previously been described as a way to reset ancillary qubits to their original state ahead of the next iteration of the algorithm. Since we can discard the ancillary qubits at the end of the calculation, it would seem that uncomputation isn't necessary for the last iteration of the algorithm. However, this does not seem to be the case experimentally. Why?
My guess is that it has something to do with entanglement and phase kickback when the output qubit is set to |->, but I can't work out the specific reason for this effect.


Why do the constant operations in a quantum computer need second qbits?

From what I'm reading, building the constant-1 and constant-0 operations in a quantum computer involves building something like this, where there's two qbits being used. Why do we need two?
The bottom qbit in both examples is not being used at all, so it has no impact on the operation. Both operations seemingly only work if the top qbit's initial value is 0, so surely all this is saying is that this is an operation which either flips a 0 or leaves it alone, in which case what is the second qbit needed for? Wouldn't a set-to-0 function set the input to 0 whatever it is, without needing one of its inputs to be predetermined?
Granted, the 'output' qbit is for output, but its value still needs to be predetermined going into the operation?
Update: I've posted this on the quantum computing stack exchange with links to a couple of blogs/video where you can see the below being brought up.

How to account for move order in chess board evaluation

I am programming a Chess AI using an alpha-beta pruning algorithm that works at fixed depth. I was quite surprised to see that when I set the AI to a higher depth, it played even worse. But I think I figured out why.
It currently works this way: all moves from the current position are listed, and for each of them, every position reachable from that move is listed, and so on, until the fixed depth is reached. There the board is evaluated by checking which pieces are present and assigning a value to each piece type. Then the value bubbles up to the root using the minimax algorithm with alpha-beta.
But I need to account for the move order. For instance, if there are two options, a checkmate in 2 moves and another in 7 moves, then the first one has to be chosen. The same goes for taking a queen in either 3 or 6 moves.
But since I only evaluate the board at the deepest nodes, and the evaluation only looks at the board itself, it doesn't know what the previous moves were.
I'm sure there is a better way to evaluate the game that can account for the way the pieces moved through the search.
EDIT: I figured out why it was playing weird. When I searched for moves (depth 5), the search ended with an AI move (a MAX node level). Because of that, it counted moves such as taking a knight with a rook, even if that left the rook vulnerable (the algorithm cannot see it because it doesn't search deeper than that).
So I changed that and set the depth to 6, so it ends with a MIN node level.
Its moves now make more sense, as it actually retaliates when attacked (which it sometimes didn't do, playing a dumb move instead).
However, it is now more defensive than ever and does not really play: it moves its knight, then moves it back to where it was before, and therefore it ends up losing.
My evaluation is very standard: only the presence of pieces matters to the node value, so it is free to pick the strategy it wants without being forced to do things it doesn't need to.
Considering that, is this normal behaviour for my algorithm? Is it a sign that my alpha-beta algorithm is badly implemented, or is it perfectly normal with such an evaluation function?
If you want to select the shortest path to a win, you probably also want to select the longest path to a loss. If you were to try to account for this in the evaluation function, you would have to track the path length along with the score and have separate evaluation functions for min and max. It's a lot of complex and confusing overhead.
The standard way to solve this problem is with an iterative deepening approach to the evaluation. First you search deep enough for 1 move for all players, then you run the entire search again searching 2 moves for each player, etc until you run out of time. If you find a win in 2 moves, you stop searching and you'll never run into the 7 moves situation. This also solves your problem of searching odd depths and getting strange evaluations. It has many other benefits, like always having a move ready to go when you run out of time, and some significant algorithmic improvements because you won't need the overhead of tracking visited states.
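The iterative deepening loop described above can be sketched roughly as follows. This is a toy illustration, not a chess engine: positions are nested Python lists, integer leaves stand in for static evaluations from the point of view of the side to move, and the names `search` and `iterative_deepening` and the `time_budget_s` parameter are made up for the example.

```python
import time


def search(node, depth):
    """Depth-limited negamax on a toy tree.

    An int is a static evaluation for the side to move; a list is a
    position whose elements are the positions reached by each legal move.
    """
    if isinstance(node, int):
        return node
    if depth == 0:
        return 0  # assumed placeholder evaluation for an unexpanded position
    return max(-search(child, depth - 1) for child in node)


def iterative_deepening(root, max_depth, time_budget_s):
    """Search depth 1, then 2, ... and keep the best move found so far."""
    deadline = time.monotonic() + time_budget_s
    best = None
    for depth in range(1, max_depth + 1):
        scores = [-search(child, depth - 1) for child in root]
        best = max(range(len(root)), key=scores.__getitem__)
        if time.monotonic() >= deadline:
            break  # out of time: a move from the last finished depth is ready
    return best


# Move 1 only looks better once the search is deep enough to see the +20.
root = [[4, 6], [[-20], 9]]
shallow_choice = iterative_deepening(root, 2, 10.0)  # index 0
deeper_choice = iterative_deepening(root, 3, 10.0)   # index 1
```

A real engine would also check the clock inside the search and reuse the previous iteration's best move for move ordering; both are left out here to keep the sketch short.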
As for the defensive play, that is a little bit of the horizon effect and a little bit of the evaluation function. If you have a perfect evaluation function, the algorithm only needs to see one move deep. If it's not perfect (and it's not), then you'll need to get much deeper into search. Last I checked, algorithms that can run on your laptop and see about 8 plies deep (a ply is one move by one player) can compete with strong humans.
In order to let the program choose the shortest checkmate, the standard approach is to give a higher value to mates that occur closer to the root. Of course, you must detect checkmates, and give them some score.
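A minimal sketch of that idea, assuming a negamax search over a toy tree rather than a real board: the string "mated" marks a position where the side to move has been checkmated, integers are ordinary static evaluations, and `MATE` is an arbitrary constant chosen to be larger than any material score. All of these names are illustrative.

```python
MATE = 100_000  # assumed: far above any score the material evaluation can return


def negamax(node, ply=0):
    """Score `node` for the side to move; `ply` is the distance from the root."""
    if node == "mated":
        # Being mated close to the root is worse than being mated later,
        # so the winner sees MATE - ply: quicker mates score higher.
        return -(MATE - ply)
    if isinstance(node, int):
        return node
    return max(-negamax(child, ply + 1) for child in node)


def best_move(root):
    """Index of the root move with the highest negamax score."""
    scores = [-negamax(child, 1) for child in root]
    return max(range(len(root)), key=scores.__getitem__)


# Move 0 mates immediately; move 1 mates two plies later.
root = ["mated", [["mated"]]]
choice = best_move(root)  # 0: the shorter mate wins the comparison
```

Because the same adjustment is applied with the sign flipped for the losing side, the search also prefers the longest path to a loss without needing a separate evaluation function.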
Also, from what you describe, you need a quiescence search.
All of this (and much more) is explained in the chess programming wiki. You should check it out:

What is wrong with the implementation of this inversion count algorithm?

I am doing a question on www.hackerrank.com and I have been stuck on it for days.
Here is the statement of the question https://www.hackerrank.com/challenges/insertion-sort. Basically, I have to count how many swaps occur in insertion sort for a given array in O(nlog(n)) time.
http://paste.ubuntu.com/12637144/ Here is my submitted code. I use merge sort and count how many times each element is displaced. This code passes for more than half of the site's tests. When it fails it doesn't time out, and it doesn't have a compilation error or segmentation fault.
Furthermore, when I got the input for one of the failed test cases (Here is the input that it failed on the site http://paste.ubuntu.com/12637165/) and tested it with this variation of my code http://paste.ubuntu.com/12637127/ which actually runs the insertion sort algorithm counting the number of swaps along the way and checks it against the merge sort count, I pass all of the tests. Also, I have generated thousands of random test cases, and they also all pass using this test.
I don't think it's a problem on the site's end, because in the discussions for the problem other people seem to be passing the tests just fine without any questions or complaints. So maybe I am misunderstanding the question, or I am simply writing both the algorithm and the test cases for the algorithm wrong. Does anyone have any suggestions?
If n can be up to 100000, then the number of inversions can be about n^2 / 2, which won't fit in a 32-bit integer. Try using a 64-bit integer for the count and for the return value of mergeSort.
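To make the size of the count concrete, here is a small sketch in Python. It is not the code from the linked pastes: `count_inversions` is an illustrative merge-sort counter, and Python integers do not overflow, so the 32-bit limit is only compared against.

```python
def count_inversions(values):
    """Return (sorted copy, number of inversions) using merge sort."""
    if len(values) <= 1:
        return list(values), 0
    mid = len(values) // 2
    left, inv_left = count_inversions(values[:mid])
    right, inv_right = count_inversions(values[mid:])
    merged = []
    inversions = inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] jumps ahead of every element still waiting in `left`.
            merged.append(right[j])
            j += 1
            inversions += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions


INT32_MAX = 2**31 - 1              # 2147483647
n = 100_000
worst_case = n * (n - 1) // 2      # reversed input: 4999950000 inversions
overflows_32_bit = worst_case > INT32_MAX  # True
```

A reversed array of 100000 elements therefore needs a counter more than twice as large as a signed 32-bit integer can hold, which is consistent with random tests on small inputs passing while some of the site's large tests fail.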

What is the logical difference between loops and recursive functions?

I came across this video, which discusses how most recursive functions can be written with for loops, but when I thought about it, I couldn't see the logical difference between the two. I found this topic here, but it only focuses on the practical difference, as do many other similar topics on the web. So what is the logical difference in the way a loop and a recursion are handled?
Bottom line up front -- recursion is more versatile but in practice is generally less efficient than looping.
A loop could in principle always be implemented as a recursion if you wished to do so. In practice the limits of stack resources put serious constraints on the size of the problems you can address. I can and have built loops that iterate a billion times, something I'd never try with recursion unless I was certain the compiler could and would convert the recursion into a loop. Because of the stack limits and efficiency, people often try to find a looping equivalent for recursions.
Tail recursions can always be converted to loops. However, there are recursions that can't be converted. As an example, I work with statistical design of experiments. Sometimes a large design is constructed by "crossing" several smaller sub-designs. Crossing is where you concatenate every row of a second design to each row of the first. For two sub-designs, all this needs is simple nested looping, but for three or more designs you need to increase the level of nesting, adding one level of nesting for each additional sub-design. So while this is nested looping in principle, in practice the amount of nesting is variable. If you tried to implement it with looping you'd have to revise your program to add/subtract nested loops every time you were dealing with a different number of sub-designs to be crossed, so you can't write an immutable loop-based version. This can easily be implemented with recursion. In this case, I'm happy to trade a slight amount of efficiency, because I wrote and debugged the code 6 years ago and haven't had to revise it since, despite creating lots of crossed designs of varying complexity since then.
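As a rough illustration of the crossing example (not the author's actual code), the sketch below represents each sub-design as a list of row tuples; the function name `cross` and that representation are assumptions made for the example.

```python
def cross(designs):
    """Cross any number of sub-designs.

    Each design is a list of rows (tuples). Every row of the first design
    is concatenated with every row of the crossing of the remaining ones.
    """
    if not designs:
        return [()]  # crossing nothing yields a single empty row
    first, rest = designs[0], designs[1:]
    tails = cross(rest)  # one recursive call per extra sub-design
    return [row + tail for row in first for tail in tails]


two_level = [(-1,), (1,)]
three_level = [("low",), ("mid",), ("high",)]
crossed = cross([two_level, three_level, two_level])  # 2 * 3 * 2 = 12 rows
```

The depth of the recursion follows the number of sub-designs passed in, so the same function handles two, three or more designs without adding or removing a nested loop.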
One way to think through this is that the choice for recursion or iteration depends on how you think about the problem being solved. Certain "ways of thinking" lead more naturally to recursive solutions, and other ways of thinking lead to more iterative solutions. For any problem, you can in principle think in a way that gives you a recursive solution or a way that gives you an iterative solution. (Sometimes the iterative solution will just end up simulating a recursion stack, but there is no actual recursion there.)
Here's an example. You have an array of integers (positive or negative), and you want to find the maximum segment sum. A segment is a piece of the array that is contiguous. So in the array [3, -4, 2, 1, -2, 4], the maximum segment sum is 5, and you get that from the segment [2, 1, -2, 4]; its sum is 5.
OK - so how might we solve this problem? One thing you might do is reason like this: "if I knew the maximum segment sum in the left half, and the maximum segment sum in the right half, then maybe I could somehow jam those together and figure out the maximum segment sum overall". This idea would require you to find the maximum segment sum on the two subhalves, and this is a smaller instance of the original problem. This is recursion, and a direct translation of this idea into code would therefore be recursive.
But the maximum segment sum problem isn't "recursive" or "iterative" -- it can be both, depending on how you think about the solution. I gave a recursive thought process above. Here is an iterative process: "well, if I add up the elements in each of the segments that start at some index i and end at some index j, I can just take the maximum of these to solve the problem". And directly trying to code this approach would give you triply nested loops (and a bad mark on an assignment because it's horribly inefficient!).
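Both thought processes can be written down directly. The sketch below is one possible rendering, with illustrative function names: the recursive version combines the two halves by also checking the best segment that crosses the midpoint, and the iterative version is the deliberately inefficient triple loop.

```python
def max_segment_sum_recursive(a, lo=0, hi=None):
    """Divide and conquer over a[lo:hi]; segments are non-empty."""
    if hi is None:
        hi = len(a)
    if hi - lo == 1:
        return a[lo]
    mid = (lo + hi) // 2
    # Best segment ending at mid - 1 (a suffix of the left half).
    best_suffix = total = a[mid - 1]
    for k in range(mid - 2, lo - 1, -1):
        total += a[k]
        best_suffix = max(best_suffix, total)
    # Best segment starting at mid (a prefix of the right half).
    best_prefix = total = a[mid]
    for k in range(mid + 1, hi):
        total += a[k]
        best_prefix = max(best_prefix, total)
    return max(
        max_segment_sum_recursive(a, lo, mid),
        max_segment_sum_recursive(a, mid, hi),
        best_suffix + best_prefix,
    )


def max_segment_sum_iterative(a):
    """Try every start i and end j, summing each segment from scratch."""
    best = a[0]
    for i in range(len(a)):
        for j in range(i, len(a)):
            total = 0
            for k in range(i, j + 1):
                total += a[k]
            best = max(best, total)
    return best


example = [3, -4, 2, 1, -2, 4]
recursive_answer = max_segment_sum_recursive(example)  # 5
iterative_answer = max_segment_sum_iterative(example)  # 5
```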
So, the same problem, depending on how the problem is conceptualized, can lead to a recursive or iterative solution. Now, I happened to choose a problem where there are many ways of solving it, and where there are reasonable recursive and iterative solutions. Some problems, however, admit only one type of solution, and that solution may be most naturally implemented using recursion or iteration. For example, if I asked you to write a function that keeps asking the user to enter a letter until they enter y or n, you might start thinking: "keep repeating the prompt and asking for input..." and before you know it you have some iterative code. Perhaps you might instead think recursively: "if the user enters y or n, I am done; otherwise ask the user for y or n"... in which case you'd generate a recursive algorithm. But the recursion here doesn't give you much: it unnecessarily uses a stack and doesn't make the program any faster. (Recursion sometimes makes it easier to prove correctness, in which case you might present something recursively even though you could alternately give a reasonable iterative solution.)

The limits of parallelism (job-interview question)

Is it possible to solve a problem of O(n!) complexity within a reasonable time given an infinite number of processing units and infinite space?
The typical example of an O(n!) problem is brute-force search: trying all permutations (ordered combinations).
It sure is. Consider the Traveling Salesman Problem in its strict NP form: given this list of costs for traveling from each point to each other point, can you put together a tour with cost less than K? With the new infinite-core CPU from Intel, you just assign one core to each possible permutation, and add up the costs (this is fast), and see if any core flags a success.
More generally, a problem in NP is a decision problem such that a potential solution can be verified in polynomial time (i.e., efficiently), and so (since the potential solutions are enumerable) any such problem can be efficiently solved with sufficiently many CPUs.
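A sequential stand-in for that "one core per permutation" check might look like the following. The cost matrix and the function names are made up for illustration; each call to `tour_cost` is the independent, polynomial-time verification that a single core would perform.

```python
from itertools import permutations


def tour_cost(cost, order):
    """Cost of visiting the cities in `order` and returning to the start."""
    return sum(cost[a][b] for a, b in zip(order, order[1:] + order[:1]))


def has_tour_cheaper_than(cost, k):
    """Decision version of TSP: is there a tour with total cost below k?"""
    n = len(cost)
    # City 0 is fixed as the start; every other ordering is one candidate.
    return any(
        tour_cost(cost, (0,) + rest) < k
        for rest in permutations(range(1, n))
    )


cost = [
    [0, 1, 4, 2],
    [1, 0, 1, 5],
    [4, 1, 0, 1],
    [2, 5, 1, 0],
]
answer_for_6 = has_tour_cheaper_than(cost, 6)  # True: tour 0-1-2-3-0 costs 5
answer_for_5 = has_tour_cheaper_than(cost, 5)  # False: no tour costs less than 5
```

Here `any` plays the role of "see if any core flags a success"; on the hypothetical infinite-core machine the candidates would be checked at the same time instead of one after another.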
It sounds like what you're really asking is whether a problem of O(n!) complexity can be reduced to O(n^a) on a non-deterministic machine; in other words, whether Not-P = NP. The answer to that question is no, there are some Not-P problems that are not NP. For example, a limited halting problem (that asks if a program halts in at most n! steps).
The problem would be distributing the work and collecting the results.
If all the CPUs can read the same piece of memory at once, and if each one has a unique CPU-ID that is known to it, then the ID may be used to select a permutation, and the distribution problem is solvable in constant time.
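One way a CPU could turn its ID into a permutation without any central dispatcher is the factorial number system. The sketch below is an assumption about how that selection might be done, with an illustrative function name; note that the decoding itself costs each CPU time polynomial in n rather than constant time, although all CPUs can do it simultaneously.

```python
from math import factorial


def permutation_from_id(cpu_id, n):
    """Return the permutation of range(n) with lexicographic rank `cpu_id`.

    Valid for 0 <= cpu_id < n!.
    """
    items = list(range(n))
    perm = []
    for remaining in range(n, 0, -1):
        block = factorial(remaining - 1)   # permutations per leading choice
        index, cpu_id = divmod(cpu_id, block)
        perm.append(items.pop(index))
    return perm


first = permutation_from_id(0, 3)   # [0, 1, 2]
last = permutation_from_id(5, 3)    # [2, 1, 0]
```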
Gathering the results would be tricky, though. Each CPU could compare with its (numerical) neighbor, and then that result could be compared to the result of the two closest neighbors, etc. This will be an O(log(n!)) process. Since log(n!) is at most n log n, that is O(n log n) rounds, which is polynomial rather than hyperpolynomial, so the gathering step by itself does not rule this approach out.
No, N! is even higher than NP. Assuming unlimited parallelism could solve an NP problem in polynomial time, which is usually considered a "reasonable" time complexity, an N! problem is still higher than polynomial on such a setup.
You mentioned search as a "typical" problem, but were you actually asked specifically about a search problem? If so, then yes, search is typically parallelizable, but as far as I can tell O(n!) in principle does not imply the degree of concurrency available, does it? You could have a completely serial O(n!) problem, which means infinite computers won't help. I once had an unusual O(n^4) problem that actually was completely serial.
So, available concurrency is the first thing, and IMHO you should get points for bringing up Amdahl's law in an interview. Next potential pitfall is inter-processor communication, and in general the nature of the algorithm. Consider, for example, this list of application classes: http://view.eecs.berkeley.edu/wiki/Dwarf_Mine. FWIW the O(n^4) code I mentioned earlier sort of falls into the FSM category.
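For reference, Amdahl's law in its usual form says that if a fraction p of the work can be parallelized, the speedup on N processors is 1 / ((1 - p) + p / N). A small sketch, with an illustrative function name:

```python
def amdahl_speedup(parallel_fraction, processors):
    """Speedup predicted by Amdahl's law for the given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


ten_cpus = amdahl_speedup(0.9, 10)                 # about 5.26
infinite_cpus = amdahl_speedup(0.9, float("inf"))  # approaches 10
fully_serial = amdahl_speedup(0.0, float("inf"))   # 1.0: extra CPUs do not help
```

With infinitely many processors the speedup is capped at 1 / (1 - p), which is the point made above: a completely serial problem gains nothing, however many machines are available.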
Another somewhat related anecdote: I've heard an engineer from a supercomputer vendor claim that if 10% of their CPU time were being spent in MPI libraries, they consider the parallelization a solid success (though that may have just been limited to codes in the computational chemistry domain).
If the problem is one of checking permutations/answers to a problem of complexity O(n!), then of course you can do it efficiently with an infinite number of processors.
The reason is that you can easily distribute atomic pieces of the problem (an atomic piece of the problem might, say, be one of the permutations to check) with logarithmic efficiency.
As a simple example, you could set up the processors as a 'binary tree', so to speak. You could be at the root, and have the processors deliver permutations of the problem (or whatever the smallest pieces of the problem might be) to the leaf processors to solve, and you'd end up solving the problem in log(n!) time.
Remember it's the delivery of the permutations to the processors that takes a long time. Each part of the problem itself will actually be solved instantly.
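To get a feel for how large log(n!) actually is, the sketch below counts the levels of such a binary tree by repeatedly pairing up results. The function name is illustrative, and the model ignores communication costs entirely.

```python
import math


def tree_rounds(num_leaves):
    """Levels needed for a binary tree to reach or combine `num_leaves` leaves."""
    rounds = 0
    while num_leaves > 1:
        num_leaves = (num_leaves + 1) // 2  # each round halves what is left
        rounds += 1
    return rounds


n = 10
rounds_for_n = tree_rounds(math.factorial(n))  # 22 rounds for 3628800 leaves
upper_bound = n * math.log2(n)                 # about 33.2
```

Since n! is at most n^n, log2(n!) is at most n * log2(n), so the number of tree levels grows only slightly faster than linearly in n even though the number of leaves grows factorially.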
Edit: Fixed my post according to the comments below.
Sometimes the correct answer is, "How many times does this come up with your code base?" but in this case, there is a real answer.
The correct answer is no, because not all problems can be solved using perfect parallel processing. For example, a travelling salesman-like problem must commit to one path for the second leg of the journey to be considered.
Assuming a fully connected matrix of cities, should you want to display all possible non-cyclic routes for our weary salesman, you're stuck with a O(n!) problem, which can be decomposed to an O(n)*O((n-1)!) problem. The issue is that you need to commit to one path (on the O(n) side of the equation) before you can consider the remaining paths (on the O((n-1)!) side of the equation).
Since some of the computations must be performed prior to other computations, there is no way to scatter the results perfectly in a single scatter / gather pass. That means the solution will be waiting on the results of calculations which must come before the "next" step can be started. This is the key, as the need for prior partial solutions provides a bottleneck in the ability to proceed with the computation.
Since we've proven we can make a number of these infinitely fast, infinitely numerous, CPUs wait (even if they are waiting on themselves), we know that the runtime cannot be O(1), and we only need to pick a very large N to guarantee an "unacceptable" run time.
This is like asking if an infinite number of monkeys typing on a monkey-destruction proof computer with a word-processor can come up with all the works of Shakespeare, given an infinite amount of time. The realist would say no, since the conditions are not physically possible. The idealist will say yes; in theory it can happen. Since Software Engineering (Software Engineering, not Computer Science) focuses on real systems we can see and touch, the answer is no. If you doubt me, then go build it and prove me wrong! IMHO.
Disregarding the cost of setup (whatever that might be...assigning a range of values to a processing unit, for instance), then yes. In such a case, any value less than infinity could be solved in one concurrent iteration across an equal number of processing units.
Setup, however, is something significant to disregard.
Each problem could be solved by one CPU, but who would deliver these jobs to all the infinite CPUs? In general, this task is centralized, so if we have infinite jobs to deliver to infinite CPUs, it could take infinite time to do so.
