How to embed classical iterations into the oracle of Grover's algorithm in Qiskit?

I am new to Qiskit and to quantum computing, and I want to know whether the following is possible.
Suppose I obtain an array X where each element is itself a sum, say X[0] = y[0][0]+y[0][1]+...+y[0][m], X[1] = y[1][0]+y[1][1]+..., and so on up to X[n]; each element of X is obtained in one classical iteration. The task is to find a certain X[i] == k.
Must I then finish all of these iterations before Grover's algorithm iterates? I have read textbooks and papers where oracles are constructed from a binary function that marks the target states as 1 under some condition. What I can do in Qiskit so far, as far as I know, is to mark the index of the target element, like this:
from qiskit.circuit.library import Diagonal
from qiskit.quantum_info import Statevector

mark_state = Statevector.from_label('100')      # the basis state to mark
mark_circuit = Diagonal((-1)**mark_state.data)  # circuit that induces a -1 phase on the mark_state
That's in Qiskit Terra. Another way, in Qiskit Aqua, is to call the Oracle and Grover APIs, but the oracles only accept a logical expression, a truth table, or a custom circuit, and as I understand it, constructing any of these requires already knowing the exact X and the location of X[i].
All I want to do is illustrate the quadratic advantage offered by Grover's algorithm. If it's possible, the iterations shouldn't be traversed, at least not outside the oracle, but then how can this be implemented in code?
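For reference, here is a minimal sketch of how a phase oracle like mark_circuit can be wired into a hand-built Grover search (textbook diffuser construction; note this still presumes the marked state is known, so it only illustrates the mechanics, not the embedding of the classical iterations):

import numpy as np
from qiskit import QuantumCircuit
from qiskit.circuit.library import Diagonal
from qiskit.quantum_info import Statevector

n = 3
mark_state = Statevector.from_label('100')
oracle = Diagonal((-1)**mark_state.data)  # phase oracle marking |100>

# standard diffuser 2|s><s| - I, built from H, X and a multi-controlled X
diffuser = QuantumCircuit(n)
diffuser.h(range(n))
diffuser.x(range(n))
diffuser.h(n - 1)
diffuser.mcx(list(range(n - 1)), n - 1)
diffuser.h(n - 1)
diffuser.x(range(n))
diffuser.h(range(n))

grover = QuantumCircuit(n)
grover.h(range(n))  # uniform superposition over all 2^n states
for _ in range(int(np.pi / 4 * np.sqrt(2**n))):  # ~(pi/4)*sqrt(N) iterations
    grover.compose(oracle, inplace=True)
    grover.compose(diffuser, inplace=True)
grover.measure_all()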

Related

Efficient algorithm for removing connected components of a binary array row-by-row

I will describe the problem setup here (it is more of an algorithm sketch). Just a quick terminological note: r ~ "row", c ~ "column".
I have a binary array in which each row r has a certain number of connected components k_(c,r) = (start_c, stop_c)_r and a corresponding length L_r such that all k in r have exactly length L_r. Furthermore, the L_r values are monotonically and evenly spaced in log-space, so floor(L_(r+1)) ≥ floor(L_r) for each r.
Starting from the bottom of the array, seed the initial horizontal connected component k_c_seed for each column c as the bottom-most k which is nonzero at c, yielding a mapping c --> r. Then move up row-by-row, and for each k in the row, discard it if all three of the following conditions are met:
(a) it overlaps with exactly one k_c_seed;
(b) it is the only connected component in that row which overlaps with that k_c_seed;
(c) there is a path from any point in k to any point in k_c_seed in the original binary array.
Otherwise keep it, add k to the set of k_c_seed, and repeat the process for the next row up.
Here is an example of what I am after (circled in orange are the connected components which would be kept).
I am essentially wondering if there are known families of algorithms which do something like this, or perhaps a way of vectorizing this / using morphological image processing algorithms. The algorithm sketch I have given naturally falls into the domain of dynamic programming, but the issue is that I have an array with billions of entries, so this approach may not work so well in practice.
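For concreteness, a small sketch of just the per-row extraction step (finding the (start, stop) runs of ones in a single row); the helper name row_components is illustrative, not part of any library:

import numpy as np

def row_components(row):
    # return (start, stop) index pairs for runs of ones in a 1-D binary row
    padded = np.concatenate(([0], row, [0]))
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)   # 0 -> 1 transitions
    stops = np.flatnonzero(edges == -1)   # 1 -> 0 transitions (exclusive)
    return list(zip(starts, stops))

print(row_components(np.array([0, 1, 1, 0, 1])))  # runs at [1, 3) and [4, 5)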

computing function of neighbors efficiently on lattice

I'm studying the Ising model, and I'm trying to efficiently compute a function H(σ) where σ is the current state of an LxL lattice (that is, σ_ij ∈ {+1, -1} for i,j ∈ {1,2,...,L}). To compute H for a particular σ, I need to perform the following calculation:
H(σ) = -J * SUM over ⟨i,j⟩ of σ_i * σ_j
where ⟨i,j⟩ indicates that sites σ_i and σ_j are nearest neighbors and (suppose) J is a constant.
A couple of questions:
Should I store my state σ as an LxL matrix or as a flat list of length L^2? Is one better than the other for memory access in RAM (which I guess depends on the way I'm accessing elements...)?
In either case, how can I best compute H?
Really, I think this boils down to how I can access (and manipulate) the neighbors of every site most efficiently.
Some thoughts:
I see that if I loop through each element in the list or matrix I'll be double counting, so is there a "best" way to return the unique neighbors?
Is there a better data structure that I'm not thinking of?
Your question is a bit broad and a bit confusing to me, so excuse me if my answer is not the one you are looking for, but I hope it helps (a bit).
An array is faster than a list when it comes to indexing. A matrix is a 2D array a[N][M] (where N and M are both L in your case).
That means that you first access a[i] and then a[i][j].
However, you can avoid this double access by emulating the 2D array with a 1D array. In that case, to access element a[i][j] of your matrix, you would instead do a[i * L + j].
That way you perform a single memory access at the cost of an extra multiplication and addition, which may still be faster in some cases.
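A minimal illustration of the two layouts (the names grid2d and grid1d are just for the example):

L = 4

# 2D representation: two indexing steps, grid2d[i] first, then [j]
grid2d = [[0] * L for _ in range(L)]
grid2d[1][2] = 1

# 1D emulation in row-major order: a single indexing step
grid1d = [0] * (L * L)
grid1d[1 * L + 2] = 1  # same element as grid2d[1][2]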
Now as for the Nearest Neighbor question, it seems that you are using a square-lattice Ising model, which means that you are working in 2 dimensions.
A very efficient data structure for nearest-neighbor search in low dimensions is the k-d tree. The construction of that tree takes O(n log n), where n is the size of your dataset.
Now you should think if it's worth it to build such a data structure.
PS: There is a plethora of libraries implementing the kd-tree, such as CGAL.
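As an illustration, using SciPy's cKDTree (purely as an example; CGAL or any other implementation works the same way):

import numpy as np
from scipy.spatial import cKDTree

L = 8
points = np.array([(i, j) for i in range(L) for j in range(L)])  # lattice sites
tree = cKDTree(points)

# the site (3, 3) itself plus its 4 nearest lattice neighbours
dists, idx = tree.query([3, 3], k=5)
print(points[idx])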
I encountered this problem during one of my school assignments, and I think the solution depends on which programming language you are using.
In terms of efficiency, there is no better way than to write a for loop summing the neighbours, which are the set of 4 points {(i-1,j), (i+1,j), (i,j-1), (i,j+1)} for a given (i,j). However, when SIMD functions (SSE etc.) are available, you can re-express this as a convolution with the 2D kernel {0 1 0; 1 0 1; 0 1 0}, so if you use a numerical library that exploits SIMD you can obtain a significant performance increase. You can see an example implementation of this here: https://github.com/zawlin/cs5340/blob/master/a1_code/denoiseIsingGibbs.py
Note that in this case the performance improvement is huge, because evaluating it in Python would otherwise require an expensive for loop.
In terms of work, there is in fact some waste, namely the unnecessary multiplications and additions with the zeros at the corners and the center of the kernel. So whether you see a performance improvement depends quite a bit on your programming environment (if you are already in C/C++, it can be difficult, and you may need to use MKL etc. to obtain a good improvement).
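A minimal sketch of that convolution approach with SciPy (assuming periodic boundary conditions; drop boundary='wrap' for open boundaries):

import numpy as np
from scipy.signal import convolve2d

L = 64
sigma = np.random.choice([-1, 1], size=(L, L))  # random spin configuration
kernel = np.array([[0, 1, 0],
                   [1, 0, 1],
                   [0, 1, 0]])

# sum of the four nearest neighbours at every site
neighbour_sum = convolve2d(sigma, kernel, mode='same', boundary='wrap')

J = 1.0
# each bond is counted twice when summing over sites, hence the factor 1/2
H = -J * np.sum(sigma * neighbour_sum) / 2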

Difference between different population sizes and different crossover methods

I have a couple of general questions about genetic algorithms. In the selection step, where you pick chromosomes from the population, is there an ideal number of chromosomes to pick? What difference does it make if I pick, say, 10 chromosomes instead of 20? Does it have any effect on the final result? For the mutation stage, I've learnt there are different ways to mutate: single-point crossover, two-point crossover, uniform crossover and arithmetic crossover. When should I choose one over the other? I know these sound very basic, but I couldn't find an answer anywhere, so I thought I should ask on Stack Overflow.
Thanks
It seems to me that your terminology and concepts are a little bit messed up. Let me clarify.
First of all, there are many names people use for the members of the population: genotype, genome, chromosome, individual, solution... I will use solution from now on, as it is, in my opinion, the most general term; it is what we eventually evolve. (Also, I'm not a biologist, so I don't know whether genotype, genome and chromosome actually differ, and if they do, what the difference is.)
Population
Genetic Algorithms are population-based evolutionary algorithms. A GA (usually) maintains a fixed-size population of candidate solutions to the problem it is solving.
Genetic operators
There are two principal genetic operators: crossover and mutation. The goal of crossover is to take two (or more, in some cases) solutions and combine them into a solution that has some properties of both, ideally the best of both. The goal of mutation is to introduce new genetic material that was not previously present in the population by making a small random change.
The choice of the particular operators, i.e. whether to use single-point or multi-point crossover etc., is totally problem-dependent. For example, if your solutions are composed of logical blocks of bits that work together within each block, it might not be a good idea to use uniform crossover, because it will destroy these blocks. In such a case a single- or multi-point crossover is a better choice, and the best choice is probably to restrict the crossover points to the boundaries of the blocks.
You have to try what works best for your problem. Also, you can always use all of them, e.g. by randomly choosing which crossover operator to apply each time crossover is about to be performed, as in the sketch below. Similarly for mutation.
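A minimal sketch of that idea for list-encoded solutions (the two operator implementations are illustrative):

import random

def single_point(a, b):
    p = random.randrange(1, len(a))  # crossover point
    return a[:p] + b[p:], b[:p] + a[p:]

def uniform(a, b):
    # swap each gene pair independently with probability 0.5
    pairs = [(x, y) if random.random() < 0.5 else (y, x) for x, y in zip(a, b)]
    c, d = zip(*pairs)
    return list(c), list(d)

def crossover(a, b):
    # pick a crossover operator at random each time crossover is performed
    return random.choice([single_point, uniform])(a, b)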
Modes of operation
Now to your first question about the number of selected solutions. Genetic Algorithms can run in two basic modes - generational mode and steady-state mode.
Generational mode
In generational mode, the whole population is replaced in every generation (iteration) of the algorithm. A simple python-like pseudo-code for a generational-mode GA could look like this:
P = [...]  # initial population
while not stopping_condition():
    Pc = []  # empty population of children
    while len(Pc) < len(P):
        a = select(P)  # select a solution from P using some selection strategy
        b = select(P)
        if rand() < crossover_probability:
            a, b = crossover(a, b)
        if rand() < mutation_probability:
            a = mutation(a)
        if rand() < mutation_probability:
            b = mutation(b)
        Pc.append(a)
        Pc.append(b)
    P = Pc  # replace the population with the population of children
Evaluation of the solutions was omitted.
Steady-state mode
In steady-state mode, the population persists and only a few solutions are replaced in each iteration. Again, a simple steady-state GA could look like this:
P = [...]  # initial population
while not stopping_condition():
    a = select(P)  # select a solution from P using some selection strategy
    b = select(P)
    if rand() < crossover_probability:
        a, b = crossover(a, b)
    if rand() < mutation_probability:
        a = mutation(a)
    if rand() < mutation_probability:
        b = mutation(b)
    replace(P, a)  # put a child back into P based on some replacement strategy
    replace(P, b)
Evaluation of the solutions was omitted.
So, the number of selected solutions depends on how you want your algorithm to operate.

How to remove apparent redundancy in numpy vector operations?

I am new to Python and not sure about the efficiency issues here. For vectors x, y, and z that represent the coordinates of n particles, I can do the following computation:
import numpy as np

X = np.subtract.outer(x, x)      # X[i, j] = x[i] - x[j]
Y = np.subtract.outer(y, y)
Z = np.subtract.outer(z, z)
R = np.sqrt(X**2 + Y**2 + Z**2)  # full n x n matrix of pairwise distances
A = X / R
np.fill_diagonal(A, 0)           # clear the NaNs produced by division by zero
a = np.sum(A, axis=0)
With this calculation there is about a factor of 2 of redundancy as far as multiplications and divisions go, since the diagonal is not needed and the lower triangle is just the negative of the upper triangle. I plan to use this kind of computation inside a function that is called by odeint, i.e. it will be called a lot, and the vectors will be as large as my computer can handle. To remove the redundancy, naively I would end up writing a for loop, which presumably is a stupid thing to do. Can I get rid of this redundancy in a vectorized way, and is it even worth the effort?
Update: Based on the suggestions below, the only way I could see to improve this was
ut = np.triu_indices(n, 1)
X = x[ut[0]] - x[ut[1]]
with similar expressions for Y and Z, and using pdist to find R. This construction only calculates the upper triangular part. Looking at the source code for pdist, I am not convinced it does anything particularly smart, so I think my expression above is equally good. The use of squareform only produces the symmetric form. For the antisymmetric case one may as well use
B = np.zeros((n, n), dtype=np.float64)
B[ut[0], ut[1]] = A
B = B - B.T
This cannot be slower than squareform, because it is pretty much exactly what squareform does. Since the function is called often, it seems to me that ut should be made static, along with the storage for the others (X, Y, Z, A, B). However, being new to Python, I am not sure how that is done.
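Putting the pieces together, a minimal self-contained version of that upper-triangle approach (the random coordinates are placeholders):

import numpy as np

n = 1000
x, y, z = (np.random.rand(n) for _ in range(3))  # placeholder coordinates

iu, ju = np.triu_indices(n, 1)  # precompute once if n is fixed

X = x[iu] - x[ju]  # upper-triangular differences only
Y = y[iu] - y[ju]
Z = z[iu] - z[ju]
R = np.sqrt(X**2 + Y**2 + Z**2)  # no diagonal, no duplicate pairs

B = np.zeros((n, n))
B[iu, ju] = X / R      # fill the upper triangle
B = B - B.T            # antisymmetric completion; the diagonal stays 0
a = np.sum(B, axis=0)  # same result as the full n x n computation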

Determining Bias for Neural network Perceptrons?

One thing I don't quite understand, at this early stage of learning about neural networks, is what a "bias" should initially be set to.
I understand the Perceptron calculates it's output based on:
P * W + b > 0
and then you can learn the bias with b = b + [G - O], where G is the correct output and O is the actual output (1 or 0), to calculate a new bias... but what about the initial bias? I don't really understand how this is calculated, or what initial value should be used besides just "guessing". Is there any kind of formula for this?
Pardon me if I'm mistaken on anything; I'm still learning the whole neural network idea before I implement my own (crappy) one.
The same goes for the learning rate... I mean, most books and such just kinda "pick one" for μ.
The short answer is, it depends...
In most cases (I believe) you can treat the bias just like any other weight (so it might get initialised to some small random value), and it will get updated as you train your network. The idea is that all the biases and weights will end up converging on some useful set of values.
However, you can also set the weights manually (with no training) to get some special behaviours: for example, you can use the bias to make a perceptron behave like a logic gate (assume binary inputs X1 and X2 are either 0 or 1, and the activation function is scaled to give an output of 0 or 1).
OR gate: W1=1, W2=1, Bias=0
AND gate: W1=1, W2=1, Bias=-1
You can solve the classic XOR problem by using AND and OR as the first layer in a multilayer network, and feeding them into a third perceptron with W1=3 (from the OR gate), W2=-2 (from the AND gate) and Bias=-2.
(Note: these values will be different if your activation function is scaled to -1/+1, i.e. an SGN function.)
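A minimal sketch of that network with a 0/1 step activation, just to make the weights concrete:

def perceptron(x1, x2, w1, w2, bias):
    # step activation: output 1 if the weighted sum exceeds 0
    return 1 if x1 * w1 + x2 * w2 + bias > 0 else 0

def xor(x1, x2):
    o = perceptron(x1, x2, 1, 1, 0)     # OR gate
    a = perceptron(x1, x2, 1, 1, -1)    # AND gate
    return perceptron(o, a, 3, -2, -2)  # fires only when OR is on and AND is off

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, '->', xor(x1, x2))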
As to how to set the learning rate, that also depends(!), but I think something like 0.01 is usually recommended. Basically, you want the system to learn as quickly as possible, but not so quickly that the weights fail to converge properly.
Since @Richard has already answered the greater part of the question, I'll only elaborate on the learning rate. From what I've read (and it works in practice), there is a very simple formula that you can use to update the learning rate for each iteration k, and it is:
learningRate_k = constant/k
Here, obviously, the 0th iteration is excluded, since you would be dividing by zero. The constant can be whatever you want (except 0, of course, since that would not make any sense :D), but the easiest choice is naturally 1, so you get
learningRate_k = 1/k
The resulting series obeys two basic rules:
SUM from k=1 to inf of learningRate_k = inf
SUM from k=1 to inf of learningRate_k^2 < inf
Note that the convergence of your perceptron is directly connected to this learning rate series. It starts big (for k=1 you get 1/1 = 1) and gets smaller and smaller with each and every update of your perceptron, since, as in real life, when you encounter something new you learn a lot at the beginning, but later on you learn less and less.
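A minimal sketch of a perceptron update loop using this decaying rate (the training data and epoch count are placeholders):

def perceptron_train(samples, epochs=10):
    # samples: list of ((x1, x2), target) pairs; 0/1 step activation assumed
    w1 = w2 = b = 0.0
    k = 1
    for _ in range(epochs):
        for (x1, x2), target in samples:
            lr = 1.0 / k  # learningRate_k = 1/k
            out = 1 if x1 * w1 + x2 * w2 + b > 0 else 0
            w1 += lr * (target - out) * x1
            w2 += lr * (target - out) * x2
            b += lr * (target - out)  # the bias is updated like a weight
            k += 1
    return w1, w2, b

# e.g. learn an AND gate
print(perceptron_train([((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]))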
