I have a state |Q> of n qubits and want to measure qubit number i. Is there a matrix I can apply to the state so that |Q> becomes |Q'>, as with the Hadamard or X gates?
Or should I apply the measurement matrix |x><x| chosen according to the outcome of the measurement: x=0 if the outcome is 0, and x=1 if it is 1?
Although we often represent measurement as an operation that applies to a single qubit, it doesn't act like other single-qubit operations. There are some details omitted.
Equivalence w/ CNOT
Measuring a qubit is equivalent to using it as the control for a CNOT that toggles an otherwise unused ancilla qubit. Knowing this equivalence is useful, because it lets you translate what you know about two-qubit unitary operations into facts about measurement.
Here's a circuit showing that a qubit rotated around the Y axis ends up in the same mixed state when you measure as it does when you CNOT-onto-ancilla. The green circle things are Bloch sphere representations of each qubit's marginal state:
(If you want to use this CNOT trick to compute the mixed state result, instead of a pure state, just represent the state as a density matrix then trace over the ancilla qubit after performing the CNOT.)
Basically, measurement is observationally indistinguishable from making entangled copies. The difference, in practical terms, is that measurement is thermodynamically irreversible whereas a CNOT is easy to reverse.
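The equivalence can be checked numerically. The sketch below (NumPy; the function names and the rotation angle 0.7 are illustrative choices, not part of the original answer) compares a computational-basis measurement whose result is discarded with a CNOT onto a fresh |0> ancilla followed by a partial trace over that ancilla.

```python
import numpy as np

def ry(theta):
    """Rotation about the Y axis of the Bloch sphere."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def measured_and_forgotten(rho):
    """Z-basis measurement with the result discarded: P0 rho P0 + P1 rho P1."""
    p0 = np.diag([1, 0]).astype(complex)
    p1 = np.diag([0, 1]).astype(complex)
    return p0 @ rho @ p0 + p1 @ rho @ p1

def cnot_then_trace_out_ancilla(rho):
    """CNOT from the qubit (control) onto a |0> ancilla, then trace out the ancilla."""
    ancilla = np.diag([1, 0]).astype(complex)
    joint = np.kron(rho, ancilla)  # the qubit is the first Kronecker factor
    cnot = np.array([[1, 0, 0, 0],
                     [0, 1, 0, 0],
                     [0, 0, 0, 1],
                     [0, 0, 1, 0]], dtype=complex)
    joint = cnot @ joint @ cnot.conj().T
    # Indices after reshape: (qubit, ancilla, qubit', ancilla'); sum ancilla == ancilla'.
    return np.einsum('iaja->ij', joint.reshape(2, 2, 2, 2))

psi = ry(0.7) @ np.array([1, 0], dtype=complex)  # qubit rotated around Y
rho = np.outer(psi, psi.conj())
via_measurement = measured_and_forgotten(rho)
via_cnot = cnot_then_trace_out_ancilla(rho)
print(np.allclose(via_measurement, via_cnot))
```

Both routes leave the diagonal of the density matrix untouched and zero out its off-diagonal entries.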
Expected Outcomes
If you ignore the measurement result, then measurement acts like a projection of the density matrix. For example, in the animation above, notice that measurement causes the state to snap to (be projected onto) the Z axis of the Bloch sphere.
If you have access to the measurement result, then the measurement not only projects but also informs you of the new state of the system. In the single-qubit-in-the-computational-basis case, this forces the qubit to be all-ON or all-OFF due to the quantization of spin.
Measurements can be represented in various ways.
A very common representation is "projective measurements". Projective measurements are represented by a Hermitian matrix (called the "observable"). The eigenvalues of the matrix are the possible results. You get the probability of each result by projecting your state's density matrix into each eigenspace and tracing.
A more flexible and arguably better representation is positive-operator valued measures (POVM measurements). POVMs are represented by a set of squared Hermitian matrices (that is, positive semidefinite matrices), with the condition that the sum of the set's matrices must be the identity matrix. The probability of the result corresponding to the matrix F from the set is the trace of the state's density matrix times F.
Translating a projective measurement into a circuit that performs that measurement (using only computational basis measurements) is straightforward, because the necessary basis change operation is just a unitary matrix whose rows are the eigenvectors of the observable. Translating POVM measurements is trickier, and requires introducing ancilla bits.
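As a rough illustration of the projective-measurement recipe, here is a NumPy sketch that projects a density matrix onto each eigenvector of an observable and traces. The helper name `projective_probabilities` and its `decimals` rounding parameter (used to merge degenerate eigenvalues) are assumptions made for this example.

```python
import numpy as np

def projective_probabilities(observable, rho, decimals=9):
    """Map each distinct eigenvalue of the observable to its probability Tr(P rho)."""
    eigenvalues, eigenvectors = np.linalg.eigh(observable)
    probs = {}
    for value, vec in zip(np.round(eigenvalues, decimals), eigenvectors.T):
        projector = np.outer(vec, vec.conj())  # rank-one projector onto this eigenvector
        p = float(np.real(np.trace(projector @ rho)))
        # Eigenvectors sharing an eigenvalue belong to the same eigenspace.
        probs[float(value)] = probs.get(float(value), 0.0) + p
    return probs

X = np.array([[0, 1], [1, 0]], dtype=complex)         # Pauli X as the observable
rho_zero = np.array([[1, 0], [0, 0]], dtype=complex)  # the state |0><0|
probs = projective_probabilities(X, rho_zero)
print(probs)
```

For the state |0> and the observable X, the two results -1 and +1 are equally likely.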
For more information, see this answer on the physics stackexchange.
The measurement works as follows:
If you want to measure qubit number i (indexing from 1 to n), the outcome is 0 or 1 at random, with probabilities determined by the amplitudes of |Q>:
P_i(0) = <Q| M'0 M0 |Q>
P_i(1) = <Q| M'1 M1 |Q>
where P_i(0) is the probability of measuring qubit i to be 0, and P_i(1) is the probability of measuring it to be 1. M0 is the measurement matrix for outcome 0, and M1 is the one for outcome 1. M'0 is the Hermitian conjugate of M0, and M'1 is the Hermitian conjugate of M1.
If you want to measure only the i-th qubit of a quantum system that is in the n-qubit state |Q>, then the operation you would apply is:
I x I x I x I x ... x I   x Mb x I   x ... x I    } n Kronecker factors
1   2   3   4   ...   i-1   i    i+1   ...   n    } indices
where I is the 2x2 identity matrix, Mb is the measurement matrix for the measured value of the i-th qubit (either b=0 or b=1), and x is the Kronecker product.
pre measurement state |Q>
measurement of qubit i = b (b = 1 or 0 randomly selected based on the probability of each)
if b is 0: Mb = M0 = |0><0|
if b is 1: Mb = M1 = |1><1|
M = I x I x I x ... x I x Mb x I x ... x I
post state |Q'> = M|Q> / sqrt(P_i(b)) (dividing by sqrt(P_i(b)) renormalizes the projected state)
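The recipe above could be sketched in NumPy as follows. The function name `measure_qubit` and the Bell-state example are illustrative assumptions. The sketch also divides the projected state by sqrt(P_i(b)) so that the post-measurement state remains a unit vector.

```python
import numpy as np

def measure_qubit(state, i, rng):
    """Measure qubit i (1-indexed; qubit 1 is the leftmost Kronecker factor).

    Returns the outcome b, the normalized post-measurement state, and P_i(0).
    """
    n = state.size.bit_length() - 1                          # state has 2**n amplitudes
    projectors = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]  # M0 = |0><0|, M1 = |1><1|
    full = []
    for m in projectors:
        op = np.array([[1.0]])
        for k in range(1, n + 1):
            op = np.kron(op, m if k == i else np.eye(2))     # I x ... x Mb x ... x I
        full.append(op)
    p0 = float(np.real(state.conj() @ full[0].conj().T @ full[0] @ state))
    b = 0 if rng.random() < p0 else 1                        # sample the outcome
    p_b = p0 if b == 0 else 1.0 - p0
    post = full[b] @ state / np.sqrt(p_b)                    # project, then renormalize
    return b, post, p0

rng = np.random.default_rng(0)
bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)    # (|00> + |11>) / sqrt(2)
b, post, p0 = measure_qubit(bell, 1, rng)
print(b, post, p0)
```

Building the full 2^n x 2^n matrix is only practical for small n; it is done here to mirror the Kronecker-product formula directly.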
By definition, the gate 1/sqrt(5) (I + 2iZ) should act on a qubit a|0> + b|1> to transform it into 1/sqrt(5) ((1+2i)a|0> + (1-2i)b|1>), but the transformations in each repeat-until-success (RUS) step do the following:
The ancillas are in |+> state at first
Starting state: 1/sqrt(2) (a,b,a,b,a,b,a,b)
CCNOT(ancillas, input): 1/sqrt(2) (a,b,a,b,a,b,b,a)
S(input): 1/sqrt(2) (a,ib,a,ib,a,ib,b,ia)
CCNOT(ancillas, input): 1/sqrt(2) (a,ib,a,ib,a,ib,ia,b)
Z(input) : 1/sqrt(2) (a,-ib,a,-ib,a,-ib,ia,-b)
Measuring the ancillas in the PauliX basis is equivalent to a PauliZ measurement after applying H() to them. I have two points of confusion. First, should I apply H x H x I or H x H x H to the combined state? Second, neither of these transformations turns out to be equivalent to the V-gate defined in the first paragraph when both measurements yield Zero. Where did I go wrong?
Reference: https://github.com/microsoft/Quantum/blob/master/samples/diagnostics/unit-testing/RepeatUntilSuccessCircuits.qs (1st sample code)
The transformation is correct, though it takes some time with pen and paper to verify it.
As a side note, we start with a state |+>|+>(a|0> + b|1>), which is 0.5 (a,b,a,b,a,b,a,b) in vector form (both |+> states contribute a 1/sqrt(2) to the coefficients). It will not affect our calculations of the state after the measurement, since it will have to be renormalized, but it's still worth noting.
After a sequence of CCNOT, S, CCNOT, Z we get 0.5 (a,-ib,a,-ib,a,-ib,ia,-b). Since we're measuring only the first two qubits in PauliX basis, we need to apply Hadamards only to the first two qubits, or H x H x I to the combined state.
I'll take the liberty to skip writing out the whole expression after applying Hadamards and fast-forward to the results of measurements, and here is why. We're only interested in the state of the input qubit if both measurements yielded 0, so it's sufficient to gather only the terms of the overall state which have |00> as the state of the first two qubits.
The state of the third qubit after measuring |00> on the first two qubits will be: (3+i)a |0> - (3i+1)b |1>, multiplied by some normalization coefficient c.
c = 1/sqrt(|3+i|^2 |a|^2 + |3i+1|^2 |b|^2) = 1/sqrt(10 (|a|^2 + |b|^2)) = 1/sqrt(10).
Now we need to check whether the state we got, |S_actual> = 1/sqrt(10) ((3+i)a |0> - (3i+1)b |1>)
is the same state as we'd expect to get from applying the V gate,
|S_expected> = 1/sqrt(5) ((1+2i)a |0> + (1-2i)b |1>). They do not look the same, but remember that in quantum computing the states are defined up to a global phase. Thus, if we can find a complex number p with an absolute value 1 for which |S_actual> = p * |S_expected>, the states will be effectively the same.
This translates into the following equations for p and amplitudes of |0> and |1>: (3+i)/sqrt(2) = p (1+2i) and -(3i+1)/sqrt(2) = p (1-2i). We solve both equations to get p = (1-i)/sqrt(2) which has indeed the absolute value 1.
Thus, we can conclude that the state we got after all the transformations is indeed equivalent to the state we'd get by applying a V gate.
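For anyone who would rather not redo this with pen and paper, the whole calculation can be replayed numerically. This is only a sketch: the test amplitudes a = 0.6 and b = 0.8i are arbitrary normalized values chosen for the check, and the qubit ordering (two ancillas first, input qubit last) follows the vectors written above.

```python
import numpy as np

a, b = 0.6, 0.8j  # arbitrary normalized test amplitudes (an assumption)
I2 = np.eye(2)
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
S = np.diag([1, 1j])
Z = np.diag([1, -1])
CCNOT = np.eye(8)
CCNOT[[6, 7]] = CCNOT[[7, 6]]  # swap |110> and |111>

plus = np.array([1, 1]) / np.sqrt(2)
state = np.kron(np.kron(plus, plus), np.array([a, b]))  # |+>|+>(a|0> + b|1>)
for gate in (CCNOT, np.kron(np.eye(4), S), CCNOT, np.kron(np.eye(4), Z)):
    state = gate @ state
state = np.kron(np.kron(H, H), I2) @ state  # Hadamards on the two ancillas only

success_probability = np.linalg.norm(state[:2]) ** 2  # both ancillas measured as 0
actual = state[:2] / np.linalg.norm(state[:2])        # renormalized input qubit
expected = np.array([(1 + 2j) * a, (1 - 2j) * b]) / np.sqrt(5)
p = np.vdot(expected, actual)                         # <expected|actual>
print(p, abs(p), success_probability)
```

The overlap p comes out as (1-i)/sqrt(2) with absolute value 1, so the two states differ only by a global phase.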
We have an array in which each entry is a tuple of two integers. Let the array be A = [(a1, b1), (a2, b2), ..., (an, bn)]. We get multiple queries, each giving an integer x, and for each query we need to find the maximum value of ai + |x - bi| over 1 <= i <= n.
I understand this can be easily achieved in O(n) time complexity for each query but I am looking for something faster than that, probably O(log n) for each query. I can preprocess the array in O(n) time, but the queries should be done faster than O(n).
Any kind of help would be appreciated.
It seems to be way too easy to over-think this.
For n = 1, the function is v-shaped with a minimum of a1 at b1, with slopes of -1 and 1, respectively - let's call these values ac and bc (for combined).
For an additional pair (ai, bi), one of the pairs may dominate the other (|bc - bi| ≤ |ac - ai|), in which case the dominated pair (the one with the smaller a) may be ignored.
Otherwise, the falling slope of the combination will be from the pair with the larger b, the rising slope from the other.
The minimum will lie between the two individual b values, closer to the b of the pair with the larger a: its distance from that b is d = (|bc - bi| - |ac - ai|) / 2, and the minimum value is the larger a plus d.
The main catch is that neither the position nor the value of the minimum needs to be an integer; each is either an integer or exactly halfway between two integers.
(Ending up with the falling slope from max ai + bi, and the rising slope of max ai - bi.)
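Since |x - bi| = max(x - bi, bi - x), the final remark boils down to two precomputed maxima. A minimal sketch (the function names `preprocess` and `query` are made up for this example):

```python
def preprocess(pairs):
    """O(n): reduce all pairs to the two constants of the combined V shape."""
    falling = max(a + b for a, b in pairs)  # intercept of the slope -1 arm
    rising = max(a - b for a, b in pairs)   # intercept of the slope +1 arm
    return falling, rising

def query(envelope, x):
    """O(1): maximum of ai + |x - bi| over all pairs."""
    falling, rising = envelope
    return max(falling - x, rising + x)

pairs = [(3, 1), (0, 10), (5, 4), (-2, -7)]
envelope = preprocess(pairs)
answers = [query(envelope, x) for x in range(-12, 13)]
brute_force = [max(a + abs(x - b) for a, b in pairs) for x in range(-12, 13)]
print(answers == brute_force)
```

This gives O(n) preprocessing and O(1) per query, which is better than the O(log n) asked for.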
I have to code an AI that finds the mass of a spaceship in a game.
My AI can exert a small force c on the spaceship, to measure the mass via the change in velocity.
However, my AI can access only the current position of the spaceship, x, at every time-step.
Mass is not constant, but it is safe to assume that it will not change too fast.
For simplicity :-
Let the space be 1D, with no gravity.
Timestep is always 1 second.
There are many forces that exert on the spaceship currently, e.g. gravity, an automatic propulsion system controlled by an unknown AI, collision impulse, etc.
The summation of these forces is b, which depends on t (time).
Acceleration a for a certain timestep is calculated by a game-play formula which is out of my control:-
a = (b+c)/m ................. (1)
The velocity v is updated as:-
v = vOld + a ................. (2)
The position x is updated as:-
x = xOld + v ................. (3)
The order of execution (1)-(3) is also unknown, i.e. AI should not rely on such order.
My poor solution
I will exert c0=0.001 for a few seconds and compare the result against when I exert c1=-0.001.
I would assume that b and m are constant for the time period.
I calculate acceleration via :-
t 0 1 2 3 (exert force c0 at t1, c1 at t2)
x 0 1 2 3 (the numbers are the points in the timeline at which I sample x)
v 0 1 2 (v0=x1-x0, v1=x2-x1, ... )
a 0 1 (a0=v1-v0, ... )
Now I know the acceleration at 2 points of the timeline, and I know c because I am the one who exerts it.
With a = (b+c)/m, with unknown b and m and known a0,a1,c0 and c1:-
a0 = (b+c0)/m
a1 = (b+c1)/m
I can solve them to find b and m.
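Subtracting the two equations gives a0 - a1 = (c0 - c1)/m, so m = (c0 - c1)/(a0 - a1) and b = a0*m - c0. A small sketch of that calculation follows; the function name `estimate_b_and_m` and the simulated values are assumptions for illustration, and it relies on b and m staying constant over the four samples and on a0 being caused by c0 and a1 by c1.

```python
def estimate_b_and_m(x, c0, c1):
    """Estimate (b, m) from four consecutive positions x0..x3.

    Assumes b and m are constant over the window, and that the acceleration
    a0 = v1 - v0 was produced by c0 and a1 = v2 - v1 by c1.
    """
    v = [x[k + 1] - x[k] for k in range(3)]  # finite-difference velocities
    a0, a1 = v[1] - v[0], v[2] - v[1]        # finite-difference accelerations
    if a0 == a1:
        raise ValueError("accelerations are equal; mass cannot be resolved")
    m = (c0 - c1) / (a0 - a1)
    b = a0 * m - c0
    return b, m

# Simulated timeline with hypothetical true values, using update rules (1)-(3).
m_true, b_true, c0, c1 = 2.0, 0.5, 0.001, -0.001
x, v = [0.0], 1.0
x.append(x[-1] + v)                  # x1, before any probe force acts
for c in (c0, c1):
    v += (b_true + c) / m_true       # (1) and (2)
    x.append(x[-1] + v)              # (3)
b_est, m_est = estimate_b_and_m(x, c0, c1)
print(b_est, m_est)
```

Because a0 - a1 is tiny when c is tiny, any drift in b between the two samples is amplified in the estimate of m.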
However, my assumption is wrong at the beginning.
b and m are actually not constants.
This problem might be viewed in a more casual way :-
Many persons are trying to lift a heavy rock.
I am one of them.
How can I measure the mass of the rock (by the feeling in my hands) without interrupting them too much?
I'm trying to come up with an algorithm that will allow me to generate a random N-dimensional real-valued vector that's linearly independent with respect to a set of already-generated vectors. I don't want to force them to be orthogonal, only linearly independent. I know Gram-Schmidt exists for the orthogonalization problem, but is there a weaker form that only gives you linearly independent vectors?
Step 1. Generate random vector vr.
Step 2. Copy vr to vo and update it as follows: for every already generated vector vi in v1, v2... vn, subtract the projection of vo on vi.
The result is a random vector orthogonal to the subspace spanned by v1, v2... vn, provided those vectors are mutually orthogonal (for example, because you keep each accepted vo for this test while handing out vr). If v1, v2... vn already span the whole space, then it is the zero vector, of course :)
The decision of whether the initial vector was linearly independent can be made based on the comparison of the norm of vr to the norm of vo. Linearly dependent vectors will have a vo-norm which is zero or nearly zero (some numerical precision issues may make it a small nonzero number on the order of a few times epsilon; this can be tuned in an application-dependent way).
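import numpy as np

def try_vector(vr, vectors, k1=1e-8):
    # vectors must be mutually orthogonal (e.g. the vo kept from earlier
    # rounds); otherwise sequential subtraction is not a projection.
    vo = vr.copy()
    for v in vectors:
        vo = vo - np.dot(vo, v) / np.dot(v, v) * v  # remove the part along v
    if np.linalg.norm(vo) < k1 * np.linalg.norm(vr):
        return None  # this vector was mostly contained in the spanned subspace
    return vo  # linearly independent, go ahead and use vr (keep vo for later rounds)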
Here k1 is a very small number, 1e-8 to 1e-10 perhaps?
You can also go by the angle between vr and the subspace: in that case, calculate it as theta = arcsin(norm(vo) / norm(vr)). Angles substantially different from zero correspond to linearly independent vectors.
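If the already-generated vectors are not mutually orthogonal, subtracting projections one at a time no longer gives the component outside their span. One alternative, swapped in here as an assumption rather than taken from the answer, is to get that component from a least-squares fit. The names `independent_part` and `is_independent` are illustrative.

```python
import numpy as np

def independent_part(existing, candidate):
    """Residual of candidate after the best least-squares fit by the existing vectors."""
    if len(existing) == 0:
        return candidate.copy()
    A = np.column_stack(existing)  # one existing vector per column
    coeffs, *_ = np.linalg.lstsq(A, candidate, rcond=None)
    return candidate - A @ coeffs

def is_independent(existing, candidate, k1=1e-8):
    """Same norm test as above: compare norm(vo) against k1 * norm(vr)."""
    vo = independent_part(existing, candidate)
    return bool(np.linalg.norm(vo) >= k1 * np.linalg.norm(candidate))

existing = [np.array([1.0, 0.0, 0.0]), np.array([1.0, 1.0, 0.0])]  # not orthogonal
inside = np.array([2.0, 3.0, 0.0])   # lies in the span of the two vectors
outside = np.array([0.5, 0.5, 1.0])  # has a component along the third axis
print(is_independent(existing, inside), is_independent(existing, outside))
```

The least-squares residual is the orthogonal projection of the candidate onto the complement of the span, so the same k1 threshold and the arcsin angle formula apply unchanged.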
A somewhat OTT scheme is to generate an NxN non-singular matrix, and use its columns (or rows) as the N linearly independent vectors.
To generate a non-singular matrix one could generate its SVD and multiply up. In more detail:
a/ generate a 'random' NxN orthogonal matrix U
b/ generate a 'random' NxN diagonal matrix S with positive numbers in the diagonal
c/ generate a 'random' NxN orthogonal matrix V
d/ compute
M = U*S*V'
To generate a 'random' orthogonal matrix U, one can use the fact that every orthogonal matrix can be written as a product of Householder reflectors, that is, of matrices of the form
H(v) = I - 2*v*v'/(v'*v)
where v is a non zero random vector.
So one could
initialise U to I
for (i = 1..N)
    generate a nonzero vector v
    update: U := H(v)*U
Note that if all these matrix multiplications become burdensome, one could write a special routine to do the update of U. Applying H(v) to a vector u is O(N):
u -> u - 2*(v'*u)/(v'*v) * v
and so applying H(v) to U can be done in O(N squared) rather than O(N cubed).
One advantage of this scheme is that one has some control over 'how linearly independent' the vectors are. The product of the diagonal elements is (up to sign) the determinant of M, so that if this product is 'very small' the vectors are 'almost' linearly dependent
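A possible NumPy rendering of steps a/ to d/, with the O(N squared) rank-one update used in place of forming H(v) explicitly. The function names, the Gaussian choice for v, and the range [s_min, s_max] for the diagonal of S are assumptions made for this sketch; no claim is made about the resulting distribution of U.

```python
import numpy as np

def random_orthogonal(n, rng):
    """Product of n Householder reflectors H(v) = I - 2*v*v'/(v'*v)."""
    u = np.eye(n)
    for _ in range(n):
        v = rng.standard_normal(n)  # nonzero with probability 1
        # U := H(v)*U as a rank-one update: U - 2*v*(v'*U)/(v'*v), O(n^2) work
        u -= np.outer(2.0 * v / (v @ v), v @ u)
    return u

def random_nonsingular(n, rng, s_min=0.5, s_max=2.0):
    """M = U*S*V' with positive diagonal S, so M is non-singular."""
    u = random_orthogonal(n, rng)
    v = random_orthogonal(n, rng)
    s = rng.uniform(s_min, s_max, size=n)  # diagonal of S, bounded away from zero
    return u @ np.diag(s) @ v.T, s

rng = np.random.default_rng(0)
M, s = random_nonsingular(4, rng)
vectors = list(M.T)  # the columns of M are the linearly independent vectors
print(abs(np.linalg.det(M)), np.prod(s))
```

Keeping s_min well above zero bounds the determinant away from zero, which is the control over 'how linearly independent' the vectors are.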
I know that PCA does not tell you which features of a dataset are the most significant, but which combinations of features keep the most variance.
How could you use the fact that PCA rotates the dataset in such a way that it has the most variance along the first dimension, second most along second, and so on to reduce the dimensionality of the dataset?
I mean, more in depth, How are the first N eigenvectors used to transform the feature vectors into a lower-dimensional representation that keeps most of the variance?
Let X be an N x d matrix where each row X_{n,:} is a vector from the dataset.
Then, assuming each feature (column of X) has been centered to zero mean, X'X is the covariance matrix up to a factor of 1/N, and an eigendecomposition gives X'X=UDU' where U is a d x d matrix of eigenvectors with U'U=I and D is a d x d diagonal matrix of eigenvalues.
The form of the eigendecomposition means that U'X'XU=U'UDU'U=D which means that if you transform your dataset by U then the new dataset, XU, will have a diagonal covariance matrix.
If the eigenvalues are ordered from largest to smallest, this also means that the average squared value of the first transformed feature (given, up to the factor 1/N, by the expression U_1'X'XU_1=\sum_n (\sum_d U_{d,1} X_{n,d})^2, where U_1 is the first column of U) will be larger than that of the second, the second larger than the third, etc.
If we order the transformed features from largest to smallest average squared value, then if we just get rid of the features with small average squared values (and the large values are much larger than the small ones), then we haven't lost much information. Concretely, keeping only the first k columns of U gives the N x k reduced dataset XU_{:,1:k}. That is the concept.
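Here is a minimal NumPy sketch of that idea, with the centering step made explicit. The function name `pca_reduce`, the choice of k, and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def pca_reduce(X, k):
    """Project the rows of X onto the k eigenvectors of X'X with the largest eigenvalues."""
    Xc = X - X.mean(axis=0)                     # center each feature
    eigenvalues, U = np.linalg.eigh(Xc.T @ Xc)  # X'X = U D U', eigenvalues ascending
    order = np.argsort(eigenvalues)[::-1]       # reorder from largest to smallest
    eigenvalues, U = eigenvalues[order], U[:, order]
    Z = Xc @ U[:, :k]                           # N x k reduced dataset
    kept = eigenvalues[:k].sum() / eigenvalues.sum()  # fraction of variance retained
    return Z, U[:, :k], kept

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3)) * np.array([5.0, 1.0, 0.1])  # one nearly flat direction
Z, U_k, kept = pca_reduce(X, 2)
print(Z.shape, kept)
```

Z'Z is diagonal with the two largest eigenvalues on its diagonal, which is the U'X'XU = D identity restricted to the kept columns.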