In the chapter about the Value Iteration algorithm for calculating an optimal policy for MDPs, there is this algorithm:
function Value-Iteration(mdp, ε) returns a utility function
    inputs: mdp, an MDP with states S, actions A(s), transition model P(s'|s,a),
                 rewards R(s), discount γ
            ε, the maximum error allowed in the utility of any state
    local variables: U, U', vectors of utilities for states in S, initially zero
                     δ, the maximum change in the utility of any state in an iteration
    repeat
        U ← U'; δ ← 0
        for each state s in S do
            U'[s] ← R(s) + γ max(a in A(s)) ∑ over s' (P(s'|s,a) U[s'])
            if |U'[s] − U[s]| > δ then δ ← |U'[s] − U[s]|
    until δ < ε(1−γ)/γ
    return U
(I apologize for the formatting, but I need 10 rep to post a picture, and $latex$ formatting doesn't seem to work here.)
Also, a chapter earlier there was this statement:
A discount factor of γ is equivalent to an interest rate of (1/γ) − 1.
Could anyone explain to me what the interest rate (1/γ) − 1 means? How did they get it? And why is it used in the termination condition of the algorithm above?
The reward at t−1 is considered discounted by a factor γ relative to the reward at t. That is to say, old = γ × new. So new = (1/γ) × old, and new − old = ((1/γ) − 1) × old. That is your interest rate. For example, γ = 0.9 corresponds to an interest rate of 1/0.9 − 1 ≈ 0.11, i.e. about 11%.
I am not so sure why it is used in the termination condition. The value of epsilon is arbitrary anyway.
In fact, I believe this termination criterion is very bad. It does not work when γ = 1: the bound ε(1−γ)/γ becomes zero, so the loop can never terminate. When γ = 0, the iteration should stop immediately, since one sweep is enough to compute the exact values. When γ = 1, many iterations are necessary.
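To see how the bound behaves in practice, here is a minimal Python sketch of the algorithm above; the tiny 3-state MDP, its rewards and its transition model are made-up placeholders rather than anything from the book.

    # Minimal value iteration sketch; the 3-state MDP below is an arbitrary illustration.
    states = [0, 1, 2]
    actions = {s: ["stay", "move"] for s in states}
    R = {0: -0.04, 1: -0.04, 2: 1.0}

    def P(s2, s, a):
        # deterministic toy transitions: "stay" keeps s, "move" goes to (s + 1) mod 3
        if a == "stay":
            return 1.0 if s2 == s else 0.0
        return 1.0 if s2 == (s + 1) % 3 else 0.0

    def value_iteration(gamma, eps):
        U1 = {s: 0.0 for s in states}
        while True:
            U, delta = dict(U1), 0.0
            for s in states:
                U1[s] = R[s] + gamma * max(
                    sum(P(s2, s, a) * U[s2] for s2 in states) for a in actions[s])
                delta = max(delta, abs(U1[s] - U[s]))
            # the termination bound in question; note it divides by gamma and is
            # zero when gamma = 1, which is exactly the problem described above
            if delta < eps * (1 - gamma) / gamma:
                return U

    print(value_iteration(0.9, 0.001))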
I have a very large NoSQL database. Each item in the database is assigned a uniformly distributed random value between 0 and 1. This database is so large that performing a COUNT on queries does not yield acceptable performance, but I'd like to use the random values to estimate COUNT.
The idea is this:
Run a query and order the query by the random value. Random values are indexed, so it's fast.
Grab the lowest N values, and see how big the largest value is, say R.
Estimate COUNT as N / R
The question is two-fold:
Is N / R the best way to estimate COUNT? Maybe it should be (N+1)/R? Maybe we could look at the other values (average, variance, etc), and not just the largest value to get a better estimate?
What is the error margin on this estimated value of COUNT?
Note: I thought about posting this in the math stack exchange, but given this is for databases, I thought it would be more appropriate here.
This actually would be better on math or statistics stack exchange.
The reasonable estimate is that if R is large and x is your order statistic, then R is approximately n / x − 1. (Note the change of notation: here R is the total count you want, n is how many of the lowest values you grab, and x is the largest of those, i.e. the question's R.) About 95% of the time the error will be within 2 R / sqrt(n) of this. So looking at the 100th element will estimate the right answer to within about 20%. Looking at the 10,000th element will estimate it to within about 2%. And the millionth element will get you the right answer to within about 0.2%.
To see this, start with the fact that the n'th order statistic has a Beta distribution with parameters α = n and β = R + 1 - n. Which means that the mean value of the n'th smallest value out of R values is n/(R+1). And its variance is αβ / ((α + β)^2 (α + β + 1)). If we assume that R is much larger than n, then this is approximately n R / R^3 = n / R^2. Which means that our standard deviation is sqrt(n) / R.
If x is our order statistic, this means that (n / x) - 1 is a reasonable estimate of R. And how much is it off by? Well, we can use the tangent line approximation. The function (n / x) - 1 has a derivative of -n / x^2. At x = n/(R+1) this derivative is (R + 1)^2 / n in magnitude, which for large R is roughly R^2 / n. Stick in our standard deviation of sqrt(n) / R and we come up with an error proportional to R / sqrt(n). Since a 95% confidence interval is about 2 standard deviations, you probably will have an error of around 2 R / sqrt(n).
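If you want to sanity-check both the estimator and the error bound, here is a rough Monte Carlo sketch in Python; the numbers R, n and trials are arbitrary choices of mine.

    import heapq
    import random

    # Sanity check of the estimate R ~ n/x - 1 and the ~2R/sqrt(n) error bound.
    # R is the true (unknown) COUNT, n is which order statistic we look at.
    R, n, trials = 1_000_000, 10_000, 20
    errors = []
    for _ in range(trials):
        values = (random.random() for _ in range(R))   # the uniform random column
        x = heapq.nsmallest(n, values)[-1]             # largest of the n lowest values
        errors.append(abs(n / x - 1 - R) / R)          # relative error of the estimate
    print("average relative error:", sum(errors) / trials)
    print("predicted 95% bound   :", 2 / n ** 0.5)     # 2R/sqrt(n), as a fraction of R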
By definition, the gate 1/sqrt(5) (I + 2iZ) should act on a qubit a|0> + b|1> to transform it into 1/sqrt(5) ((1+2i)a|0> + (1-2i)b|1>), but the transformations in each RUS step do the following:
The ancillas are in the |+> state at first.
Starting state: 1/sqrt(2) (a,b,a,b,a,b,a,b)
CCNOT(ancillas, input): 1/sqrt(2) (a,b,a,b,a,b,b,a)
S(input): 1/sqrt(2) (a,ib,a,ib,a,ib,b,ia)
CCNOT(ancillas, input): 1/sqrt(2) (a,ib,a,ib,a,ib,ia,b)
Z(input) : 1/sqrt(2) (a,-ib,a,-ib,a,-ib,ia,-b)
Now, measuring the ancillas in the PauliX basis is equivalent to a PauliZ measurement after applying H() to the state. I have two points of confusion: should I apply H x H x I or H x H x H to the combined state? Also, neither of these transformations turns out to be equivalent to the V gate defined in the first paragraph when both measurements are Zero. Where did I go wrong?
Reference: https://github.com/microsoft/Quantum/blob/master/samples/diagnostics/unit-testing/RepeatUntilSuccessCircuits.qs (1st sample code)
The transformation is correct, though it takes some time with pen and paper to verify it.
As a side note, we start with the state |+>|+>(a|0> + b|1>), which is 0.5 (a,b,a,b,a,b,a,b) in vector form (each of the two |+> states contributes a factor of 1/sqrt(2) to the coefficients, not just one of them). This will not affect our calculation of the state after the measurement, since the state will have to be renormalized anyway, but it's still worth noting.
After a sequence of CCNOT, S, CCNOT, Z we get 0.5 (a,-ib,a,-ib,a,-ib,ia,-b). Since we're measuring only the first two qubits in PauliX basis, we need to apply Hadamards only to the first two qubits, or H x H x I to the combined state.
I'll take the liberty to skip writing out the whole expression after applying Hadamards and fast-forward to the results of measurements, and here is why. We're only interested in the state of the input qubit if both measurements yielded 0, so it's sufficient to gather only the terms of the overall state which have |00> as the state of the first two qubits.
The state of the third qubit after measuring |00> on the first two qubits will be (3+i)a |0> - (3i+1)b |1>, multiplied by some normalization coefficient c, where
c = 1/sqrt(|3+i|^2 + |3i+1|^2) = 1/sqrt(10).
Now we need to check whether the state we got, |S_actual> = 1/sqrt(10) ((3+i)a |0> - (3i+1)b |1>)
is the same state as we'd expect to get from applying the V gate,
|S_expected> = 1/sqrt(5) ((1+2i)a |0> + (1-2i)b |1>). They do not look the same, but remember that in quantum computing the states are defined up to a global phase. Thus, if we can find a complex number p with an absolute value 1 for which |S_actual> = p * |S_expected>, the states will be effectively the same.
This translates into the following equations for p and the amplitudes of |0> and |1>: (3+i)/sqrt(2) = p (1+2i) and -(3i+1)/sqrt(2) = p (1-2i). Solving either equation gives p = (1-i)/sqrt(2), which indeed has absolute value 1.
Thus, we can conclude that the state we got after all the transformations is indeed equivalent to the state we'd get by applying the V gate.
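If you'd rather not redo the pen-and-paper work, here is a small NumPy sketch of the same check (my own code, with qubit ordering ancilla1 x ancilla2 x input and arbitrary amplitudes a, b):

    import numpy as np

    # Numerically verify the algebra above. Qubit order: ancilla1 (x) ancilla2 (x) input.
    a, b = 0.6, 0.8j                                      # arbitrary normalized amplitudes
    plus = np.array([1, 1]) / np.sqrt(2)
    inp = np.array([a, b])
    state = np.kron(np.kron(plus, plus), inp)             # 0.5 * (a, b, a, b, a, b, a, b)

    CCNOT = np.eye(8)
    CCNOT[[6, 7]] = CCNOT[[7, 6]]                         # swaps |110> and |111>
    S = np.kron(np.eye(4), np.diag([1, 1j]))              # S on the input qubit
    Z = np.kron(np.eye(4), np.diag([1, -1]))              # Z on the input qubit
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    HHI = np.kron(np.kron(H, H), np.eye(2))               # H x H x I before the measurement

    state = HHI @ Z @ CCNOT @ S @ CCNOT @ state
    post = state[:2]                                      # keep the |00> ancilla branch
    post = post / np.linalg.norm(post)                    # renormalize after the measurement

    V = (np.eye(2) + 2j * np.diag([1, -1])) / np.sqrt(5)  # 1/sqrt(5) (I + 2iZ)
    expected = V @ inp
    p = post[0] / expected[0]                             # candidate global phase
    print(np.allclose(post, p * expected))                # True
    print(p, (1 - 1j) / np.sqrt(2))                       # p = (1 - i)/sqrt(2)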
I have been trying this question on HackerEarth practice, which requires the work described below.
PROBLEM
Given an integer n, which signifies the sequence of n numbers {0, 1, 2, 3, ..., n-2, n-1}.
We are provided m ranges in the form (L,R) such that 0 <= L <= n-1 and 0 <= R <= n-1.
If L <= R, (L,R) signifies the numbers {L, L+1, L+2, ..., R-1, R} from the above sequence;
else (L,R) signifies the numbers {L, L+1, L+2, ..., n-2, n-1} and {0, 1, 2, ..., R-1, R}, i.e. the numbers wrap around.
example
n = 5 ie {0,1,2,3,4}
(0,3) signifies {0,1,2,3}
(3,0) signifies {3,4,0}
(3,2) signifies {3,4,0,1,2}
Now we have to select ONE (only one) number from each range without repeating any selection. We have to tell whether it is possible to select one number from each (and every) range without repetition.
Example test case
n = 5// numbers {0,1,2,3,4}
// ranges m in number //
0 0 ie {0}
1 2 ie {1,2}
2 3 ie {2,3}
4 4 ie {4}
4 0 ie {4,0}
The answer is "NO", it's not possible, because we cannot select any number from the range (4,0): if we select 4 from it, we can no longer select from the range (4,4), and if we select 0 from it, we can no longer select from (0,0).
My approaches:
1) It can be done in O(N*M) using recursion, checking all possibilities of selection from each range and using a hash map along the way to record our selections.
2) I was trying to do it in order N or M, i.e. linear time. The problem lacks an editorial explanation; only code is given in the editorial, without comments or explanation, and I am not able to follow it (a linear solution by someone else that passes all test cases and got accepted).
I am not able to understand the logic/algorithm used in the code and why it works.
Please suggest ANY linear method and the logic behind it, because the problem has these constraints:
1 <= N <= 10^9
1 <= M <= 10^5
0 <= L, R < N
which demand a linear or n log n solution, I guess.
The code in the editorial can also be seen here http://ideone.com/5Xb6xw
Warning: after looking at the code, I found that it uses n and m interchangeably, so I would like to mention the input format of the problem.
INPUT FORMAT
The first line contains the number of test cases, tc. Each test case starts with two integers N and M: the first depicting the number of countries on the globe, the second depicting the number of ranges his girlfriend has given him. The next M lines each contain two integers X and Y describing a range. If X <= Y, then the range covers countries [X, X+1, ..., Y]; else the range covers [X, X+1, ..., N-1, 0, 1, ..., Y].
Output Format
Print "YES" if it is possible to do so, and "NO" if it is not.
There are two components to the editorial solution.
Linear-time reduction to a problem on ordinary intervals
Assume to avoid trivial cases that the number of input intervals is less than n.
The first is to reduce the problem to one where the intervals don't wrap around as follows. Given an interval [L, R], if L ≤ R, then emit two intervals [L, R] and [L + n, R + n]; if L > R, emit [L, R + n]. The easy direction of the reduction is showing that, if the original problem has a solution, then the reduced problem has a solution. For [L, R] with L ≤ R assigned a number k, assign k to [L, R] and k + n to [L + n, R + n]. For [L, R] with L > R, assign whichever of k, k + n belongs to [L, R + n]. Except for the dual assignment of k and k + n for intervals [L, R] and [L + n, R + n] respectively, each interval gets its own residue class mod n, so the assignments do not conflict.
Conversely, the hard direction of the reduction (if the original problem has no solution, then the reduced problem has no solution) is proved using Hall's marriage theorem. By Hall's criterion, an unsolvable original problem has, for some k, a set of k input intervals whose union has size less than k. We argue first that there exists such a set of input intervals whose union is a (circular) interval (which by assumption isn't all of 0..n-1). Decompose the union into the set of maximal (circular) intervals that comprise it. Each input interval is contained in exactly one of these intervals. By an averaging argument, some maximal (circular) interval contains more input intervals than its size. We finish by "lifting" this counterexample to the reduced problem. Given the maximal (circular) interval [L*, R*], we lift it to the ordinary interval [L*, R*] if L* ≤ R*, or [L*, R* + n] if L* > R*. Do likewise with the circular intervals contained in this interval. It is tedious but straightforward to show that this lifted counterexample satisfies Hall's criterion, which implies that the reduced problem has no solution.
O(m log m)-time solution for ordinary intervals
This is a sweep-line algorithm. Sort the intervals by lower endpoint and scan them in that order. We imagine that the sweep line moves from lower endpoint to lower endpoint. Maintain the set of intervals that intersect the sweep line and have not been assigned a number, sorted by upper endpoint. When the sweep line is about to move, assign the numbers between the old and new positions to the intervals in the set, preferentially to the ones whose upper endpoint is the lowest. The correctness of this strategy should be clear: the intervals that could be assigned a number but are passed over have at least as many options (in the sense of being a superset) as the intervals that are assigned, so we never make a choice that we have cause to regret.
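To make both components concrete, here is a rough Python sketch of the whole approach (the unrolling reduction followed by the heap-based sweep); the function and variable names are my own, and it is meant to illustrate the idea rather than be a submission-ready solution.

    import heapq

    def can_assign(n, ranges):
        """ranges: list of circular (L, R) over 0..n-1; True iff a distinct number
        can be chosen from every range (the "YES" case)."""
        if len(ranges) > n:                  # more ranges than numbers: trivially impossible
            return False
        # Reduction: unroll the circle onto 0..2n-1 as described above.
        intervals = []
        for L, R in ranges:
            if L <= R:
                intervals.append((L, R))
                intervals.append((L + n, R + n))
            else:
                intervals.append((L, R + n))
        # Sweep line over lower endpoints; always serve the pending interval
        # with the smallest upper endpoint first.
        intervals.sort()
        pending = []                         # heap of upper endpoints of active intervals
        i, pos = 0, 0                        # pos = next number the sweep line can hand out
        while i < len(intervals) or pending:
            if not pending:                  # nothing active: jump to the next lower endpoint
                pos = max(pos, intervals[i][0])
            while i < len(intervals) and intervals[i][0] <= pos:
                heapq.heappush(pending, intervals[i][1])
                i += 1
            if pos > heapq.heappop(pending): # an interval ran out of numbers
                return False
            pos += 1
        return True

    # The example from the question: expected output is "NO".
    print("YES" if can_assign(5, [(0, 0), (1, 2), (2, 3), (4, 4), (4, 0)]) else "NO")

The sort dominates, so after the reduction this runs in O(m log m) time.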
Problem
I have to code an AI to find the mass of a spaceship in a game.
My AI can exert a small force c on the spaceship, to measure the mass via the change in velocity.
However, my AI can access only the current position of the spaceship, x, at every time-step.
Mass is not constant, but it is safe to assume that it will not change too fast.
For simplicity:
Let the space be 1D with no gravity.
The timestep is always 1 second.
Forces
There are many forces currently acting on the spaceship, e.g. gravity, an automatic propulsion system controlled by an unknown AI, collision impulses, etc.
The sum of these forces is b, which depends on t (time).
Acceleration a for a certain timestep is calculated by a game-play formula which is out of my control:
a = (b+c)/m ................. (1)
The velocity v is updated as:
v = vOld + a ................. (2)
The position x is updated as:
x = xOld + v ................. (3)
The order of execution of (1)-(3) is also unknown, i.e. the AI should not rely on any particular order.
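For reference, this is roughly how I picture one game update in code; the real order may differ, and b and m are unknown, time-varying quantities, so the snippet below is only a stand-in.

    # One possible ordering of updates (1)-(3); the real game may apply them in a
    # different order, and b and m are unknown, time-varying quantities.
    def game_step(x, v, b, c, m):
        a = (b + c) / m      # (1) acceleration from the total external force b plus my force c
        v = v + a            # (2) velocity update (timestep = 1 s)
        x = x + v            # (3) position update
        return x, v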
My poor solution
I will exert c0 = 0.001 for a few seconds and compare the result against exerting c1 = -0.001.
I would assume that b and m are constant for the time period.
I calculate acceleration via:
t:  t0  t1  t2  t3    (exert force `c0` at `t1` and `c1` at `t2`)
x:  x0  x1  x2  x3    (the points in the timeline at which I sample x)
v:    v0  v1  v2      (v0 = x1 - x0, v1 = x2 - x1, ...)
a:      a0  a1        (a0 = v1 - v0, ...)
Now I know the acceleration at 2 points of the timeline, and I know c because I am the one who exerts it.
With a = (b+c)/m, with unknown b and m and known a0, a1, c0 and c1:
a0 = (b+c0)/m
a1 = (b+c1)/m
I can solve them to find b and m.
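In code, the estimator I have in mind looks roughly like this (my own sketch, using the four position samples from the table above):

    # Estimate b and m from four consecutive position samples, assuming b and m
    # stayed constant over that window (the assumption that turns out to be wrong).
    def estimate_b_m(x0, x1, x2, x3, c0, c1):
        v0, v1, v2 = x1 - x0, x2 - x1, x3 - x2   # velocities by finite differences
        a0, a1 = v1 - v0, v2 - v1                # accelerations around t1 and t2
        # a0 = (b + c0)/m and a1 = (b + c1)/m  =>  m = (c0 - c1)/(a0 - a1)
        m = (c0 - c1) / (a0 - a1)                # blows up if a0 == a1
        b = m * a0 - c0
        return b, m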
However, my assumption is wrong from the beginning: b and m are actually not constant.
This problem might be viewed in a more casual way: many people are trying to lift a heavy rock, and I am one of them. How can I measure the mass of the rock (by the feeling in my hands) without interrupting them too much?
I was asked this question in an interview.
Given a list of 'N' coins, their values being in an array A[], return the minimum number of coins required to sum to 'S' (you can use as many coins as you want). If it's not possible to sum to 'S', return -1.
Note that the same coin can be used multiple times.
Example:
Input #00:
Coin denominations: { 1,3,5 }
Required sum (S): 11
Output #00:
3
Explanation:
The minimum number of coins required is 3: 5 + 5 + 1 = 11.
Is there any better way to think about this than sorting the array and working from both ends?
This is the change-making problem.
A simple greedy approach, which you seem to be thinking of, won't always produce an optimal result. If you elaborate a bit on what exactly you mean by starting from both ends, I might be able to come up with a counter-example.
There is a dynamic programming approach, taken from here:
Let C[m] be the minimum number of coins of denominations d1, d2, ..., dk needed to make change for amount m. In the optimal solution to making change for amount m, there must exist some first coin di, where di <= m. Furthermore, the remaining coins in the solution must themselves be the optimal solution to making change for m - di.
Thus, if di is the first coin in the optimal solution to making change for amount m, then C[m] = 1 + C[m - di], i.e. one di coin plus C[m - di] coins to optimally make change for the amount m - di. We don't know which coin di is the first coin; however, we may check all k such possibilities (subject to the constraint that di <= m), and the value of the optimal solution must correspond to the minimum value of 1 + C[m - di], by definition.
Furthermore, when making change for 0, the value of the optimal solution is clearly 0 coins. We thus have the following recurrence:
C[p] = 0                                  if p = 0
C[p] = min { 1 + C[p - di] : di <= p }    if p > 0
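For completeness, here is a short bottom-up implementation of that recurrence in Python (my own sketch), returning -1 when the sum cannot be formed, as the question asks:

    def min_coins(denominations, S):
        INF = float("inf")
        C = [0] + [INF] * S                   # C[p] = min coins needed to make amount p
        for p in range(1, S + 1):
            for d in denominations:
                if d <= p and C[p - d] + 1 < C[p]:
                    C[p] = C[p - d] + 1       # one coin d plus the best way to make p - d
        return C[S] if C[S] != INF else -1

    print(min_coins([1, 3, 5], 11))           # 3  (5 + 5 + 1)

This runs in O(S * k) time and O(S) space for k denominations.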
Pathfinding algorithms (Dijkstra, A*, meet in the middle, etc.) could also be suitable for this, on a graph like the following, where each node is a running sum and each edge adds one coin of value 1, 3 or 5:
0 --1--> 1,   0 --3--> 3,   0 --5--> 5
1 --1--> 2,   1 --3--> 4,   1 --5--> 6
...
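Since every edge in that graph corresponds to adding one coin, an unweighted shortest-path search such as BFS already finds the minimum number of coins; a quick sketch under that view (names are mine):

    from collections import deque

    # BFS over the graph sketched above: nodes are running sums, each edge adds one coin,
    # so the first time we reach S we have used the minimum number of coins.
    def min_coins_bfs(denominations, S):
        dist = {0: 0}
        queue = deque([0])
        while queue:
            p = queue.popleft()
            if p == S:
                return dist[p]
            for d in denominations:
                if p + d <= S and p + d not in dist:
                    dist[p + d] = dist[p] + 1
                    queue.append(p + d)
        return -1

    print(min_coins_bfs([1, 3, 5], 11))       # 3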
Another way is recursive bisection: if we cannot get the sum S with one coin, we try to split it into the pairs of amounts (S/2, S/2), ..., (S-1, 1) recursively, until we find a suitable coin or reach S = 1.