Finding the probability that two items are compared (hints please) - arrays

I'm attempting to solve the following problem (from Prof. Jeff Erickson's notes). The algorithm below takes an unsorted array A and returns the k-th smallest element in it; Partition does what its name implies via the standard quicksort method, using the pivot returned by Random (which is assumed to return a uniformly random integer between 1 and n in linear time), and returns the new index of the pivot. We are asked to find the exact probability that this algorithm compares the i-th smallest and j-th smallest elements of the input array.
QuickSelect(A[1..n], k):
    r <- Partition(A[1..n], Random(n))
    if k < r:
        return QuickSelect(A[1..r-1], k)
    else if k > r:
        return QuickSelect(A[r+1..n], k-r)
    else:
        return A[k]
Now, I can see that the probability of the first if branch being taken is (n-k)/n, the probability of the second branch is (k-1)/n, and the probability of reaching the else branch is 1/n. I also know that (assuming i < j) the probability of i < r < j is (j-i-1)/n, which guarantees that the two elements are never compared. On the other hand, if r == i or r == j, then i and j are guaranteed to be compared. The part that really trips me up is what happens when r < i or r > j, because then whether or not i and j are compared depends on the value of k (that is, on whether the recursive call keeps both of them in play).
Any hints and/or suggestions would be greatly appreciated. This is for homework, so I would rather not have full solutions given to me so that I may actually learn a bit. Thanks in advance!

As has already been mentioned, a Monte Carlo simulation is a simple way to get a fast (in the sense of implementation effort) approximation.
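For instance, a quick simulation along those lines might look like the following rough sketch (my own code with arbitrarily chosen n, k, i, j; it only tracks the range of ranks still under consideration):

#include <cstdio>
#include <random>

// Monte Carlo estimate of the probability that QuickSelect, asked for the k-th
// smallest of n distinct values, ever compares the elements of rank i and rank j.
// Only the range of ranks [lo, hi] still under consideration needs to be tracked;
// the pivot's rank is uniform in that range.
int main() {
    const int n = 10, k = 4, i = 3, j = 7;      // arbitrary example parameters
    std::mt19937 rng(12345);
    const long long trials = 1000000;
    long long compared = 0;

    for (long long t = 0; t < trials; ++t) {
        int lo = 1, hi = n;
        while (lo <= i && j <= hi) {            // ranks i and j are still together
            int r = std::uniform_int_distribution<int>(lo, hi)(rng);
            if (r == i || r == j) { ++compared; break; }  // pivot is one of them
            if (i < r && r < j) break;          // pivot separates them: never compared
            if (r == k) break;                  // selection ends without comparing them
            if (k < r) hi = r - 1; else lo = r + 1;       // recurse on the side holding k
        }
    }
    std::printf("estimated probability = %.4f\n", (double)compared / trials);
}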
There is also a way to compute the exact probability using dynamic programming.
Here we will assume that all elements in the array are distinct and that A[i] < A[j].
Let P(i, j, k, n) denote the probability of comparing the i-th and j-th elements while selecting the k-th smallest in an n-element array.
The pivot index r is equally likely to be any value in 1..n, each with probability 1/n. Note also that these events are disjoint and together cover the whole sample space.
Let us look carefully at each possible value of r.
If r = 1..i-1, then i and j fall into the same part, and the probability of their comparison is P(i-r, j-r, k-r, n-r) if k > r, and 0 otherwise.
If r = i, the probability is 1.
If r = i+1..j-1, the probability is 0.
If r = j, the probability is 1. If r = j+1..n, the probability is P(i, j, k, r-1) if k < r, and 0 otherwise.
So the full recurrence is P(i, j, k, n) = 1/n * (2 + sum over r = 1..min(i, k)-1 of P(i-r, j-r, k-r, n-r) + sum over r = max(j, k)+1..n of P(i, j, k, r-1)).
Finally, the base case is n = 2 (the smallest n for which i and j can differ): the only possible values are P(1, 2, 1, 2) and P(1, 2, 2, 2), and both equal 1 (no matter what r is, there will be a comparison).
The time complexity is O(n^5) and the space complexity is O(n^4). It is possible to optimize the calculation and bring the time complexity down to O(n^4). Also, since we only consider A[i] < A[j] and i, j, k <= n, the multiplicative constant is about 1/8. So it is possible to compute any value for n up to 100 in a couple of minutes using the straightforward algorithm described above, or up to 300 with the optimized one.
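A minimal memoized sketch of that recurrence (my own C++ code, intended only to illustrate the formula; the base case n = 2 falls out of it because both sums are then empty):

#include <algorithm>
#include <cstdio>
#include <map>
#include <tuple>

// P(i, j, k, n): probability that QuickSelect on n distinct elements, asked for
// the k-th smallest, compares the i-th and j-th smallest (assuming i < j).
std::map<std::tuple<int, int, int, int>, double> memo;

double P(int i, int j, int k, int n) {
    auto key = std::make_tuple(i, j, k, n);
    auto it = memo.find(key);
    if (it != memo.end()) return it->second;

    double sum = 2.0;                                   // r == i and r == j always compare them
    for (int r = 1; r <= std::min(i, k) - 1; ++r)       // pivot left of i, recursion keeps both
        sum += P(i - r, j - r, k - r, n - r);
    for (int r = std::max(j, k) + 1; r <= n; ++r)       // pivot right of j, recursion keeps both
        sum += P(i, j, k, r - 1);
    // every other pivot either separates i and j or ends the recursion without comparing them

    return memo[key] = sum / n;
}

int main() {
    // example: probability that the 3rd and 7th smallest of 10 elements
    // are compared while selecting the 4th smallest
    std::printf("%f\n", P(3, 7, 4, 10));
}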

Note that two positions are compared only if one of them is the current pivot. So the best way to think about this is to look at the sequence of chosen pivots.
Suppose the k-th smallest element is between i and j. Then i and j are not compared if and only if an element between them is selected as a pivot before i or j are. What is the probability that this happens?
Now suppose the k-th smallest element is after j. i and j are not compared if and only if an element between i+1 and k (excluding j) is selected as a pivot before i or j are. What is the probability that this happens?

Related

Finding period of a sequence

I want to solve this problem:
For a given sequence a[0], a[1], a[2], ..., a[n-1], please find the "period" of the sequence.
The period is the minimum integer k (k >= 1) that satisfies a[i] = a[i+k] for all valid i, and k must also be a divisor of n.
My current solution is to compute all divisors of n (the candidates for k) and test each one, but that takes O(n * d(n)) time, which I think is slow.
Is there any efficient algorithm?
Apply the Z-algorithm to the given sequence.
Then find the first (smallest) position i such that
i + z[i] = n
and
n mod i = 0.
If such a value of i exists, it is the shortest period.
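A self-contained sketch of this idea (my own code, assuming 0-based indexing and that z[i] is the length of the longest common prefix of the sequence and its suffix starting at position i):

#include <algorithm>
#include <cstdio>
#include <vector>

// Shortest period k (a divisor of n) such that a[i] == a[i+k] for all valid i.
int period(const std::vector<int>& a) {
    int n = a.size();
    std::vector<int> z(n, 0);
    for (int i = 1, l = 0, r = 0; i < n; ++i) {     // standard Z-algorithm, O(n)
        if (i < r) z[i] = std::min(r - i, z[i - l]);
        while (i + z[i] < n && a[z[i]] == a[i + z[i]]) ++z[i];
        if (i + z[i] > r) { l = i; r = i + z[i]; }
    }
    for (int i = 1; i < n; ++i)                     // first divisor of n whose shift matches
        if (n % i == 0 && i + z[i] == n)
            return i;
    return n;                                       // no proper period; the whole sequence
}

int main() {
    std::vector<int> a = {1, 2, 3, 1, 2, 3};
    std::printf("%d\n", period(a));                 // prints 3
}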

Finding the smallest sum of the difference of A[i] and a constant

For an assignment I need to solve a mathematical problem. I narrowed it down to the following:
Let A[1, ... ,n] be an array of n integers.
Let y be an integer constant.
Now, I have to write an algorithm that finds the minimum of M(y) in O(n) time:
M(y) = Sum |A[i] - y|, i = 1 to n. Note that I do not just take A[i] - y, but the absolute value |A[i] - y|.
For clarity, I also put this equation in Wolfram Alpha.
I have considered the least squares method, but that will not yield the minimum of M(y); it gives more of an average value of A, I think. As I'm taking the absolute value of A[i] - y, there is also no way I can differentiate this function with respect to y. I also can't just come up with any algorithm, because it has to run in O(n) time. Finally, I believe there can be more than one correct answer for y in some cases; in that case, the value of y must be equal to one of the integer elements of A.
This has really been eating me for a whole week now and I still haven't figured it out. Can anyone please teach me the way to go or point me in the right direction? I'm stuck. Thank you so much for your help.
You want to pick a y for which M(y) = sum(abs(A[i] - y)) is minimal. Let's assume every A[i] is positive (this does not change the result, because the problem is invariant under translation).
Let's start with two simple observations. First, if you pick y such that y < min(A) or y > max(A), you end up with a greater value of M(y) than if you had picked y with min(A) <= y <= max(A). Also, there is a unique local minimum, or range of minima, because M(y) is convex.
So we can start by picking some y in the interval [min(A) .. max(A)] and try to move this value around so that we get a smaller M(y). To make things easier to understand, let's sort A and pick an i in [1 .. n] (so y = A[i]).
There are three cases to consider.
If A[i+1] > A[i], and either {n is odd and i < (n+1)/2} or {n is even and i < n/2}, then M(A[i+1]) < M(A[i]).
This is because, going from M(A[i]) to M(A[i+1]), the number of terms that decrease (namely n-i) is greater than the number of terms that increase (namely i), and every term increases or decreases by the same amount, A[i+1] - A[i]. In the case where n is odd, i < (n+1)/2 <=> 2*i < n+1 <=> 2*i < n, because 2*i and n+1 are both even, so 2*i < n+1 forces 2*i <= n-1 < n.
In more formal terms, M(A[i]) = sum(A[i]-A[s]) + sum(A[g]-A[i]), where s and g represent indices such that A[s] < A[i] and A[g] > A[i]. So if A[i+1] > A[i], then M(A[i+1]) = sum(A[i]-A[s]) + i*(A[i+1]-A[i]) + sum(A[g]-A[i]) - (n-i)*(A[i+1]-A[i]) = M(A[i]) + (2*i-n)*(A[i+1]-A[i]). Since 2*i < n and A[i+1] > A[i], (2*i-n)*(A[i+1]-A[i]) < 0, so M(A[i+1]) < M(A[i]).
Similarly, if A[i-1] < A[i], and either {n is odd and i > (n+1)/2} or {n is even and i > (n/2)+1}, then M(A[i-1]) < M(A[i]).
Finally, if {n is odd and i = (n+1)/2} or {n is even and i = (n/2) or (n/2)+1}, then you have a minimum, because decrementing or incrementing i will eventually lead you to the first or second case, respectively. There are leftover possible values for i, but all of them lead to A[i] being a minimum too.
The median of A is exactly the value A[i] where i satisfies the last case. If the number of elements in A is odd, then you have exactly one such value, y = A[(n+1)/2] (but possibly multiple indices for it); if it's even, then you have a range (which may contain just one integer) of such values, A[n/2] <= y <= A[n/2+1].
There is a standard C++ algorithm that can help you find the median in O(n) time: nth_element. If you are using another language, look up the median of medians algorithm (which Nico Schertler pointed out) or even introselect (which is what nth_element typically uses).
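For completeness, a small sketch of that approach in C++ (my own code): pick a median with nth_element and evaluate M there.

#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <vector>

// The y minimizing M(y) = sum |A[i] - y| is a median of A. nth_element places
// the element that belongs at position n/2 of the sorted order there, in O(n)
// average time; any value in the median range gives the same minimal M.
int main() {
    std::vector<long long> A = {2, 7, 1, 8, 2, 8};
    std::nth_element(A.begin(), A.begin() + A.size() / 2, A.end());
    long long y = A[A.size() / 2];

    long long M = 0;
    for (long long a : A) M += std::llabs(a - y);
    std::printf("y = %lld, M(y) = %lld\n", y, M);    // y = 7, M(y) = 18
}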

Rearrange array with sum and length n while minimizing distance change

An array A[n] (length n) of nonnegative integers is given. The array contains n objects distributed among its slots, so that A[i] represents the number of objects in slot i, and the sum of A[i] for i = 0 to n-1 is n. We must rearrange the array so that every element is 1. We are given that n <= 10^5. (So O(n^2) is too slow, but O(n log n) is fine.)
Rearranging works as follows: if A[i] = k and A[j] = h, then we can decrease A[i] by some positive integer m <= k and increase A[j] by m correspondingly, so that A[i] = k-m and A[j] = h+m. However, each rearrange has a cost given by Cost(m, i, j) = m * d(i, j)^2 (proportional to the square of the distance of the move). The distance function d(i, j) is the usual difference of indices, except that it can wrap around the array: if n = 7 then d(1, 4) = 3, but d(0, 6) = 1 and d(1, 5) = 3, etc. That is, we can think of the array as "circular."
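For concreteness, the wrap-around distance described above appears to be min(|i - j|, n - |i - j|); a tiny sketch (my own helper, not part of the problem statement):

#include <cstdlib>

// Circular distance between slots i and j in an array of length n, consistent
// with the examples above (n = 7: d(1,4) = 3, d(0,6) = 1, d(1,5) = 3).
int d(int i, int j, int n) {
    int diff = std::abs(i - j);
    return diff < n - diff ? diff : n - diff;
}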
The total cost is the sum of the cost function over all rearranges, and our goal is to find the minimum total cost of reaching A[i] = 1 for all i, i.e. making all the elements equal to 1. Each object can only be rearranged once, so for example if A = [5,0,0,0,0] we can't just move an object from A[0] to A[1] and then to A[2] to circumvent the squaring of the distance.
Help on an algorithm or pseudocode/code to solve this problem would be appreciated.

Finding continuous subsequence that minimizes the average of the rest of the array?

Suppose there's an integer array arr[0..n-1]. Find a contiguous subsequence sub[i..j] (with i > 0 and j < n - 1) such that removing it leaves the rest of the array with the smallest possible average.
Example:
arr[5] = {5,1,7,8,2};
Remove {7, 8}; the array becomes {5, 1, 2}, which has average 2.67 (the smallest possible).
I thought this was a modification of the Longest Increasing Subsequence problem, but I couldn't figure it out.
Thanks,
Let's find that smallest average using binary search.
Suppose the sum of all elements is S.
For a given x, let's check whether there exist i and j such that the average of all elements except those from i to j is less than or equal to x.
To do that, subtract x from every element of arr. Now we need to check whether there exist i and j such that the sum of all elements except those from i to j is less than or equal to zero. Let S' = S - x * n be the sum of all elements of the modified array. Then we want i and j such that the sum from i to j is greater than or equal to S'. To check that, it is enough to find the subarray with the largest sum, which can be done with Kadane's elegant algorithm: https://en.wikipedia.org/wiki/Maximum_subarray_problem
When do we terminate the binary search? When the gap between S' and the maximum subarray sum is zero (or close enough).
Time complexity: O(n log w), where w is the precision of the binary search.
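A rough sketch of this approach (my own code; it uses a fixed iteration count in place of an explicit precision check and restricts the removed subarray to indices 1..n-2, per the i > 0, j < n-1 condition in the question):

#include <algorithm>
#include <cstdio>
#include <vector>

// Binary-search the smallest achievable average x of the remaining elements.
// Feasibility check for a given x: after subtracting x from every element,
// some subarray arr[i..j] must have sum >= S' (the total of the shifted array),
// which means the elements left outside it sum to <= 0.
int main() {
    std::vector<double> arr = {5, 1, 7, 8, 2};
    int n = arr.size();

    double lo = *std::min_element(arr.begin(), arr.end());
    double hi = *std::max_element(arr.begin(), arr.end());
    for (int iter = 0; iter < 60; ++iter) {
        double x = (lo + hi) / 2;
        double S1 = 0;                               // S' = sum of (arr[k] - x)
        for (double v : arr) S1 += v - x;

        // Kadane's algorithm on the shifted array, restricted to indices 1..n-2
        double best = -1e18, cur = 0;
        for (int k = 1; k + 1 < n; ++k) {
            cur = std::max(arr[k] - x, cur + arr[k] - x);
            best = std::max(best, cur);
        }

        if (best >= S1) hi = x;                      // an average <= x is achievable
        else lo = x;
    }
    std::printf("smallest achievable average: %.4f\n", hi);   // 2.6667 for this input
}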

Finding the maximum element of the array that is a divisor of another element

Given an array of non-zero integers of length N, write a function that returns the maximum element of the array that is a divisor of some other element of the same array. If no such element is present, return 0. I know how to solve it in O(n^2). Is it possible to do it faster?
First, note that you are assuming that testing whether integer A divides integer B can be done in O(1). I guess you're also assuming that no pre-computation (e.g. building a divisibility graph) is allowed.
Since integer factorization (for which no polynomial algorithm is known) is not an option, you can't do better than O(n^2) in the worst case.
For example, given the input {11, 127, 16139} (all three integers are primes, and each integer squared is less than the next one), you can't avoid checking all pairs.
I have been playing with your problem for a while and found a solution that is sometimes better than brute force.
It is based on two ideas:
We can perform the search in an order such that bigger divisor candidates are tested first. That way we can terminate the search as soon as we find a divisor.
One way to test whether some candidate divw is a divisor of a number w is to calculate r = floor(w / divw) and then check that r * divw == w. The interesting thing is that, when this fails, we can calculate an upper limit for the next divisor candidate of w as topw = floor(w / (r + 1)), so we can discard every candidate strictly between topw and divw.
An example of that second point: imagine we are testing whether divw = 10 is a divisor of w = 12. We calculate r = floor(12 / 10) = 1 and topw = floor(w / (r + 1)) = floor(12 / 2) = 6. So we don't need to check whether the numbers in the set between 7 and 9, inclusive, are divisors of 12.
To implement this algorithm I used a heap that holds the numbers of the set, keyed by the next divisor candidate that has to be tested for each of them.
So...
1. Initialize the heap by pushing every element, with its predecessor (in sorted order) as its biggest potential divisor.
2. Pop the top element w from the heap and check whether its divisor candidate divw actually divides it.
3. If it does, return divw as the biggest divisor.
4. Otherwise, calculate topw from w and divw; binary-search the set for the largest element divw' that is less than or equal to topw; if one exists, push (w, divw') back onto the heap.
5. Unless the heap is empty, go to step 2.
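A rough C++ sketch of those steps (my own code, assuming distinct positive integers; it is separate from the Common Lisp implementation mentioned below):

#include <algorithm>
#include <cstdio>
#include <queue>
#include <vector>

// Heap of pairs (candidate divisor, element), ordered so that the largest
// untested candidate is examined first; the first hit is therefore maximal.
long long maxDivisorInSet(std::vector<long long> a) {
    std::sort(a.begin(), a.end());
    std::priority_queue<std::pair<long long, long long>> heap;

    for (size_t p = 1; p < a.size(); ++p)           // step 1: predecessor as first candidate
        heap.push({a[p - 1], a[p]});

    while (!heap.empty()) {
        auto [divw, w] = heap.top();                // step 2: largest pending candidate
        heap.pop();
        long long r = w / divw;
        if (r * divw == w) return divw;             // step 3: it divides w

        long long topw = w / (r + 1);               // step 4: skip candidates in (topw, divw)
        auto it = std::upper_bound(a.begin(), a.end(), topw);
        if (it != a.begin()) heap.push({*(it - 1), w});
    }                                               // step 5: loop until the heap is empty
    return 0;
}

int main() {
    std::vector<long long> a = {25, 7, 11, 33, 5};
    std::printf("%lld\n", maxDivisorInSet(a));      // 5 divides 25, 11 divides 33 -> prints 11
}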
An implementation in Common Lisp is available here!
I guess calculating the theoretical computational cost of this algorithm would be challenging, especially for the average case, so I am not going to do it!
After running it a dozen times, it seems to behave better than the brute-force approach when N is high and the numbers are dispersed (which means that the probability of one number being a divisor of another is low). On the other hand, brute force seems to be faster when N is low or when the numbers are densely packed in a small range (which means that the probability of one number being a divisor of another is high).
I did it like this:
// Brute force: for every pair, check whether a[j] properly divides a[i]
// (the a[i] > a[j] test also skips equal elements) and keep the largest such a[j].
int f(int* a, int size)
{
    int max = 0;
    for (int i = 0; i < size; i++)
        for (int j = 0; j < size; j++)
            if (a[i] > a[j] && a[i] % a[j] == 0 && a[j] > max)
                max = a[j];
    return max;
}
