Quick select different pivot selection algorithms results - c

I wrote the following randomized quickselect algorithm that moves the smallest k elements of an array to the beginning of it in expected linear time (technically the worst case is O(n^2), but its probability drops exponentially):
// This function moves the smallest k elements of the array to
// the beginning of it in time O(n).
void moveKSmallestValuesToTheLeft( double arr[] ,
unsigned int n ,
unsigned int k )
{
int l = 0, r = n - 1; //Beginning and end indices of the array
while (0 < k && k < n && n > 10)
{
unsigned int partition_index, left_size, pivot;
//Partition the data around a random pivot
pivot = generatePivot(arr, l, n, k); //explained later
partition_index = partition(arr, l, r, pivot); //standard quick sort partition
left_size = partition_index - l + 1;
if (k < left_size)
{
//Continue with left subarray
r = partition_index - 1;
n = partition_index - l;
}
else
{
//Continue with right subarray
l += left_size;
n -= left_size;
k -= left_size;
}
}
if (n <= 10)
insertionSort(arr + l, n);
}
I tested 3 different methods for generating the pivot. All of them are based on selecting 5 random candidates and returning one of them, and for each method I ran the code 100,000 times. These were the methods:
1. Choose 5 random elements and return their median.
2. Choose 5 random elements, calculate k/n, and check which of the 5 elements is closest to it. I.e., if k/n <= 1/5 return the min, if k/n <= 2/5 return the second smallest value, if k/n <= 3/5 return the median, and so on.
3. Exactly the same as method 2, but with more weight given to the pivots closer to the median, based on binomial coefficients. I.e., I calculated the binomial coefficients for n=5-1 and got [1 4 6 4 1], then I normalized them, calculated their cumulative sum, and got [0.0625 0.3125 0.6875 0.9375 1]. Then I did: if k/n <= 0.0625 return the min, if k/n <= 0.3125 return the second smallest value, if k/n <= 0.6875 return the median, and so on.
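To make the three rules concrete, here is a minimal Python sketch of how the rank of the returned candidate could be derived from k/n. The names `choose_pivot`, `rank_from_thresholds`, and the two threshold constants are hypothetical (they are not from the C code above), and the random sampling of the 5 candidates is left out.

```python
from itertools import accumulate
from math import comb

# Cumulative thresholds on k/n. Entry r is the upper bound for returning the
# r-th smallest of the 5 sorted candidates (r = 0 is the min, r = 2 the median).
UNIFORM_THRESHOLDS = [(r + 1) / 5 for r in range(5)]
BINOMIAL_THRESHOLDS = list(accumulate(comb(4, r) / 16 for r in range(5)))


def rank_from_thresholds(k, n, thresholds):
    """Return the first rank whose cumulative threshold is >= k/n."""
    frac = k / n
    for rank, bound in enumerate(thresholds):
        if frac <= bound:
            return rank
    return len(thresholds) - 1


def choose_pivot(candidates, k, n, method):
    """Pick one of 5 candidates according to method 1, 2 or 3."""
    ordered = sorted(candidates)
    if method == 1:
        rank = 2  # always the median of the 5
    elif method == 2:
        rank = rank_from_thresholds(k, n, UNIFORM_THRESHOLDS)
    else:
        rank = rank_from_thresholds(k, n, BINOMIAL_THRESHOLDS)
    return ordered[rank]
```

For k/n = 0.5 all three rules return the median of the candidates, so they only differ when k is away from n/2.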
My intuition told me that method 2 would perform the best, because it always chooses the pivot that is most likely to be closest to the k'th smallest element and would therefore probably decrease k or n the most at each iteration. Instead, every time I ran the code I got the following results (ranked from fastest method to slowest, based on average and worst case times):
Average running time:
First place (fastest): Method 3
Second place: Method 2
Last place: Method 1
Worst case running time:
First place (fastest): Method 1
Second place: Method 3
Last place: Method 2
My question is: is there any mathematical way to explain these results, or at least give some intuition for them? My intuition was completely wrong: method 2 didn't outperform either of the other 2 methods.
EDIT
So apparently the problem was that I only tested k=n/2, which is an edge case, so I got these weird results.

Related

Data Structure Question on Arrays - How to find the best of array given conditions

I am new to data structures and algorithms, and I need help solving this question.
The best of an array having N elements is defined as sum of best of all elements of Array. The best of element A[i] is defined in the following manner
a: The best of element A[i] is 1 if, A[i-1]<A[i]<A[i+1]
b: The best of element A[i] is 2 if, A[i]> A[j] for j ranging from 0 to n-1
and A[i]<A[h] for h ranging from i+1 to N-1
Write program to find best of array
Note- A[0] and A[N-1] are excluded to find best of array, all elements are unique
Input - 2,1,3,9,20,7,8
Output - 3
The best of element 3 is 2 and 9 is 1. For rest element it is 0. Hence 2+1 =3
This is what I tried so far -
public static void main (String [] args) {
int [] A = {2,1,3,9,20,7,8};
int result = 0;
for(int i=1; i<A.length-2; i++) {
if(A[i-1] < A[i] && A[i]< A[i+1] ) {
result += 1;
}else if(A[i]>A[j] && A[i]<A[h]){
result +=2;
}else {
result+=0;
}
}
}
Note how the phrase:
A[i]> A[j] for j ranging from 0 to n-1
simply means: If the current element is not the Minimum of the array. Hence, if you find the minimum at the beginning, this condition can be changed into a much simpler and lightweight condition:
Let m be the minimum of the array, then if A[i] > m
So you don't need to do a linear search at every iteration, which lowers the time complexity.
Now you have the problem with a complexity of O(N^2), which can be reduced further.
Regarding
and A[i]<A[h] for h ranging from i+1 to N-1
Get the maximum element from 2 to N-1. Then at every iteration, check if the current element is less than the maximum. If so, consider it while composing the score; otherwise the current element is the maximum, and in this case re-calculate the maximum element from i+1 to N-1.
The worst case is when the maximum is always at index i, which happens when the array is already sorted in descending order.
The best case is when the maximum is the last element, in which case the overall complexity is reduced to O(N).
Regarding
A[i-1]<A[i]<A[i+1]
This is straightforward: you simply compare the elements that reside at those three indices at every iteration.
Implementation
Before anything, the following are important notes:
The result you've got in your example isn't correct as elements 3 and 9 both fulfill both conditions, so each should score either 1 or 2, but cannot be one with score of 1 and another with score of 2. Hence the overall score should be either 1+1 = 2 or 2 + 2 = 4.
I implemented this algorithm in Java (although I prefer Python), since that appears to be the language of your code snippet.
import java.util.Arrays;
public class ArrayBest {
private static int[] findMinMax(Integer [] B) {
// find minimum and the maximum: Time Complexity O(n log(n))
Integer[] b = Arrays.copyOf(B, B.length);
Arrays.sort(b);
return new int []{b[0], b[B.length-1]};
}
public static int find(Integer [] A) {
// Exclude the first and last elements
int N = A.length;
Integer [] B = Arrays.copyOfRange(A, 1, N-1);
N -= 2;
// find minimum and the maximum: Time Complexity O(n log(n))
// min at index 0, and max at index 1
int [] minmax = findMinMax(B);
int result = 0;
// start the search
for (int i=0; i<N-1; i++) {
// start with first condition : the easier
if (i!=0 && B[i-1]<B[i] && B[i]<B[i+1]) {
result += 1;
}else if (B[i] != minmax[0]) { // Equivalent to A[i]> A[j] : j in [0, N-1]
if (B[i] < minmax[1]) { // if it is less than the maximum
result += 2;
}else { // it is the maximum --> re-calculate the max over the range (i+1, N)
int [] minmax_ = findMinMax(Arrays.copyOfRange(B, i+1, N));
minmax[1] = minmax_[1];
}
}
}
return result;
}
public static void main(String[] args) {
Integer [] A = {2,1,3,9,7,20,8};
int res = ArrayBest.find(A);
System.out.println(res);
}
}
Ignoring the first sort, the best case scenario is when the last element is the maximum (i.e, at index N-1), hence time complexity is O(N).
The worst case scenario, is when the array is already sorted in a descending order, so the current element that is being processed is always the maximum, hence at each iteration the maximum should be found again. Consequently, the time complexity is O(N^2).
The average case scenario depends on the probability of how the elements are distributed in the array. In other words, the probability that the element being processed at the current iteration is the maximum.
Although it requires more study, my initial guess is as follows:
The probability that any given element of an i.i.d. sequence is the maximum is simply 1/N at the very beginning, but as we search over (i+1, N-1), N decreases, hence the probabilities go like 1/N, 1/(N-1), 1/(N-2), ..., 1. Counting the outer loop, we can write the average complexity as O(N (1/N + 1/(N-1) + 1/(N-2) + ... + 1)) = O(N (1 + 1/2 + 1/3 + ... + 1/N)), whose asymptotic upper bound (from the harmonic series) is approximately O(N log(N)).
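As a side note, here is a sketch under a different reading of condition b than the one used in the answer above. It assumes (my assumption, not stated in the question) that condition b means "A[i] is greater than every element before it and smaller than every element after it", and that condition b takes precedence over condition a. Under that reading, prefix maxima and suffix minima give an O(N) pass, and the question's sample evaluates to 3. The function name `best_of_array` is hypothetical.

```python
def best_of_array(a):
    """Score an array in O(N) under the 'for all' reading of condition b."""
    n = len(a)
    # prefix_max[i] = max of a[0..i-1]; suffix_min[i] = min of a[i+1..n-1]
    prefix_max = [float("-inf")] * n
    suffix_min = [float("inf")] * n
    for i in range(1, n):
        prefix_max[i] = max(prefix_max[i - 1], a[i - 1])
    for i in range(n - 2, -1, -1):
        suffix_min[i] = min(suffix_min[i + 1], a[i + 1])

    total = 0
    for i in range(1, n - 1):  # first and last elements are excluded
        if prefix_max[i] < a[i] < suffix_min[i]:
            total += 2  # condition b (assumed to take precedence)
        elif a[i - 1] < a[i] < a[i + 1]:
            total += 1  # condition a
    return total


print(best_of_array([2, 1, 3, 9, 20, 7, 8]))  # prints 3
```

Here element 3 scores 2 (larger than 2 and 1, smaller than 9, 20, 7 and 8) and element 9 scores 1, which matches the explanation given in the question.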

Given a sorted array and a positive integer k, find the number of integer in the interval(100(i-1)/k, 100(i)/k] for 1 <= i <= k

Given a sorted array A[1..n] and a positive integer k, count the number of integers in each interval (100(i-1)/k, 100i/k] for 1 <= i <= k and store the counts in another array G[1..k].
Assume array G is already created (it is not an input to the algorithm)
and every element of G is initialized to 0.
Also, there is a helper function Increase(i, count) that finds the interval (G[?]) that A[i] corresponds to and increases the value of G[?] by count.
For example, a sorted array [1,11,25,34,46,62,78,90,99] and k = 4
so the result should be G[1] = 3, G[2] = 2, G[3] = 1, G[4] = 3
where G[1] is an interval (0,25] G[2] -> (25,50] G[3] -> (50,75] G[4] -> (75,100]
Is there a divide-and-conquer algorithm to solve this problem, rather than solving it linearly?
More advanced:
Also, suppose we cannot directly access the elements of array A, and there is a function Compare(x, y) that returns true if A[x] and A[y] are in the same interval.
How can it be solved then? Can I make each group call Increase at most log n times, so that with k groups the running time is O(k log n)?
My observation at this point:
If A[i] and A[y] are in the same interval where i < y, every element A[j] with i < j < y will also be in the same interval.
The easiest sublinear approach (assuming k << n) is to do (k+1) binary searches, one for each boundary value, yielding an approximately (k lg n)-comparison algorithm.
This can be lowered to approximately (k (1 + lg (n/k))) by combining probes together intelligently.
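A minimal Python sketch of the (k+1)-binary-search idea, assuming the array holds integers and is directly accessible (so it does not cover the Compare-only variant). The function name `interval_counts` is hypothetical, and the result is 0-indexed, so `G[0]` corresponds to G[1] in the question.

```python
from bisect import bisect_right


def interval_counts(a, k):
    """Count elements of sorted integer list a in (100(i-1)/k, 100i/k]."""
    # For an integer x, x <= 100*i/k holds exactly when x <= (100*i)//k,
    # so integer boundaries avoid floating-point comparisons.
    cuts = [bisect_right(a, (100 * i) // k) for i in range(k + 1)]
    # cuts[i] is the number of elements <= the i-th boundary.
    return [cuts[i] - cuts[i - 1] for i in range(1, k + 1)]


print(interval_counts([1, 11, 25, 34, 46, 62, 78, 90, 99], 4))  # [3, 2, 1, 3]
```

Each boundary costs one O(lg n) search, which gives the roughly k lg n comparisons mentioned above.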

Magical array A of N integers with K length

You are given an array A of N integers. An array is called magical if all of its elements have exactly 3 divisors. You have to convert the given array into a magical array of length K. You can perform the following operations in any order.
Increase the value of any element of the array by 1.
Decrease the value of any element of the array by 1.
Delete any element of the array.
Constraints:
1 <= N <= 1000000
1 <= K <= N
1 <= A <= 1000000
Sample Input
5(size of the array) 3(K)
1 4 10 8 15
Output
4
A solution I tried:
I iterated over every element of the array, looked for a nearby prime square, and added the difference to a global operation count (a variable used to count the required operations). The time complexity of this is O(n^2).
I am searching for a better solution.
Make an array with absolute values of differences with closest prime squares
Use QuickSelect algorithm to separate K smaller differences (average complexity tends to O(N), while the worst quadratic case is possible)
Calculate their sum
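The three steps above can be sketched in Python as follows. This is only an illustration with hypothetical names (`prime_squares_covering`, `distance_to_prime_square`, `min_operations`). It sorts the differences instead of running QuickSelect, for brevity. The `count_deletions` flag reflects my reading of the sample (2 for the edits plus 2 for the deleted elements gives 4), which the problem statement does not spell out.

```python
from bisect import bisect_left
from math import isqrt


def prime_squares_covering(max_value):
    """Sorted squares of primes, including at least one square >= max_value."""
    # A prime always exists between m and 2m, so sieving to 2*sqrt + 2 suffices.
    limit = 2 * isqrt(max_value) + 2
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, isqrt(limit) + 1):
        if sieve[p]:
            for multiple in range(p * p, limit + 1, p):
                sieve[multiple] = False
    return [p * p for p in range(limit + 1) if sieve[p]]


def distance_to_prime_square(x, squares):
    """Absolute difference between x and the closest prime square."""
    idx = bisect_left(squares, x)
    best = float("inf")
    if idx < len(squares):
        best = squares[idx] - x
    if idx > 0:
        best = min(best, x - squares[idx - 1])
    return best


def min_operations(a, k, count_deletions=True):
    squares = prime_squares_covering(max(a))
    costs = sorted(distance_to_prime_square(x, squares) for x in a)
    total = sum(costs[:k])  # keep the k cheapest elements
    if count_deletions:
        total += len(a) - k  # assumption: each deletion is one operation
    return total


print(min_operations([1, 4, 10, 8, 15], 3))  # prints 4
```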
You can try the method below to find the numbers with 3 divisors:
void numbersWith3Divisors(int n)
{
boolean[] isPrime = new boolean[n+1];
Arrays.fill(isPrime, true);
isPrime[0] = isPrime[1] = false;
for (int p=2; p*p<=n; p++)
{
if (isPrime[p] == true)
{
for (int i=p*2; i<=n; i += p)
isPrime[i] = false;
}
}
System.out.print("Numbers with 3 divisors :- ");
for (int i=0; i*i <= n ; i++)
if (isPrime[i])
System.out.print(i*i + " ");
}
You can apply the same approach to the array.
Hope it helps.

Total number of possible triangles from n numbers

If n numbers are given, how would I find the total number of possible triangles? Is there any method that does this in less than O(n^3) time?
I am considering a+b>c, b+c>a and a+c>b conditions for being a triangle.
Assume there are no equal numbers among the given n and that it's allowed to use one number more than once. For example, if we are given the numbers {1,2,3}, we can create 7 triangles:
1 1 1
1 2 2
1 3 3
2 2 2
2 2 3
2 3 3
3 3 3
If any of those assumptions isn't true, it's easy to modify the algorithm.
Here I present an algorithm which takes O(n^2) time in the worst case:
Sort numbers (ascending order).
We will take triples ai <= aj <= ak, such that i <= j <= k.
For each i, j you need to find the largest k that satisfies ak < ai + aj. Then every triple (ai, aj, al) with j <= l <= k is a triangle (because al >= aj >= ai, the only inequality that can be violated is al < ai + aj).
Consider two pairs (i, j1) and (i, j2) with j1 <= j2. It's easy to see that k2 (found in the previous step for (i, j2)) >= k1 (found in the previous step for (i, j1)). This means that as you iterate over j, you only need to check numbers starting from the previous k. That gives O(n) time for each particular i, which implies O(n^2) for the whole algorithm.
C++ source code:
int Solve(int* a, int n)
{
int answer = 0;
std::sort(a, a + n);
for (int i = 0; i < n; ++i)
{
int k = i;
for (int j = i; j < n; ++j)
{
while (n > k && a[i] + a[j] > a[k])
++k;
answer += k - j;
}
}
return answer;
}
Update for downvoters:
This definitely is O(n^2)! Please read carefully the chapter about Amortized Analysis in "Introduction to Algorithms" by Thomas H. Cormen et al. (17.2 in the second edition).
Finding the complexity by counting nested loops is sometimes completely wrong.
Here I try to explain it as simply as I can. Let's fix the variable i. Then for that i we must iterate j from i to n (O(n) operations), and the internal while loop advances k from i to n (also O(n) operations in total). Note: I don't restart the while loop from the beginning for each j. We also need to do this for each i from 0 to n. So it gives us n * (O(n) + O(n)) = O(n^2).
There is a simple algorithm in O(n^2*logn).
Assume you want all triangles as triples (a, b, c) where a <= b <= c.
There are 3 triangle inequalities but only a + b > c suffices (others then hold trivially).
And now:
Sort the sequence in O(n * logn), e.g. by merge-sort.
For each pair (a, b), a <= b the remaining value c needs to be at least b and less than a + b.
So you need to count the number of items in the interval [b, a+b).
This can be done simply by binary-searching for a+b (O(logn)); the number of items in [b, a+b) is then the index of the first item that is >= a+b minus the index of b.
All together O(n * logn + n^2 * logn) which is O(n^2 * logn). Hope this helps.
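A short Python sketch of this binary-search counting, assuming distinct input numbers and that a number may be reused within a triangle (as in the question's {1,2,3} example). The function name `count_triangles_bisect` is hypothetical.

```python
from bisect import bisect_left


def count_triangles_bisect(nums):
    """Count triples a <= b <= c (reuse allowed) with a + b > c."""
    a = sorted(nums)
    n = len(a)
    total = 0
    for i in range(n):
        for j in range(i, n):
            # Index of the first element >= a[i] + a[j]; every element at
            # positions j .. hi-1 is a valid third side c.
            hi = bisect_left(a, a[i] + a[j])
            total += hi - j
    return total


print(count_triangles_bisect([1, 2, 3]))  # prints 7
```

The two nested loops visit O(n^2) pairs and each pair costs one O(log n) search, which matches the stated O(n^2 * logn) bound.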
If you use a binary sort, that's O(n-log(n)), right? Keep your binary tree handy, and for each pair (a,b) where a b and c < (a+b).
Let a, b and c be three sides. The below condition must hold for a triangle (Sum of two sides is greater than the third side)
i) a + b > c
ii) b + c > a
iii) a + c > b
Following are the steps to count triangles.
1. Sort the array in non-decreasing order.
2. Initialize two pointers 'i' and 'j' to the first and second elements respectively, and initialize the count of triangles as 0.
3. Fix 'i' and 'j' and find the rightmost index 'k' (or largest 'arr[k]') such that 'arr[i] + arr[j] > arr[k]'. The number of triangles that can be formed with 'arr[i]' and 'arr[j]' as two sides is 'k - j'. Add 'k - j' to the count of triangles.
Let us consider 'arr[i]' as 'a', 'arr[j]' as 'b' and all elements between 'arr[j+1]' and 'arr[k]' as 'c'. The above mentioned conditions (ii) and (iii) are satisfied because 'arr[i] < arr[j] < arr[k]'. And we check for condition (i) when we pick 'k'.
4. Increment 'j' to fix the second element again.
Note that in step 3, we can use the previous value of 'k'. The reason is simple: if we know that the value of 'arr[i] + arr[j-1]' is greater than 'arr[k]', then we can say 'arr[i] + arr[j]' will also be greater than 'arr[k]', because the array is sorted in increasing order.
5. If 'j' has reached the end, then increment 'i'. Initialize 'j' as 'i + 1', 'k' as 'i + 2' and repeat steps 3 and 4.
Time Complexity: O(n^2).
The time complexity looks higher because of the 3 nested loops. If we take a closer look at the algorithm, we observe that k is initialized only once per iteration of the outermost loop. The innermost loop executes at most O(n) times for every iteration of the outermost loop, because k starts from i+2 and goes up to n across all values of j. Therefore, the time complexity is O(n^2).
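The steps above can be sketched in Python as follows, assuming positive side lengths and that each array element is used at most once per triangle. The function name `count_triangles` is hypothetical. In this sketch k ends one position past the rightmost valid index, so the count added per pair is k - j - 1 rather than k - j.

```python
def count_triangles(arr):
    """Count index triples i < j < l whose values can form a triangle."""
    a = sorted(arr)
    n = len(a)
    count = 0
    for i in range(n - 2):
        k = i + 2  # k is set once per i and only ever moves forward
        for j in range(i + 1, n - 1):
            k = max(k, j + 1)  # the third side must come after j
            while k < n and a[i] + a[j] > a[k]:
                k += 1
            # a[j+1 .. k-1] are all valid third sides for the pair (i, j)
            count += k - j - 1
    return count


print(count_triangles([4, 6, 3, 7]))  # prints 3
```

Because k never moves backwards while i is fixed, the while loop performs O(n) steps in total per value of i, which is where the O(n^2) bound comes from.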
I have worked out an algorithm that runs in O(n^2 lg n) time. I think it's correct...
The code is written in C++...
#include <algorithm>
#include <vector>

int Search_Closest(const std::vector<int>& A, int p, int q, int n)
/*Returns the index of the last element of the sorted range A[p..q] that is
less than n, or p-1 if there is no such element*/
{
while (p <= q)
{
int r = p + (q - p) / 2;
if (A[r] < n)
p = r + 1; // A[r] is small enough, look to the right of it
else
q = r - 1; // A[r] is too large, look to the left of it
}
return q;
}
int no_of_triangles(std::vector<int>& A, int p, int q) /*Returns the no of triangles possible in A[p..q]*/
{
int sum = 0;
std::sort(A.begin() + p, A.begin() + q + 1); //Sorts A[p..q] in O(n lg n) time
for(int i=p;i<=q;i++)
for(int j =i+1;j<=q;j++)
{
int c = A[i]+A[j];
int k = Search_Closest(A,j+1,q,c);
/* every A[l] with j < l <= k is less than A[i]+A[j], so the no of triangles
formed with A[i] and A[j] as the two smaller sides is k-j */
sum+=k-j;
}
return sum;
}
Hope it helps........
possible answer
Although we can use binary search to find the value of 'k' hence improve time complexity!
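Given the numbers N0, N1, N2, ..., Nn-1:
Sort them in descending order as X0, X1, X2, ..., Xn-1, so that X0 >= X1 >= X2 >= ... >= Xn-1.
Choose the largest side X0 (then X1, and so on up to Xn-3), and choose the other two items from the elements after it.
For a choice such as (X0, X1, X2):
check X0 < X1 + X2.
If the check passes (OK), a triangle is found; continue.
If it fails (NG), skip the remaining choices for that pair.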
It seems there is no algorithm better than O(n^3). In the worst case, the result set itself has O(n^3) elements.
For Example, if n equal numbers are given, the algorithm has to return n*(n-1)*(n-2) results.

Given 2 sorted arrays of integers, find the nth largest number in sublinear time [duplicate]

Possible Duplicate:
How to find the kth smallest element in the union of two sorted arrays?
This is a question one of my friends told me he was asked while interviewing, I've been thinking about a solution.
Sublinear time implies logarithmic to me, so perhaps some kind of divide and conquer method. For simplicity, let's say both arrays are the same size and that all elements are unique
I think this is two concurrent binary searches on the subarrays A[0..n-1] and B[0..n-1], which is O(log n).
Given sorted arrays, you know that the nth largest will appear somewhere before or at A[n-1] if it is in array A, or B[n-1] if it is in array B
Consider item at index a in A and item at index b in B.
Perform binary search as follows (pretty rough pseudocode, not taking into account off-by-one problems):
If a + b > n, then reduce the search set
if A[a] > B[b] then b = b / 2, else a = a / 2
If a + b < n, then increase the search set
if A[a] > B[b] then b = 3/2 * b, else a = 3/2 * a (halfway between a and previous a)
If a + b = n then the nth largest is max(A[a], B[b])
I believe worst case O(ln n), but in any case definitely sublinear.
I believe that you can solve this problem using a variant on binary search. The intuition behind this algorithm is as follows. Let the two arrays be A and B and let's assume for the sake of simplicity that they're the same size (this isn't necessary, as you'll see). For each array, we can construct parallel arrays Ac and Bc such that for each index i, Ac[i] is the number of elements in the two arrays that are no larger than A[i] and Bc[i] is the number of elements in the two arrays that are no larger than B[i]. If we could construct these arrays efficiently, then we could find the kth smallest element efficiently by doing binary searches on both Ac and Bc to find the value k. The corresponding entry of A or B for that entry is then the kth smallest element. The binary search is valid because the two arrays Ac and Bc are sorted, which I think you can convince yourself of pretty easily.
Of course, this solution doesn't work in sublinear time because it takes O(n) time to construct the arrays Ac and Bc. The question then is: is there some way that we can implicitly construct these arrays? That is, can we determine the values in these arrays without necessarily constructing each element? I think that the answer is yes, using this algorithm. Let's begin by searching array A to see if it has the kth smallest value. We know for a fact that the kth smallest value can't appear in array A after position k (assuming all the elements are distinct). So let's focus just on the first k elements of array A. We'll do a binary search over these values as follows. Start at position k/2; this is the k/2th smallest element in array A. Now do a binary search in array B to find the largest value in B smaller than this value and look at its position in the array; this is the number of elements in B smaller than the current value. If we add up the positions of the elements in A and B, we have the total number of elements in the two arrays smaller than the current element. If this is exactly k, we're done. If this is less than k, then we recurse in the upper half of the first k elements of A, and if this is greater than k we recurse in the lower half of the first k elements of A, etc. Eventually, we'll either find that the kth smallest element is in array A, in which case we're done, or we won't, in which case we repeat this process on array B.
The runtime for this algorithm is as follows. The search of array A does a binary search over k elements, which takes O(lg k) iterations. Each iteration costs O(lg n), since we have to do a binary search in B. This means that the total time for this search is O(lg k lg n). The time to do this in array B is the same, so the net runtime for the algorithm is O(lg k lg n) = O(lg^2 n) = o(n), which is sublinear.
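A rough Python sketch of this nested binary search, assuming both arrays are sorted in ascending order, all elements are distinct across the two arrays, and k is 1-based. The names `kth_smallest` and `search_in` are hypothetical. The off-by-one details are my own choice: an element is the kth smallest exactly when k-1 elements are smaller than it.

```python
from bisect import bisect_left


def search_in(x, y, k):
    """Return the kth smallest of x and y combined if it lies in x, else None."""
    lo, hi = 0, min(k, len(x)) - 1  # it cannot sit after position k in x
    while lo <= hi:
        mid = (lo + hi) // 2
        # mid elements of x are smaller than x[mid]; bisect counts those in y.
        smaller = mid + bisect_left(y, x[mid])
        if smaller == k - 1:
            return x[mid]
        if smaller < k - 1:
            lo = mid + 1
        else:
            hi = mid - 1
    return None


def kth_smallest(a, b, k):
    if not 1 <= k <= len(a) + len(b):
        raise ValueError("k is out of range")
    result = search_in(a, b, k)
    if result is None:
        result = search_in(b, a, k)  # repeat the process on the other array
    return result


print(kth_smallest([1, 3, 5, 7, 9], [2, 4, 6, 8, 10], 7))  # prints 7
```

The outer loop runs O(lg k) times and each iteration does one O(lg n) `bisect_left`, in line with the O(lg k lg n) estimate.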
This is quite similar answer to Kirk's.
Let Find( nth, A, B ) be a function that returns the nth number, where |A| + |B| >= n. This is simple pseudocode that does not handle the case where one of the arrays is small (fewer than 3 elements). For a small array, one or two binary searches in the larger array are enough to find the needed element.
Find( nth, A, B )
If A.last() <= B.first():
return B[nth - A.size()]
If B.last() <= A.first():
return A[nth - B.size()]
Let a and b indexes of middle elements of A and B
Assume that A[a] <= B[b] (if not swap arrays)
if nth <= a + b:
return Find( nth, A, B.first_half(b) )
return Find( nth - a, A.second_half(a), B )
It runs in O(log|A| + log|B|), and because each input array can be trimmed to at most n elements, the complexity is O(log n).
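For comparison, here is a Python sketch of a related, commonly used halving scheme. It is not a transcription of Find above: instead of splitting at the middle elements, it discards about k/2 candidates per step. It assumes ascending arrays and a 1-based k, and the name `kth_smallest_halving` is hypothetical.

```python
def kth_smallest_halving(a, b, k):
    """kth smallest (1-based) of two ascending lists in O(log k) steps."""
    if not 1 <= k <= len(a) + len(b):
        raise ValueError("k is out of range")
    i = j = 0  # a[:i] and b[:j] are already known to be too small
    while True:
        if i == len(a):
            return b[j + k - 1]
        if j == len(b):
            return a[i + k - 1]
        if k == 1:
            return min(a[i], b[j])
        half = k // 2
        ni = min(i + half, len(a)) - 1
        nj = min(j + half, len(b)) - 1
        if a[ni] <= b[nj]:
            # a[i..ni] cannot contain the kth smallest, so drop them
            k -= ni - i + 1
            i = ni + 1
        else:
            k -= nj - j + 1
            j = nj + 1


print(kth_smallest_halving([1, 3, 5], [2, 4, 6], 4))  # prints 4
```

For the nth largest, as asked in the question, the same function can be called with k = len(a) + len(b) - n + 1.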
int[] a = new int[] { 11, 9, 7, 5, 3 };
int[] b = new int[] { 12, 10, 8, 6, 4 };
int n = 7;
int result = 0;
if (n > (a.Length + b.Length))
throw new Exception("n is greater than a.Length + b.Length");
else if (n < (a.Length + b.Length) / 2)
{
int ai = 0;
int bi = 0;
for (int i = n; i > 0; i--)
{
// find the highest from a or b
if (ai < a.Length)
{
if (bi < b.Length)
{
if (a[ai] > b[bi])
{
result = a[ai];
ai++;
}
else
{
result = b[bi];
bi++;
}
}
else
{
result = a[ai];
ai++;
}
}
else
{
if (bi < b.Length)
{
result = b[bi];
bi++;
}
else
{
// error, n is greater than a.Length + b.Length
}
}
}
}
else
{
// go in reverse
int ai = a.Length - 1;
int bi = b.Length - 1;
for (int i = a.Length + b.Length - n; i >= 0; i--)
{
// find the lowest from a or b
if (ai >= 0)
{
if (bi >= 0)
{
if (a[ai] < b[bi])
{
result = a[ai];
ai--;
}
else
{
result = b[bi];
bi--;
}
}
else
{
result = a[ai];
ai--;
}
}
else
{
if (bi >= 0)
{
result = b[bi];
bi--;
}
else
{
// error, n is greater than a.Length + b.Length
}
}
}
}
Console.WriteLine("{0} th highest = {1}", n, result);
Sublinear of what though? You can't have an algorithm that doesn't check at least n elements, even verifying a solution would require checking that many. But the size of the problem here should surely mean the size of the arrays, so an algorithm that only checks n elements is sublinear.
So I think there's no trick here, start with the list with the smaller starting element and advance until you either:
Reach the nth element, and you're done.
Find that the next element is bigger than the next element in the other list, at which point you switch to the other list.
Run out of elements and switch.

Resources