Correctness of fast small order statistic algorithm for odd-length array

Problem 9-3 of the textbook Intro to Algorithms (CLRS) describes a fast O(n) algorithm for finding the k-th order statistic (k-th element in the array when sorted) of a length-n array, for the particular case that k is much smaller than n. I am not certain about the correctness of this algorithm when n is odd, and want to see a way to prove that it is correct.
The basic idea is that we first split the array into two halves, the first with floor(n/2) elements, and the second with ceil(n/2) elements. Then, we "partner" each element in the first half with the corresponding element in the second half. When n is odd this leaves a remaining unpartnered element.
For each pair of partners, we make sure that the left partner is >= the right partner, swapping the two if not. Then, recursively find the k-th order statistic of the second half, mirroring any swaps made in the second half with corresponding swaps in the first half. After this, the k-th order statistic of the entire array must be either in the first k elements in the first half, or the first k elements in the second half.
My confusion comes from the case when the array length n is odd, and there is a lone element in the second half that has no partner. Since the recursion is performed on the second half, consisting of the last ceil(n/2) elements of the array, including the lone partnerless last element, and we are supposed to mirror all swaps made in second half with swaps made within the corresponding partners in the first half, it is unclear what to do when one of the swaps involves the final element, since it has no partner.
The textbook doesn't seem to take particular care on this issue, so I'm assuming that when a swap involves the final element, we simply don't make any mirror move in the first half at all. As a result, the final element simply "steals" the partner of whoever it got swapped with. However, in this case, is there an easy way to see if the algorithm is still correct? What if, when the last element steals someone else's partner, the partner is actually the k-th order statistic, and gets swapped later on to an inaccessible location? The mechanics of the recursion and partitioning involved in order-statistic selection are sufficiently opaque to me that I cannot confidently rule out that scenario.

I don't think your description of the algorithm is entirely accurate (but then the explanation you linked to is far from clear). As I understand it, the reason why the algorithm is correct for an odd-length array is as follows:
Let's first look at a few examples of even-length arrays, with n=10 and k=3 (i.e. we're looking for the third-smallest element, which is 2):
a. 5 2 7 6 1 9 3 8 4 0
b. 5 1 7 6 2 9 3 8 4 0
c. 5 0 7 6 2 9 3 8 4 1
d. 5 0 7 6 2 9 3 8 1 4
If we split the arrays into two parts, we get:
a. 5 2 7 6 1 | 9 3 8 4 0
b. 5 1 7 6 2 | 9 3 8 4 0
c. 5 0 7 6 2 | 9 3 8 4 1
d. 5 0 7 6 2 | 9 3 8 1 4
and these couples:
a. (5,9) (2,3) (7,8) (6,4) (1,0) <- 0 coupled with 1
b. (5,9) (1,3) (7,8) (6,4) (2,0) <- 0 coupled with 2
c. (5,9) (0,3) (7,8) (6,4) (2,1) <- 1 coupled with 2
d. (5,9) (0,3) (7,8) (6,1) (2,4) <- 0, 1 and 2 not coupled with each other
After comparing and swapping the couples so that their smallest element is in the first group, we get:
a. 5 2 7 4 0 9 3 8 6 1
b. 5 1 7 4 0 9 3 8 6 2
c. 5 0 7 4 1 9 3 8 6 2
d. 5 0 7 1 2 9 3 8 6 4
You'll see that the smallest element 0 will always be in the first group. The second-smallest element 1 will be either in the first group, or in the second group if it was coupled with the smallest element 0. The third-smallest element 2 will be either in the first group, or in the second group if it was coupled with either the smallest element 0 or the second-smallest element 1.
So the smallest element is in the first group, and the second- and third-smallest elements can be in either group. That means that the third-smallest element is either one of the 3 smallest elements in the first group, or one of the 2 (!) smallest elements in the second group.
a. 5 2 7 4 0 9 3 8 6 1 -> 0 2 4 + 1 3
b. 5 1 7 4 0 9 3 8 6 2 -> 0 1 4 + 2 3
c. 5 0 7 4 1 9 3 8 6 2 -> 0 1 4 + 2 3
d. 5 0 7 1 2 9 3 8 6 4 -> 0 1 2 + 3 4
So if we say that the k-th smallest element of the whole array is now one of the k smallest elements in either of the groups, there is an available spot in the second group, and that's why, in an odd-length array, we'd add the uncoupled element to the second group. Whether or not the uncoupled element is the element we're looking for, the element we're looking for will certainly be among the k smallest elements of one of the groups.
It is in fact more correct to say that the k-th smallest element is either one of the k smallest elements in the first group, or one of the k/2+1 smallest elements in the second group. I'm actually not sure that the algorithm is optimal, or even correct. There's a lot of repeated comparing and swapping going on, and the idea of keeping track of the couples and swapping elements in one group when their corresponding elements in the other group are swapped doesn't seem to make sense.
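The candidate-set claim above can be checked by brute force. What follows is a minimal Python sketch under stated assumptions, not the textbook's algorithm: the helper names couple_and_swap, candidates and claim_holds are hypothetical, the elements are assumed distinct, and the code follows this answer's convention of moving the smaller element of each couple into the first group while the uncoupled element of an odd-length array stays in the second group.

```python
from itertools import permutations


def couple_and_swap(arr):
    """Split arr into two groups and put the smaller element of each couple first.

    The first group has floor(n/2) elements; for odd n the last element of the
    second group has no partner and is left where it is.
    """
    half = len(arr) // 2
    first, second = list(arr[:half]), list(arr[half:])
    for i in range(half):
        if first[i] > second[i]:
            first[i], second[i] = second[i], first[i]
    return first, second


def candidates(arr, k):
    """k smallest of the first group plus k//2 + 1 smallest of the second group."""
    first, second = couple_and_swap(arr)
    return sorted(first)[:k] + sorted(second)[:k // 2 + 1]


def claim_holds(n):
    """Check every permutation of 0..n-1 and every k (1-based rank)."""
    for p in permutations(range(n)):
        for k in range(1, n + 1):
            # In a permutation of 0..n-1 the k-th smallest element is k - 1.
            if k - 1 not in candidates(list(p), k):
                return False
    return True
```

For example a above, candidates([5, 2, 7, 6, 1, 9, 3, 8, 4, 0], 3) yields 0 2 4 from the first group plus 1 3 from the second, and the third-smallest element 2 is among them. This only checks the candidate-set claim, including for odd lengths; it says nothing about the recursion or the mirrored swaps.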

Related

Find the last smaller or equal number for every element in the array?

I have been given an array. I need to find the last (right-most) smaller or equal number for every element in the array.
For example:
2 5 1 6 4 7
2 has last smaller or equal number 1,
5 has last smaller or equal number 4 and not 1, etc.
Another example:
5 100 8 7 6 5 4 3 2 1
Here, every element has last smaller or equal number 1. I know the naive approach, i.e. O(n²), but need a better approach.
You could go from right to left and build a min array of the minimum number until now.
For your example 2 5 1 6 4 7, it would be:
Start at rightmost position:
7
4 7 (4 < 7)
4 4 7 (6 > 4)
...
So the min array for your example would be: 1 1 1 4 4 7
Now for each query, you start at the same position in the min array and go right until finding a number which is greater:
For 2:
2 5 1 6 4 7
1 1 1 4 4 7
^
------^
First element greater than 2 is 4, so return number just before = 1
For 5:
2 5 1 6 4 7
1 1 1 4 4 7
  ^
----------^
First element greater than 5 is 7, so return number just before = 4
To efficiently find the first greater element for each element of the input array, you can use the upper_bound algorithm (example in C++: http://www.cplusplus.com/reference/algorithm/upper_bound/). This works because the min array is sorted in non-decreasing order.
upper_bound takes O(log N) time, so the overall time to process every element of the input array is O(N log N).
Memory complexity is linear, for the min array.
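The steps above can be sketched in Python, using bisect.bisect_right from the standard library in place of C++'s upper_bound. The function name last_smaller_or_equal is hypothetical, and returning None when no element to the right qualifies is an assumption, since the question does not say what to report in that case.

```python
from bisect import bisect_right


def last_smaller_or_equal(a):
    """For each a[i], return the right-most a[j] with j > i and a[j] <= a[i].

    Returns None for positions where no such element exists (an assumption).
    """
    n = len(a)
    # suffix_min[i] = min(a[i:]); built right to left, so it is non-decreasing.
    suffix_min = list(a)
    for i in range(n - 2, -1, -1):
        suffix_min[i] = min(suffix_min[i], suffix_min[i + 1])

    result = []
    for i, x in enumerate(a):
        # Index of the first suffix minimum greater than x, searching from i;
        # the position just before it is the last element <= x.
        j = bisect_right(suffix_min, x, lo=i) - 1
        result.append(a[j] if j > i else None)
    return result
```

For the example 2 5 1 6 4 7 this returns 1 for the element 2 and 4 for the element 5, matching the walkthrough above.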

Help understanding Fibonacci Search

On the internet I only find code for the algorithm, but I need to understand it in text form first, because I have trouble understanding things from code alone. Other descriptions of the algorithm (on Wikipedia and other sites) are too complicated for me.
Here is what I understand so far:
Let's say we want to search the array for the element 10:
Index i:  0  1  2  3   4
Value:    2  3  4  10  40
Some Fibonacci numbers:
Index j:  0  1  2  3  4  5  6  7   8   9
F(j):     0  1  1  2  3  5  8  13  21  34
The first thing we do is find the Fibonacci number that is greater than or equal to the array length. The array length is 5, so we take the Fibonacci number 5, which is at index j=5.
But where do we divide the array now, and how do we continue? I really don't understand it. Please help me understand this for my exam.
The algorithm goes as follows:
The length of the array is 5, so the smallest Fibonacci number greater than or equal to 5 is 5. The two numbers preceding it in the Fibonacci sequence are 2 [n-2] and 3 [n-1], giving the triple (2, 3, 5). Here n-2 and n-1 label the Fibonacci numbers two places and one place before it.
So arr[n-2], i.e. arr[2], is compared with the number being searched for, which is 10.
If the element is smaller than the number, the sequence is shifted one step to the left. The previous index is also saved for the next iteration, to serve as an offset for the index. In this case, since 4 is smaller than 10, n-2 becomes 1 (the triple is now (1, 2, 3)), and arr[1 + 2 (prev)] = arr[3] = 10. So the index of the number is 3.
If the element is larger, the sequence is shifted two steps to the left.
In every iteration, the element at index min(n-2+offset, n) is compared with the number to decide which case applies.
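For reference, here is a minimal Python sketch of Fibonacci search in its commonly published form. It is an assumption that this matches what the exam expects: this variant starts with an offset of -1, so on the example array it probes index 1 first rather than index 2 as in the walkthrough above, but the shifting rules are the same. The function and variable names are hypothetical.

```python
def fibonacci_search(arr, target):
    """Return the index of target in the sorted list arr, or -1 if absent."""
    n = len(arr)

    # Find the smallest Fibonacci number fib_m >= n, keeping the two before it.
    fib_mm2, fib_mm1 = 0, 1            # F(m-2), F(m-1)
    fib_m = fib_mm2 + fib_mm1          # F(m)
    while fib_m < n:
        fib_mm2, fib_mm1 = fib_mm1, fib_m
        fib_m = fib_mm2 + fib_mm1

    offset = -1                        # everything up to offset is ruled out
    while fib_m > 1:
        i = min(offset + fib_mm2, n - 1)
        if arr[i] < target:
            # Element too small: shift the triple one step left, move offset.
            fib_m = fib_mm1
            fib_mm1 = fib_mm2
            fib_mm2 = fib_m - fib_mm1
            offset = i
        elif arr[i] > target:
            # Element too large: shift the triple two steps left.
            fib_m = fib_mm2
            fib_mm1 = fib_mm1 - fib_mm2
            fib_mm2 = fib_m - fib_mm1
        else:
            return i

    # One element may remain unchecked just after offset.
    if fib_mm1 and offset + 1 < n and arr[offset + 1] == target:
        return offset + 1
    return -1
```

On the array 2 3 4 10 40 with target 10, this probes indices 1, 2 and 3 and returns 3.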

Algorithm to divide array of length n containing numbers from 1 to n (no repetition) into two equal sum

You are given an array of length N whose elements are the numbers 1 to N, with no repetition. You need to check whether the array can be divided into two lists of equal sum.
I know it can be solved using subset sum problem whose time complexity is.
Is there an algorithm so that I can reduce the time complexity?
Per your requirements, the array always contains exactly the numbers 1 to N.
So if the sum of the array is even, the answer is YES; otherwise it is NO.
Since the sum of elements from 1 to n equals n*(n+1)/2, you have to check if n*(n+1) is a multiple of 4, which is equivalent to checking if n is a multiple of 4, or if n+1 is a multiple of 4. The complexity of it is O(1).
If this condition is met, the two subsets are:
If n is a multiple of 4: take the odd numbers of the first half together with the even numbers of the second half on one side, and the even numbers of the first half together with the odd numbers of the second half on the other.
For instance, for n = 12: 1 3 5 8 10 12, and 2 4 6 7 9 11.
If n = 3 modulo 4: almost the same thing. Split the first three numbers into 1 and 2 on one side and 3 on the other; the remaining series has a size that is a multiple of 4.
For instance, for n = 7: 1 2 4 7, and 3 5 6; or if you prefer, 3 4 7, and 1 2 5 6.
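The construction above can be sketched in Python as follows. The function name split_equal_sum is hypothetical, and returning None when no split exists is an assumption made for this illustration.

```python
def split_equal_sum(n):
    """Split 1..n into two lists of equal sum, or return None if impossible."""
    # n*(n+1)/2 is even exactly when n or n+1 is a multiple of 4.
    if n % 4 not in (0, 3):
        return None

    left, right = [], []
    start = 1
    if n % 4 == 3:
        # Handle the first three numbers separately: 1 + 2 == 3.
        left += [1, 2]
        right += [3]
        start = 4

    # The remaining range start..n has a length that is a multiple of 4.
    second_half_start = start + (n - start + 1) // 2
    for v in range(start, n + 1):
        in_first_half = v < second_half_start
        is_odd = v % 2 == 1
        # Odd values of the first half and even values of the second half
        # go left; everything else goes right.
        if in_first_half == is_odd:
            left.append(v)
        else:
            right.append(v)
    return left, right
```

For n = 12 this yields 1 3 5 8 10 12 and 2 4 6 7 9 11, and for n = 7 it yields 1 2 5 6 and 3 4 7, the second of the two splits given above.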

Permutation of number by desired order

I want to generate an algorithm for permutation of a list of distinct numbers in a specific order.
example :-
The numbers are
1 2 3 4
Order for permutation is
3 1 4 2
i.e. after permutation first number will go to third place, second to first place, third to fourth place and fourth to second place.
Now the sequence for the numbers will be
2 4 1 3
Now if the algorithm keeps applying the same order, then after some number of iterations it will reproduce the original sequence of numbers and stop. For this case the total number of iterations is 4.
2 4 1 3
4 3 2 1
3 1 4 2
1 2 3 4
I am doing this with a temporary array tmp[] alongside two other arrays, number[] and order[]. I copy the elements of number[] into tmp[], placing each element at the position given by order[], and check whether the original sequence has reappeared before the next iteration. If another iteration is needed, then
number[]=tmp[] and the algorithm repeats the previous steps.
If the number of elements is large, e.g. 10^7 or higher, this method runs slowly.
Is there a better way to find the number of iterations?
If you want to generate the permutations, your solution is already optimal, because its complexity equals the size of the output.
However, if you are only interested in the number of distinct permutations you can generate, you can do much better:
Decompose your "order" into cycles: for instance, 3 1 4 2 is one cycle, 1 -> 3 -> 4 -> 2 -> 1, but 2 1 4 3 is two cycles, 1 -> 2 -> 1 and 3 -> 4 -> 3.
The number of distinct permutations is lcm(n1, …, np), where n1, …, np are the lengths of the cycles and lcm is the least common multiple.
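A minimal Python sketch of the cycle-decomposition idea, assuming order is a list holding a permutation of 1..n as in the question; the function name iterations_until_repeat is hypothetical. It runs in time linear in n, apart from the arithmetic on the running lcm, which can become a large integer.

```python
from math import gcd


def iterations_until_repeat(order):
    """Number of times `order` must be applied before the input reappears.

    order[i] is the 1-based place that the element at place i + 1 moves to.
    """
    n = len(order)
    seen = [False] * n
    result = 1
    for start in range(n):
        if seen[start]:
            continue
        # Walk the cycle containing `start` and measure its length.
        length = 0
        pos = start
        while not seen[pos]:
            seen[pos] = True
            pos = order[pos] - 1
            length += 1
        # Fold the cycle length into the running least common multiple.
        result = result * length // gcd(result, length)
    return result
```

For the order 3 1 4 2 from the question this gives 4, and for 2 1 4 3 it gives 2.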

Algorithm for Vertex connections From List of Directed Edges

The square of a directed graph G = (V, E) is the graph G^2 = (V, E^2) such that u→w is in E^2 if and only if u ≠ w and there is a vertex v such that both u→v and v→w are in E. The input file simply lists the edges in arbitrary order as ordered pairs of vertices, with each edge on a separate line. The vertices are numbered in order from 1 to the total number of vertices.
* Self-loops and duplicate/parallel edges are not allowed.
If we look at an example of input data:
1 6
1 4
1 3
2 4
2 8
2 6
2 5
3 5
3 2
3 6
4 7
4 5
4 6
4 8
5 1
5 8
5 7
6 3
6 4
7 5
7 4
7 6
8 1
Then the output would be:
1: 3 4 7 8 5 2 6
2: 5 6 3 4 1 8 7
3: 1 7 8 6 5 4
4: 5 6 8 7 3 1
5: 3 1 4 6
6: 2 7 5 8
7: 1 5 6 8 3 4
8: 6 4 3
I'm writing the code in C.
My thought is to run through the file, see how many vertices there are, and then allocate an array of pointers. Then go through the list again, searching for just the lines that have a 1 in them, and look at where those corresponding numbers lead. If a number is not a duplicate or the same number (1), I'll add it to a linked list hanging off the array of pointers. I will do this for every vertex number in the file.
However, I feel this is terribly inefficient, and not the best way to go about doing this. If anyone has any other suggestions I would be extremely grateful.
If I understand correctly, you want to build a result set for each node that lists all nodes at a distance of one and two from it.
For that, you can hold the edges in an adjacency matrix of bit arrays, where a bit is one when an edge exists and zero if not.
Now you can multiply this matrix with itself. In this case, multiply means ANDing a row with a column; the resulting cell is one if any bit of the AND is one.
A small example (sorry, don't know how to insert a matrix properly):
0 1 0     0 1 0     0 0 1
0 0 1  x  0 0 1  =  1 1 0
1 1 0     1 1 0     0 1 1
This matrix contains a one for every node reachable in exactly two steps; it is simply the adjacency matrix for two steps instead of one. If you now OR this matrix with your initial matrix, you have a matrix that holds all paths of length one and two.
This approach has multiple advantages. First, bit operations are very fast: the CPU parallelizes the calculations, and for each cell of the result matrix you can stop as soon as one pair is found whose AND gives one.
Furthermore, it is well documented how to compute matrix multiplication in parallel.
You can easily calculate paths of any other length, too. For a length k one has to calculate:
A^k = A^(k-1) * A
Hope that helped.
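As a rough illustration of the bit-matrix idea, here is a minimal Python sketch that stores each adjacency row as an integer bitmask. It is written in Python rather than C for brevity, and the function name square_graph is hypothetical. It computes the boolean product row by row (ORing together the rows of all direct successors, which gives the same result as ANDing rows with columns), and it drops the diagonal because the question's definition requires u ≠ w. It does not OR in the original matrix, since the expected output in the question lists only two-step neighbours.

```python
def square_graph(num_vertices, edges):
    """Return {u: sorted list of w} with u -> w in the square of the graph.

    Vertices are numbered 1..num_vertices; edges is a list of (u, v) pairs.
    """
    # row[u] has bit v set when u -> v is an edge.
    row = [0] * (num_vertices + 1)
    for u, v in edges:
        row[u] |= 1 << v

    result = {}
    for u in range(1, num_vertices + 1):
        reach = 0
        for v in range(1, num_vertices + 1):
            if (row[u] >> v) & 1:
                # Everything v points to is two steps away from u.
                reach |= row[v]
        reach &= ~(1 << u)  # the definition requires u != w
        result[u] = [w for w in range(1, num_vertices + 1) if (reach >> w) & 1]
    return result
```

On the sample input from the question, vertex 5 maps to 1 3 4 6 and vertex 8 maps to 3 4 6, the same sets as in the expected output, listed here in sorted order.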

Resources