How do I calculate the k nearest numbers to the median? - arrays

I have an array of n pairwise different elements and a number k with 1 <= k <= n.
I am looking for an algorithm that computes the k numbers with the smallest absolute difference to the median of the array. I need linear complexity (O(n)).
My approach:
I find the median:
I sort the numbers.
I take the middle element, or, if the number of elements is even, the average of the two middle elements, rounded.
After that:
I take every number and compute its absolute distance from the median. I save these distances in a separate array.
I sort the newly obtained array.
I take the first k elements of the result array and I'm done.
I don't know whether my solution is O(n), or even whether the idea is correct. Can someone verify that? Can someone show me how to solve it in O(n)?

You can solve your problem like that:
You can find the median in O(n), e.g. using the O(n) nth_element algorithm.
You loop through all elements, substituting each with a pair: <the absolute difference to the median>, <the element's value>. Then you apply nth_element once more, this time with k as the position. After applying this algorithm you are guaranteed to have the k elements smallest in absolute difference first in the new array. You take their second components (the original values) and are DONE!
Your algorithm, on the other hand, uses sorting, and this makes it O(n log n).
EDIT: The requested example:
Let the array be [14, 6, 7, 8, 10, 13, 21, 16, 23].
After the step for finding the median it will be reordered to, say: [8, 7, 6, 10, 13, 16, 23, 14, 21]; notice that the array is not sorted, but the median (13) is still exactly in the middle.
Now let's do the pair substitution that got you confused: we create a new array of pairs: [<abs(14-13), 14>, <abs(6-13), 6>, <abs(7-13), 7>, <abs(8-13), 8>, <abs(10-13), 10>, <abs(13-13), 13>, <abs(21-13), 21>, <abs(16-13), 16>, <abs(23-13), 23>]. Thus we obtain the array: [<1, 14>, <7, 6>, <6, 7>, <5, 8>, <3, 10>, <0, 13>, <8, 21>, <3, 16>, <10, 23>].
If e.g. k is 4, we apply nth_element once more (using the first element of each pair for comparisons) and obtain: [<1, 14>, <3, 16>, <0, 13>, <3, 10>, <8, 21>, <7, 6>, <10, 23>, <6, 7>, <5, 8>], so the numbers you are looking for are the second elements of the first 4 pairs: 14, 16, 13 and 10.
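A minimal sketch of the two-step approach above, assuming C++17 (the function name k_closest_to_median is mine). nth_element is average-case linear, so both calls fit the O(n) budget:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <utility>
#include <vector>

// Find the median with one nth_element call, then pair each element with its
// distance to the median and run nth_element again at position k. The first
// k pairs then hold the k values closest to the median (in arbitrary order).
std::vector<int> k_closest_to_median(std::vector<int> a, std::size_t k) {
    std::nth_element(a.begin(), a.begin() + a.size() / 2, a.end());
    int median = a[a.size() / 2];

    std::vector<std::pair<int, int>> dist;  // <|x - median|, x>
    for (int x : a) dist.push_back({std::abs(x - median), x});

    std::nth_element(dist.begin(), dist.begin() + k, dist.end());
    std::vector<int> result;
    for (std::size_t i = 0; i < k; ++i) result.push_back(dist[i].second);
    return result;
}
```

For the example array with k = 4 this yields {14, 16, 13, 10} in some order.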

Minimize number of insertions when sorting an array

Assuming an array where every element is a positive integer, without duplicates and without any missing elements, like the following:
{15, 1, 2, 6, 7, 8, 3, 4, 9, 5, 10, 13, 11, 12, 14}
Considering the removal and insertion (not swapping) of each element, how can I find the most efficient move (remove/insert) operations to sort the list with the minimum number of insertions?
I think identifying individual groups is helpful since I can more easily find what needs to be moved together:
{{15}, {1, 2}, {6, 7, 8}, {3, 4}, {9}, {5}, {10}, {13}, {11, 12}, {14}}
For that I can create another array with the difference between the value and the correct position to let me easily identify the groups and which ones are furthest from being correct.
{{14}, {-1, -1}, {2, 2, 2}, {-4, -4}, {0}, {-5}, {-1}, {1}, {-2, -2}, {-1}}
Then I chose the group furthest from its correct position (largest difference) and with the smaller number of elements. Based on that I can see that {15} is 14 away from being correct and should be the first to be moved. I THINK (I'm guessing here) that I need to move it at least the difference in value, because I could land in the middle of a group. Repeating the procedure, I move {5} to before {6, 7, 8}, which moves it 6 spaces, more than the difference between value and correct position, because there is a group in its correct spot. Then {3, 4}, and finally {13}, and the list is sorted.
I can already create an iterative method that does just that. But I think it would be highly inefficient, since I will be dealing with about 200k values, and recalculating after every set of insertions is a terrible idea.
PS: I NEED to follow this procedure (removal and insertion of elements, and thinking in groups) instead of other more time-efficient methods, since this will be applied in the real world to sort items on shelves, and something with a smaller number of operations on fewer items is preferred over computational or memory efficiency.
Minimizing the number of elements that are moved is the same as maximizing the number of elements that are not moved.
Since any elements you don't move will be left in their original order, those elements must form a subsequence of the desired sorted array. You can use the standard algorithm to find the longest such subsequence:
https://en.wikipedia.org/wiki/Longest_common_subsequence_problem
https://www.geeksforgeeks.org/longest-common-subsequence-dp-4/
Then remove all the elements that are not part of this subsequence, and reinsert them wherever they belong.
Note that there are optimizations you can use for this specific case of the longest monotonically increasing subsequence:
https://en.wikipedia.org/wiki/Longest_increasing_subsequence
https://www.geeksforgeeks.org/longest-increasing-subsequence-dp-3/
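The idea above can be sketched as follows (the function name min_moves_to_sort is mine): the elements left in place must form an increasing subsequence, so the minimum number of remove/insert moves is n minus the length of the longest increasing subsequence (LIS). The patience-sorting technique finds the LIS length in O(n log n):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

std::size_t min_moves_to_sort(const std::vector<int>& a) {
    // tails[i] = smallest possible tail of an increasing subsequence of length i+1
    std::vector<int> tails;
    for (int x : a) {
        auto it = std::lower_bound(tails.begin(), tails.end(), x);
        if (it == tails.end()) tails.push_back(x);  // extend the longest run
        else *it = x;                               // improve an existing run
    }
    return a.size() - tails.size();  // everything outside the LIS must move
}
```

For the question's example array the LIS is 1, 2, 6, 7, 8, 9, 10, 11, 12, 14 (length 10), so 5 elements must move, matching the manual procedure ({15}, {5}, {3, 4}, {13}).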
Create an integer array of size 16. Call it fred.
Init all of its values to 0.
Iterate your unsorted array, using each value as a subscript into fred and setting that entry to 1.
Pretend your unsorted array is empty.
Iterate fred. When you encounter a value of 1, that subscript needs to be inserted back into your unsorted array.
Your unsorted array of size N is now sorted at a cost of N insertions.
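The flag-array idea above can be sketched like this, assuming (as in the question) the values are exactly 1..n with no gaps or duplicates (the function name flag_sort is mine):

```cpp
#include <cassert>
#include <vector>

std::vector<int> flag_sort(const std::vector<int>& a) {
    std::vector<int> fred(a.size() + 1, 0);  // index 0 unused; values are 1..n
    for (int v : a) fred[v] = 1;             // mark each value as present
    std::vector<int> sorted;
    for (std::size_t i = 1; i < fred.size(); ++i)
        if (fred[i]) sorted.push_back(static_cast<int>(i));  // reinsert in order
    return sorted;
}
```

Note this rebuilds the whole array with n insertions; it does not try to minimize the number of moved elements the way the LIS-based answer does.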

Finding the k'th element in unsorted array using external function

I need to design an algorithm that finds the k'th smallest element in an unsorted array using a function called "MED3":
This function finds the elements at positions n/3 (floor) and 2n/3 (ceil) of the array as if it were sorted (very similar to the median, but instead of n/2 it returns those two values).
I thought about doing a kind of partition around those 2 values, and then continuing like QuickSelect, but the problem is that "MED3" doesn't return the indices of the 2 values, only the values.
for example, if the array is: 1, 2, 10, 1, 7, 6, 3, 4, 4 it returns 2 (n/3 value) and 4 (2n/3 value).
I also thought to run over the array and copy all the values between 2 and 4 (for example, in the given array above) to a new array and then use "MED3" again, but there can be duplicates (if the array is 2, 2, 2, 2, ..., 2 I would take all the elements each time).
Any ideas? I must use "MED3".
* MED3 is like a black box, it runs in linear time.
Thank you.
I think you're on the right track, but instead of taking the values between 2 and 4, I'd suggest removing the first n/3 values that are <= MED3.floor() and the first n/3 values that are >= MED3.ceil(). That avoids the issue with too many duplicates. If two passes per cycle aren't too expensive, you can remove all values < MED3.floor() plus up to a total of n/3 values = MED3.floor() (and do the same for ceil()),
then repeat until you reach the k'th smallest target.
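Here is one hedged sketch of how the pruning could be realized (all names are mine, and since MED3 is a black box, med3 below is just a stand-in simulated with sorting; the real MED3 is linear). Each round keeps only the third of the multiset that can contain the k'th smallest, padding with duplicates of the reported values to preserve ranks, so duplicates are no longer a problem:

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Stand-in for MED3: for n >= 3, returns the values at 1-based sorted ranks
// floor(n/3) and ceil(2n/3).
std::pair<int, int> med3(std::vector<int> a) {
    std::sort(a.begin(), a.end());
    std::size_t n = a.size(), p = n / 3, q = (2 * n + 2) / 3;
    return {a[p - 1], a[q - 1]};
}

int select_kth(std::vector<int> a, std::size_t k) {  // k is 1-based
    while (a.size() > 2) {
        std::size_t n = a.size(), p = n / 3, q = (2 * n + 2) / 3;
        auto [lo, hi] = med3(a);
        if (k == p) return lo;
        if (k == q) return hi;
        if (lo == hi && k > p && k < q) return lo;  // ranks p..q all equal lo
        std::vector<int> next;
        if (k < p) {
            for (int x : a) if (x < lo) next.push_back(x);
            while (next.size() < p) next.push_back(lo);      // keep ranks 1..p
        } else if (k > q) {
            for (int x : a) if (x > hi) next.push_back(x);
            while (next.size() < n - q) next.push_back(hi);  // keep ranks q+1..n
            k -= q;
        } else {
            std::size_t below_lo = 0, below_hi = 0, eq_lo = 0;
            for (int x : a) {
                if (x < lo) ++below_lo;
                if (x < hi) ++below_hi;
                if (x == lo) ++eq_lo;
                if (x > lo && x < hi) next.push_back(x);
            }
            // pad with the duplicates of lo and hi whose ranks fall in (p, q)
            for (std::size_t i = 0; i < below_lo + eq_lo - p; ++i) next.push_back(lo);
            for (std::size_t i = 0; i + below_hi + 1 < q; ++i) next.push_back(hi);
            k -= p;  // next now holds exactly ranks p+1..q-1
        }
        a = std::move(next);
    }
    std::sort(a.begin(), a.end());
    return a[k - 1];
}
```

Each round shrinks the array to roughly n/3 elements, so with a linear MED3 the total work is O(n + n/3 + n/9 + ...) = O(n).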

Elements in array O(nlogn) complexity method for finding pairs

Okay, I keep getting stuck with the complexity here. There is an array of elements, say A[n]. I need to find all pairs such that A[i] > A[j] and also i < j.
So if it is {10, 8, 6, 7, 11}, the pairs would be (10,8) (10, 6), (10,7) and so on...
I did a merge sort in nlogn time and then a binary search for the entire array again in nlogn to get the indices of the elements in the sorted array.
So sortedArray={6 7 8 10 11} and index={3 2 0 1 4}
Irrespective of what I try, I keep getting another n^2 term in the complexity when I begin the loops to compare. I mean, if I start with the first element, i.e. 10, it is at index[2], which means there are 2 elements less than it. So if index[2] < index[i] then the pair can be accepted, but that increases the complexity. Any thoughts? I don't want the code, just a hint in the right direction would be helpful.
Thanks. Everything I have been doing is in C, and time complexity is important here.
You cannot do this in under O(N^2), because when the original array is sorted in descending order the algorithm must produce N(N-1)/2 pairs. You simply cannot produce O(N^2) results in O(N*LogN) time.
The result consists of O(n^2) elements, so any attempt to iterate through all pairs will be O(n^2).
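That said, if you only need the *number* of such pairs (these are exactly the inversions of the array) rather than the pairs themselves, the merge sort you already wrote can count them in O(n log n). A sketch (the function name count_inversions is mine):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Counts pairs (i < j, a[i] > a[j]) while merge-sorting a[lo..hi).
long long count_inversions(std::vector<int>& a, std::size_t lo, std::size_t hi) {
    if (hi - lo < 2) return 0;
    std::size_t mid = lo + (hi - lo) / 2;
    long long inv = count_inversions(a, lo, mid) + count_inversions(a, mid, hi);
    std::vector<int> merged;
    std::size_t i = lo, j = mid;
    while (i < mid && j < hi) {
        if (a[i] <= a[j]) merged.push_back(a[i++]);
        else { merged.push_back(a[j++]); inv += mid - i; }  // a[i..mid) all exceed a[j]
    }
    while (i < mid) merged.push_back(a[i++]);
    while (j < hi) merged.push_back(a[j++]);
    std::copy(merged.begin(), merged.end(), a.begin() + lo);
    return inv;
}
```

For {10, 8, 6, 7, 11} this counts 5 pairs: (10,8), (10,6), (10,7), (8,6), (8,7).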

Extracting 2 numbers n times and placing back the addition in O(n) instead of O(n*log(n))

I'm presenting a problem my professor showed in class, with my O(n*log(n)) solution:
Given a list of n numbers we'd like to perform the following n-1 times:
Extract the two minimal elements x,y from the list and present them
Create a new number z , where z = x+y
Put z back into the list
Suggest a data structure and algorithm for O(n*log(n)), and for O(n).
Solution:
We'll use a minimal heap:
Creating the heap one time only would take O(n). After that, extracting the two minimal elements would take O(log(n)). Placing z into the heap would take O(log(n)).
Performing the above n-1 times would take O(n*log(n)), since:
O(n) + O(n·(log n + log n)) = O(n) + O(n·log n) = O(n·log n)
But how can I do it in O(n)?
EDIT:
By saying: "extract the two minimal elements x,y from the list and present them ", I mean printf("%d,%d" , x,y), where x and y are the smallest elements in the current list.
This is not a full answer. But if the list were sorted, then your problem would be easily doable in O(n). To do it, arrange all of the numbers in a linked list. Maintain a pointer to the head, and another somewhere in the middle. At each step, take the two elements off the head, print them, advance the middle pointer to where the sum should go, and insert the sum there.
The head pointer will move close to 2n times and the middle pointer will move about n times, with n inserts. All of these operations are O(1), so the sum total is O(n).
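An equivalent formulation of the same idea, perhaps easier to see: since the sums come out in nondecreasing order, a plain FIFO queue of sums stays sorted by itself, so each step just compares the fronts of the input queue and the sum queue. This is a sketch under the same assumptions (sorted, positive input; the function name reduce_sorted is mine):

```cpp
#include <cassert>
#include <deque>
#include <utility>
#include <vector>

// Returns the sequence of (x, y) pairs that would be presented.
std::vector<std::pair<long, long>> reduce_sorted(std::deque<long> xs) {
    std::deque<long> sums;  // stays sorted because sums are nondecreasing
    std::vector<std::pair<long, long>> printed;
    auto pop_min = [&]() {
        long v;
        if (sums.empty() || (!xs.empty() && xs.front() <= sums.front())) {
            v = xs.front(); xs.pop_front();
        } else {
            v = sums.front(); sums.pop_front();
        }
        return v;
    };
    while (xs.size() + sums.size() > 1) {
        long x = pop_min();
        long y = pop_min();
        printed.push_back({x, y});  // "present" the two minimal elements
        sums.push_back(x + y);      // z goes to the back; queue stays sorted
    }
    return printed;
}
```

Every element is pushed and popped exactly once, so the whole reduction is O(n). On 10..15 this prints (10, 11), (12, 13), (14, 15), (21, 25), (29, 46), matching the trace below.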
In general you cannot sort in time O(n), but there are a number of special cases in which you can. So in some cases it is doable.
The general case is, of course, not solvable in time O(n). Why not? Because given the output, in O(n) time you can run through it, building up the list of pairwise sums in order as you go and filtering them out of the output. What is left is the elements of the original list in sorted order. This would give an O(n) general sorting algorithm.
Update: I was asked to show how you could go from the output (10, 11), (12, 13), (14, 15), (21, 25), (29, 46) back to the input list. The trick is that you always keep everything in order, so you know where to look. With positive integers, the next upcoming sum to use will always be at the start of that list.
Step 0: Start
input_list: (empty)
upcoming_sums: (empty)
Step 1: Grab output (10, 11)
input_list: 10, 11
upcoming_sums: 21
Step 2: Grab output (12, 13)
input_list: 10, 11, 12, 13
upcoming_sums: 21, 25
Step 3: Grab output (14, 15)
input_list: 10, 11, 12, 13, 14, 15
upcoming_sums: 21, 25, 29
Step 4: Grab output (21, 25)
input_list: 10, 11, 12, 13, 14, 15
upcoming_sums: 29, 46
Step 5: Grab output (29, 46)
input_list: 10, 11, 12, 13, 14, 15
upcoming_sums: 75
This isn't possible in the general case.
Your problem statement says that you must reduce your array to a single element, performing a total of n-1 reduction operations. Therefore, the number of reduction operations performed is on the order of O(n). To achieve an overall running time of O(n), each reduction operation must run in O(1).
You have clearly defined your reduction operation:
remove the 2 minimal elements in the array and print them, then
insert the sum of those elements into the array.
If your data structure were a sorted list, it is trivial to remove two minimal elements in O(1) time (pop them off the end of the list). However, reinserting an element in O(1) is not possible (in the general case). As SteveJessop pointed out, if you could insert into a sorted list in O(1) time, the resultant operations would constitute an O(n) sorting algorithm. But there is no such known algorithm.
There are some exceptions here. If your numbers are integers, you may be able to use "radix insert" to achieve O(1) inserts. If your array of numbers are sufficiently sparse in the number line, you may be able to deduce insert points in O(1). There are numerous other exceptions, but they are all exceptions.
This answer doesn't answer your question, per se, but I believe it's relevant enough to warrant an answer.
If the range of values is less than n, then this can be solved in O(n).
1) Create an array mk of size equal to the range of values and initialize it to all zeros.
2) Traverse the input array and increment the value of mk at the position given by each element, i.e. if the element is a[i], increment mk[a[i]].
3) To present the answers after each of the n-1 operations, there are two cases:
Case 1: all of the a[i] are positive.
Traverse the mk array from 0 to its size with cnt = 0. Until cnt equals 2, grab the next nonzero entry, decrease it by 1, and increment cnt by 1. You get the two minimum values this way; present them, then do mk[sum of the two minimums]++.
Case 2: some of the a[i] are negative.
<still to update>
O(nlogn) is easy - just use a heap, treap or skiplist.
O(n) sounds tough.
https://en.wikipedia.org/wiki/Heap_%28data_structure%29
https://en.wikipedia.org/wiki/Treap
https://en.wikipedia.org/wiki/Skip_list

efficient sorted Cartesian product of 2 sorted array of integers

Need Hints to design an efficient algorithm that takes the following input and spits out the following output.
Input: two sorted arrays of integers A and B, each of length n
Output: One sorted array that consists of Cartesian product of arrays A and B.
For Example:
Input:
A is 1, 3, 5
B is 4, 8, 10
here n is 3.
Output:
4, 8, 10, 12, 20, 24, 30, 40, 50
Here are my attempts at solving this problem.
1) Given that the output has n^2 elements, an efficient algorithm can't do better than O(n^2) time complexity.
2) First I tried a simple but inefficient approach: generate the Cartesian product of A and B, which can be done in O(n^2) time. We need to store it so we can sort it, therefore O(n^2) space complexity too. Now we sort n^2 elements, which can't be done better than O(n^2 log n) without making any assumptions about the input.
In the end I have an O(n^2 log n) time and O(n^2) space algorithm.
There must be a better algorithm because I've not made use of sorted nature of input arrays.
If there's a solution that's better than O(n² log n) it needs to do more than just exploit the fact that A and B are already sorted. See my answer to this question.
Srikanth wonders how this can be done in O(n) space (not counting the space for the output). This can be done by generating the lists lazily.
Suppose we have A = 6,7,8 and B = 3,4,5. First, multiply every element in A by the first element in B, and store these in a list:
6×3 = 18, 7×3 = 21, 8×3 = 24
Find the smallest element of this list (6×3), output it, replace it with that element in A times the next element in B:
7×3 = 21, 6×4 = 24, 8×3 = 24
Find the new smallest element of this list (7×3), output it, and replace:
6×4 = 24, 8×3 = 24, 7×4 = 28
And so on. We only need O(n) space for this intermediate list, and finding the smallest element at each stage takes O(log n) time if we keep the list in a heap.
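The lazy-merge idea above can be sketched with a min-heap holding one candidate per element of A, assuming positive integers as in the example (the function name sorted_products is mine):

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Emits the n^2 products A[i]*B[j] in sorted order using only O(n) extra
// space (not counting the output): the heap holds at most one entry per row.
std::vector<int> sorted_products(const std::vector<int>& A, const std::vector<int>& B) {
    if (A.empty() || B.empty()) return {};
    using Entry = std::pair<int, std::pair<std::size_t, std::size_t>>;  // <product, <i, j>>
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (std::size_t i = 0; i < A.size(); ++i)
        heap.push({A[i] * B[0], {i, 0}});  // smallest candidate of each row
    std::vector<int> out;
    while (!heap.empty()) {
        auto [prod, ij] = heap.top(); heap.pop();
        out.push_back(prod);
        std::size_t i = ij.first, j = ij.second + 1;
        if (j < B.size()) heap.push({A[i] * B[j], {i, j}});  // advance that row
    }
    return out;
}
```

Each of the n^2 outputs costs one heap pop and at most one push on a heap of size <= n, for O(n^2 log n) time overall.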
If you multiply a value of A with all values of B, the resulting list is still sorted. In your example:
A is 1, 3, 5
B is 4, 8, 10
1*(4,8,10) = 4,8,10
3*(4,8,10) = 12,24,30
Now you can merge the two lists (exactly as in merge sort). You just look at both list heads and put the smaller one in the result list, so here you would select 4, then 8, then 10, etc.
result = 4,8,10,12,24,30
Now you do the same with the result list and the next remaining list, merging 4,8,10,12,24,30 with 5*(4,8,10) = 20,40,50.
As merging is most efficient when both lists have the same length, you can modify this scheme by dividing A into two parts, doing the merging recursively for both parts, and then merging the two results.
Note that you can save some time using the merge approach, as it isn't required that A be sorted; only B needs to be sorted.
