Approximate sorting algorithm - arrays

Does anyone know an algorithm that k-approximately sorts an array?
We were asked to find an algorithm for k-approximate sorting, and it should run in O(n log(n/k)), but I can't seem to find any.
K-approximate sorting means that for every 1 <= i <= n-k, sum_{j=i}^{i+k-1} a[j] <= sum_{j=i+1}^{i+k} a[j], i.e. each window of k consecutive elements has a sum no larger than the window shifted one position to the right.

I know I'm very late to the question... But under the assumption that k is some approximation value between 0 and 1 (where 0 is completely unsorted and 1 is perfectly sorted), surely the answer to this is quicksort (or mergesort).
Consider the following array:
[4, 6, 9, 1, 10, 8, 2, 7, 5, 3]
Let's say this array is 'unsorted' - now apply one partition step of quicksort to this array, using the element at index length[array]/2 = 5 (counting from 0) as the pivot, i.e. 8:
[4, 6, 1, 2, 7, 5, 3, 8, 9, 10]
Now this array is not sorted - but it is more sorted than it was one step ago, i.e. it is approximately sorted, but for a low approximation, i.e. a low value of k. Repeat this step on the two halves of the array and it becomes more sorted. As k increases towards 1 - i.e. perfectly sorted - the complexity becomes O(N log(N/1)) = O(N log(N)).
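For the O(n log(n/k)) bound in the question (with k the window size from the definition above rather than a 0-to-1 score), a common approach is exactly this depth-limited partitioning: recurse as in quicksort but stop once a subarray has at most k elements. The blocks then appear in sorted order relative to each other, so a[i] <= a[i+k] for every valid i, which is equivalent to the window-sum condition. The sketch below is my own illustration, not part of the original answer; the random pivot gives the bound in expectation, and an exact median pivot (e.g. via median-of-medians) would give it in the worst case.

import random

def k_sort(a, k):
    # Partially sort a in place so that a[i] <= a[i + k] for every valid i;
    # recursion stops once a block has at most k elements, giving about
    # log(n/k) levels of O(n) partitioning work each.
    def rec(lo, hi):                      # work on a[lo:hi]
        if hi - lo <= k:
            return
        pivot = a[random.randrange(lo, hi)]
        left, mid, right = [], [], []
        for x in a[lo:hi]:                # simple 3-way partition
            if x < pivot:
                left.append(x)
            elif x > pivot:
                right.append(x)
            else:
                mid.append(x)
        a[lo:hi] = left + mid + right
        rec(lo, lo + len(left))
        rec(lo + len(left) + len(mid), hi)
    rec(0, len(a))
    return a

print(k_sort([4, 6, 9, 1, 10, 8, 2, 7, 5, 3], k=3))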

Related

Number of subarrays whose mean rounds to zero

You have an array of integers. You have to find the number of subarrays whose mean (the sum of the elements divided by their count) rounds to zero.
I have solved this in O(n^2) time but it is not efficient enough. Is there a faster way to do it?
example:
[-1, 1, 5, -4]
subarrays which mean rounds to zero are:
[-1, 1] has mean 0, and [-1, 1, 5, -4] has mean 1/4, which rounds to zero.
Define a new array composed of pairs (prefix sum, cnt), where the first element is the prefix sum and the second element is the number of elements taken so far. For example, for
int[] arr = [-1, 1, 5, -4]:
int[] narr = [(0, 0), (-1, 1), (0, 2), (5, 3), (1, 4)]
The question is converted to counting the pairs (i, j) in narr where i < j and Math.abs(narr[j][0] - narr[i][0]) < narr[j][1] - narr[i][1] = j - i, which further boils down to:
narr[j][0] - j < narr[i][0] - i < narr[i][0] + i < narr[j][0] + j
so the question is further converted to the following question:
For a collection of intervals, e.g. [[1, 2], [-1, 0], ...] (initially empty): given an interval [x, y], count how many stored intervals lie totally within the range [x, y], then add this interval; repeat this procedure N times in total. (How to manage the data structure holding the intervals becomes the key problem.)
If we just brute-force iterate over every stored interval and validate it, the query time complexity is O(N) and the insertion time complexity is O(1), for a total of O(N^2). (A sketch of this baseline is given after the update below.)
If we use square-root decomposition, the query time complexity is O(sqrt(N)) and the insertion time complexity is O(1), for a total of O(N sqrt(N)).
If we use a treap (using the first or second coordinate as priority and the other as key), the average total time complexity we can achieve is O(N lg N).
If you don't know the techniques of square-root decomposition or treaps, I suggest reading a couple of articles on them first.
Update:
After thinking about it carefully for 30 minutes, I find that a treap cannot achieve O(N lg N) average time complexity.
Instead we can use a 2D segment tree to achieve O(N lg N lg N):
Please read this article instead:
2d segment tree
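To make the reduction concrete, here is a small Python sketch (my own, not the answer's code) of the O(N^2) brute-force baseline from the list above: each prefix index i is mapped to the interval [narr[i][0] - i, narr[i][0] + i], and for each j we count the earlier intervals strictly contained in (narr[j][0] - j, narr[j][0] + j). It implements the answer's criterion |sum| < length; how strict the inequality should be depends on the exact rounding rule the question intends.

def count_zero_mean_subarrays(arr):
    # O(n^2) brute force over the (prefix sum, count) reduction:
    # subarray arr[i:j] has |sum| < length  iff  the interval
    # [S_i - i, S_i + i] lies strictly inside (S_j - j, S_j + j).
    prefix = [0]
    for x in arr:
        prefix.append(prefix[-1] + x)
    count = 0
    intervals = []                        # intervals seen so far
    for j, s in enumerate(prefix):
        lo, hi = s - j, s + j
        count += sum(1 for (a, b) in intervals if lo < a and b < hi)
        intervals.append((lo, hi))
    return count

print(count_zero_mean_subarrays([-1, 1, 5, -4]))   # 4 subarrays satisfy |sum| < length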

Finding the square of a number in a sequential array of odds

I have a sequential odd array starting at 3. So x = {3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13...}.
I am wondering if there is a quick way to find at what index the square of a number n is. So if n were 5, I am looking for where 25 is in the array. Right now I have ((n) * (n - 1)), which I add to the current index i. Is there anything faster?
Your array is made of consecutive numbers and it is sorted. Because of this it forms an arithmetic progression with common difference 1 and first element 3, so at index i we have a[i] = i + 3, and therefore i = a[i] - 3.
So to find the index of the square of n, let nsqr be n*n; the index of nsqr is simply nsqr - 3. That's an O(1) algorithm.
To make it general: whenever we have consecutive sorted numbers which start with a0 and differ by d, the index of the square of n is (nsqr - a0)/d.
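As a one-line Python sketch of that formula (the function name is mine, just for illustration):

def index_of_square(n, a0=3, d=1):
    # Index of n*n in the arithmetic progression a0, a0+d, a0+2d, ...
    return (n * n - a0) // d

print(index_of_square(5))   # 22, since a[22] = 22 + 3 = 25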

Array of size n into k same-size groups

I have an unsorted array of size n and I need to find the k-1 dividers so that every subset is of the same size (as it would be after the array is sorted).
I have seen this question with k-1 = 3. I guess I need median of medians, and that takes O(n). But I think we would have to do it k times, so O(nk).
I would like to understand why it would take O(n log k).
For example: I have an unsorted array of integers and I want to find the k-1 dividers, i.e. the k-1 values that split the array into k same-sized subarrays according to their values.
If I have [1, 13, 6, 7, 81, 9, 10, 11], then for k = 3 the dividers are [7, 11], splitting it into [1, 6], [9, 10], [13, 81], where every subset has size 2.
You can use a divide-and-conquer approach. First, find the (k-1)/2th divider using the median-of-medians algorithm. Next, use the selected element to partition the list into two sub-lists. Repeat the algorithm on each sub-list to find the remaining dividers.
The maximum recursion depth is O(log k) and the total cost across all sub-lists at each level is O(n), so this is an O(n log k) algorithm.
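A minimal Python sketch of this divide-and-conquer (my own illustration; all names in it are hypothetical). It assumes distinct values and an input that splits evenly, and it uses a randomized quickselect, which gives O(n log k) in expectation; substituting the median-of-medians selection mentioned above makes the bound worst-case.

import random

def quickselect(a, r):
    # Return the element of rank r (0-based) in list a; expected O(len(a)).
    while True:
        pivot = random.choice(a)
        lo = [x for x in a if x < pivot]
        hi = [x for x in a if x > pivot]
        eq = len(a) - len(lo) - len(hi)
        if r < len(lo):
            a = lo
        elif r < len(lo) + eq:
            return pivot
        else:
            r -= len(lo) + eq
            a = hi

def find_dividers(a, k):
    # Return the k-1 divider values that split a into k equal-sized groups.
    if k <= 1:
        return []
    m = k // 2                      # pick the middle divider first
    r = m * len(a) // k             # its rank in sorted order
    d = quickselect(list(a), r)
    left = [x for x in a if x < d]
    right = [x for x in a if x > d]
    # recurse on each side for the remaining dividers
    return find_dividers(left, m) + [d] + find_dividers(right, k - m)

print(find_dividers([1, 13, 6, 7, 81, 9, 10, 11], 3))   # [7, 11]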

Find All Numbers in an Array which Sum up to Zero

Given an array, output the consecutive elements whose total sum is 0.
Eg:
For input [2, 3, -3, 4, -4, 5, 6, -6, -5, 10],
Output is [3, -3, 4, -4, 5, 6, -6, -5]
I just can't find an optimal solution.
Clarification 1: For any element in the output subarray, there should be a subset of the subarray which adds with that element to zero.
E.g.: for -5, at least one of the subsets {[2, 3], [1, 4], [5], ...} should be present in the output subarray.
Clarification 2: Output subarray should be all consecutive elements.
Here is a Python solution that runs in O(n³):
import numpy

def conSumZero(arr):
    take = [False] * len(arr)
    for i in range(len(arr)):
        for j in range(i + 1, len(arr) + 1):
            if sum(arr[i:j]) == 0:          # subarray arr[i:j] sums to zero
                for k in range(i, j):       # mark every element it covers
                    take[k] = True
    return numpy.array(arr)[take]           # keep only the marked elements
EDIT: Now more efficient! (Not sure if it's quite O(n²); will update once I finish calculating the complexity.)
def conSumZero(arr):
    take = [False] * len(arr)
    cs = numpy.concatenate(([0], numpy.cumsum(arr)))   # cs[i] = sum(arr[:i])
    for i in range(len(arr)):
        for j in range(i + 1, len(arr) + 1):
            if cs[j] - cs[i] == 0:                     # arr[i:j] sums to zero
                for k in range(i, j):
                    take[k] = True
    return numpy.array(arr)[take]
The difference here is that I precompute the partial sums of the sequence, and use them to calculate subsequence sums - since sum(a[i:j]) = sum(a[0:j]) - sum(a[0:i]) - rather than iterating each time.
Why not just hash the incremental sum totals and update their indexes as you traverse the array, the winner being the one with the largest index range? O(n) time complexity (assuming average hash table performance).
[2, 3, -3, 4, -4, 5, 6, -6, -5, 10]
prefix sums: 0 2 5 2 6 2 7 13 7 2 12
The winner is the sum 2, spanning indexes 1 to 8!
To also guarantee an exact counterpart contiguous-subarray for each number in the output array, I don't yet see a way around checking/hashing all the sum subsequences in the candidate subarrays, which would raise the time complexity to O(n^2).
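A small Python sketch of that hashing idea (my own illustration of the answer above, not the poster's code): store the first index at which each prefix sum occurs; whenever the same sum reappears, the elements in between sum to zero, and we keep the widest such span.

def longest_zero_sum_subarray(arr):
    # Return (start, end) of the longest contiguous run summing to 0,
    # or None if there is none.  O(n) expected time with a dict.
    first_seen = {0: -1}        # prefix sum -> earliest index it occurred at
    total = 0
    best = None
    for i, x in enumerate(arr):
        total += x
        if total in first_seen:
            start = first_seen[total] + 1
            if best is None or i - start > best[1] - best[0]:
                best = (start, i)
        else:
            first_seen[total] = i
    return best

arr = [2, 3, -3, 4, -4, 5, 6, -6, -5, 10]
print(longest_zero_sum_subarray(arr))      # (1, 8)
print(arr[1:9])                            # [3, -3, 4, -4, 5, 6, -6, -5]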
Based on the example, I assumed that you wanted to find only the cases where 2 values together add up to 0; if you want to include ones that add up to 0 when more of them are added together (like 5 + -2 + -3), then you would need to clarify your parameters a bit more.
The implementation differs by language, but here is a JavaScript example that shows the algorithm, which you can implement in any language:
var inputArray = [2, 3, -3, 4, -4, 5, 6, -6, -5, 10];
var outputArray = [];
for (var i = 0; i < inputArray.length; i++) {
    var num1 = inputArray[i];
    for (var x = i + 1; x < inputArray.length; x++) {   // start at i+1 so a value is not paired with itself
        var num2 = inputArray[x];
        var sumVal = num1 + num2;
        if (sumVal == 0) {
            outputArray.push(num1);
            outputArray.push(num2);
        }
    }
}
Is this the problem you are trying to solve?
Given a sequence a_1, ..., a_n, find a contiguous index range S = [i, j) maximizing |S| such that some subset T of S satisfies sum_{t in T} a_t = 0.
If so, here is the algorithm for solving it:
let U be the empty range
for each contiguous range S = [i, j) of {1, ..., n}
    for each T in the power set of [i, j)
        if sum_{t in T} a_t = 0
            if |U| < |S|
                U := S
return U
(Will update with full latex once I get the chance.)

Remove unsorted/outlier elements in nearly-sorted array

Given an array like [15, 14, 12, 3, 10, 4, 2, 1], how can I determine which elements are out of order and remove them (the number 3 in this case)? I don't want to sort the list, but rather detect outliers and remove them.
Another example:
[13, 12, 4, 9, 8, 6, 7, 3, 2]
I want to be able to remove 4 and 7 so that I end up with:
[13, 12, 9, 8, 6, 3, 2]
There's also a problem that arises when you have this scenario:
[15, 13, 12, 7, 10, 5, 4, 3]
You could either remove 7 or 10 to make this array sorted.
In general, the problem I'm trying to solve is this: given a list of numerical readings (some of which could be off by quite a bit), I want the array to include only values that follow the general trendline, and to remove any outliers. I'm just wondering if there is a simple way to do this.
I would reduce your problem to the longest increasing (decreasing) subsequence problem.
https://en.wikipedia.org/wiki/Longest_increasing_subsequence
Since your sequence is nearly sorted, you are guaranteed to get a satisfactory result (i.e. one that neatly follows the trendline).
There exists a number of solutions to it; one of them is portrayed in the free book "Fundamentals of Computer Programming with C#" by Svetlin Nakov and Veselin Kolev; the problem is presented on page 257, exercise 6; solution is on page 260.
Taken from the book:
Write a program, which finds the maximal sequence of increasing elements in an array arr[n]. It is not necessary the elements to be consecutively placed. E.g.: {9, 6, 2, 7, 4, 7, 6, 5, 8, 4} -> {2, 4, 6, 8}.
Solution:
We can solve the problem with two nested loops and one more array len[0…n-1]. In the array len[i] we can keep the length of the longest consecutively increasing sequence, which starts somewhere in the array (it does not matter where exactly) and ends with the element arr[i]. Therefore len[0]=1, len[x] is the maximal sum max(1 + len[prev]), where prev < x and arr[prev] < arr[x]. Following the definition, we can calculate len[0…n-1] with two nested loops: the outer loop will iterate through the array from left to right with the loop variable x. The inner loop will iterate through the array from the start to position x-1 and searches for the element prev with maximal value of len[prev], where arr[prev] < arr[x]. After the search, we initialize len[x] with 1 + the biggest found value of len[prev] or with 1, if such a value is not found.
The described algorithm finds the lengths of all maximal ascending sequences, which end at each of the elements. The biggest one of these values is the length of the longest increasing sequence. If we need to find the elements themselves, which compose that longest sequence, we can start from the element, where the sequence ends (at index x), we can print it and we can search for a previous element (prev). By definition prev < x and len[x] = 1 + len[prev] so we can find prev with a for-loop from 1 to x-1. After that we can repeat the same for x=prev. By finding and printing the previous element (prev) many times until it exists, we can find the elements, which compose the longest sequence in reversed order (from the last to the first).
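As a concrete Python sketch of this reduction for the decreasing arrays in the question (my own code, not the book's): compute a longest non-increasing subsequence with the O(n²) DP described above and keep only those elements.

def keep_trendline(a):
    # Drop the fewest elements so the rest is non-increasing, by keeping
    # a longest non-increasing subsequence (O(n^2) DP as described above).
    n = len(a)
    best_len = [1] * n          # best_len[i]: longest run ending at a[i]
    prev = [-1] * n             # predecessor index for reconstruction
    for i in range(n):
        for j in range(i):
            if a[j] >= a[i] and best_len[j] + 1 > best_len[i]:
                best_len[i] = best_len[j] + 1
                prev[i] = j
    # walk back from the end of the best run to recover the kept elements
    i = best_len.index(max(best_len))
    kept = []
    while i != -1:
        kept.append(a[i])
        i = prev[i]
    return kept[::-1]

print(keep_trendline([13, 12, 4, 9, 8, 6, 7, 3, 2]))   # [13, 12, 9, 8, 6, 3, 2]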
A simple algorithm, which has been described by higuaro, can help you generate a correctly sorted sequence:
For each element at index i, if a[i] < a[i + 1], we can simply remove that element a[i]. In Python:
i = 0
while i < len(a) - 1:
    if a[i] < a[i + 1]:      # out of (decreasing) order
        del a[i]             # remove the earlier, smaller element
        if i > 0:
            i -= 1           # re-check against the previous element
    else:
        i += 1
However, this approach cannot guarantee that the number of removed elements is minimal. For example, for the sequence [10, 9, 8, 100, 1, 0], removing 100 is optimal, instead of removing 8, then 9, then 10.
To find the minimum number of elements to remove, notice that we need to find the longest decreasing subsequence, which is symmetric to the classic longest increasing subsequence whose solution has been described here.
