Remove unsorted/outlier elements in nearly-sorted array - arrays

Given an array like [15, 14, 12, 3, 10, 4, 2, 1]. How can I determine which elements are out of order and remove them (the number 3 in this case). I don't want to sort the list, but detect outliers and remove them.
Another example:
[13, 12, 4, 9, 8, 6, 7, 3, 2]
I want to be able to remove #4 and #7 so that I end up with:
[13, 12, 9, 8, 6, 3, 2]
There's also a problem that arises when you have this scenario:
[15, 13, 12, 7, 10, 5, 4, 3]
You could either remove 7 or 10 to make this array sorted.
In general, the problem I'm trying to solve, is that given a list of numerical readings (some could be off by quite a bit). I want the array to only include values that follow the general trendline and remove any outliers. I'm just wondering if there is a simple way to do this.

I would reduce your problem to the longest increasing (decreasing) subsequence problem.
https://en.wikipedia.org/wiki/Longest_increasing_subsequence
Since your sequence is nearly sorted, you are guaranteed to receive a satisfactory result (i.e. neatly following the trendline).
There exists a number of solutions to it; one of them is portrayed in the free book "Fundamentals of Computer Programming with C#" by Svetlin Nakov and Veselin Kolev; the problem is presented on page 257, exercise 6; solution is on page 260.
Taken from the book:
Write a program, which finds the maximal sequence of increasing elements in an array arr[n]. It is not necessary the elements to be consecutively placed. E.g.: {9, 6, 2, 7, 4, 7, 6, 5, 8, 4} -> {2, 4, 6, 8}.
Solution:
We can solve the problem with two nested loops and one more array len[0…n-1]. In the array len[i] we can keep the length of the longest consecutively increasing sequence, which starts somewhere in the array (it does not matter where exactly) and ends with the element arr[i]. Therefore len[0]=1, len[x] is the maximal sum max(1 + len[prev]), where prev < x and arr[prev] < arr[x]. Following the definition, we can calculate len[0…n-1] with two nested loops: the outer loop will iterate through the array from left to right with the loop variable x. The inner loop will iterate through the array from the start to position x-1 and searches for the element prev with maximal value of len[prev], where arr[prev] < arr[x]. After the search, we initialize len[x] with 1 + the biggest found value of len[prev] or with 1, if such a value is not found.
The described algorithm finds the lengths of all maximal ascending sequences, which end at each of the elements. The biggest one of these values is the length of the longest increasing sequence. If we need to find the elements themselves, which compose that longest sequence, we can start from the element, where the sequence ends (at index x), we can print it and we can search for a previous element (prev). By definition prev < x and len[x] = 1 + len[prev] so we can find prev with a for-loop from 1 to x-1. After that we can repeat the same for x=prev. By finding and printing the previous element (prev) many times until it exists, we can find the elements, which compose the longest sequence in reversed order (from the last to the first).

A simple algorithm which has been described by higuaro can help you to generate a correct sequence:
For each element at index i , if a[i] < a[i + 1], we can simply remove that element a[i].
for(int i = 0; i < size; i++)
while(a[i] < a[i + 1]){
remove a[i];
i--;
}
However, this approach cannot guarantee that the number of removed element is minimum. For example, for this sequence [10, 9, 8, 100, 1, 0], remove 100 will be optimal, instead of remove 8, then 9 then 10.
To find the minimum number of element to be removed, we notice that we need to find the longest decreasing sub sequence, which is similar to the classic longest increasing sub sequence whose solution has been described here

Related

Return Array for Maximum Sum

I have trouble to find a solution for the following question.
"Given an integer array A, you partition the array in (contiguous) subarrays of length at most k. After partitioning, each subarray has their values changed to become the maximum value of the subarray.
These subarrays will be used to create a new array in the order when they are partitioned. The sum of the new array should have the maximum value.
Example:
Input: A = [1, 15, 7, 9, 2, 5, 10], k = 3
Output: newArray = [15, 15, 15, 9, 10, 10, 10]
One possible solution is to try all possible partitions and find the max sum. But I am looking for a better solution.
A posible implementation is to create a dictionary that stores the first value in the partition and if the next value is greater than the one stored, get rid of the one in the dictionary until the end of the partition. And repeat this for all partitions.

Finding the square of a number in a sequential array of odds

I have a sequential odd array starting at 3. So x = {3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13...}.
I am wondering if there is a quick way to find at what index the square of a number n is at. So if n was 5, I am looking for where 25 is in the array. Right now I have ((n) * (n - 1)) which I add to the current i index. Is there anything faster?
Your array is made of consecutive numbers and it's sorted, Because of this it forms a mathematical arithmetic progression with difference 1 and first element as 3, so at index i we have a[i]=i+3 and so i=a[i]-3.
So to find the index of the square of n let nsqr be n*n, nsqr index is simply nsqr-3, that's an O(1) algorithm.
To make it general whenever we have consecutive sorted numbers which start with a0 and differ by d, to find where is the square of n we do (nsqr-a0)/d.

approximate sorting Algorithm

Does anyone know an Algorithm that sorts k-approximately an array?
We were asked to find and Algorithm for k-approximate sorting, and it should run in O(n log(n/k)). but I can't seem to find any.
K-approx. sorting means that an array and any 1 <= i <= n-k such that sum a[j] <= sum a[j] i<=j<= i+k-1 i+1<=j<= i+k
I know I'm very late to the question ... But under the assumption that k is some approximation value between 0 and 1 (when 0 is completely unsorted and 1 is perfectly sorted) surely the answer to this is quicksort (or mergesort).
Consider the following array:
[4, 6, 9, 1, 10, 8, 2, 7, 5, 3]
Let's say this array is 'unsorted' - now apply one iteration of quicksort to this array with the (length[array]/2)th element as a pivot: length[array]/2 = 5. So the 5th element is our pivot (i.e. 8):
[4, 6, 2, 1, 3, 9, 7, 10, 8]
Now this is array is not sorted - but it is more sorted than one iteration ago, i.e. its approximately sorted but for a low approximation, i.e. a low value of k. Repeat this step again on the two halves of the array and it becomes more sorted. As k increases towards 1 - i.e. perfectly sorted - the complexity becomes O(N log(N/1)) = O(N log(N)).

Calling Groups of Elements of Matlab Arrays

I'm dealing with long daily time series in Matlab, running over periods of 30-100+ years. I've been meaning to start looking at it by seasons, roughly approximating that by taking 91-day segments of each year over the time period (with some tbd method of correcting for odd number of days in the year)
Basically, what I want is an array indexing method that allows me to make a new array that takes 91 elements every 365 elements, starting at element 1. I've been looking for some normal array methods (some (:) or other), but I haven't been able to find one. I guess an alternative would be to kind of iterate over 365-day segments 91 times, but that seems needlessly complicated.
Is there a simpler way that I've missed?
Thanks in advance for the help!
So if I understand correctly, you want to extract elements 1-91, 366-457, 731-822, and so on? I'm not sure that there is a way to do this with basic matrix indexing, but you can do the following:
days = 1:365; %Create array ranging from 1 - 365
difference = length(data) - 365; %how much bigger is time series data?
padded = padarray(days, [0, difference], 'circular'); %extend to fit time series
extracted = data(padded <= 91); %get every element in the range 1-91
Basically what I am doing is creating an array that is the same size as your time series data that repeats 1-365 over and over. I then perform logical indexing on data, such that the padded array is less than or equal to 91.
As a more approachable example, consider:
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
days = 1:5;
difference = length(x) - 5;
padded = padarray(days, [0, difference], 'circular');
extracted = x(padded <= 2);
padded then is equal to [1, 2, 3, 4, 5, 1, 2, 3, 4, 5] and extracted is going to be [1, 2, 6, 7]

Total number of different permutations of an array in which relative order of elements in two disjoint sub-arrays remain constant

Give an array of integers.
For example a = {1,2,20,19}
Let two disjoint sub-arrays be {1,2} and {20,19}. There are 5 such permutations in which '1' always comes before '2' and '20' always comes before '19' and such are:
{1, 2, 20, 19}
{1, 20, 2, 19}
{1, 20, 19, 2}
{20, 1, 19, 2}
{20, 19, 1, 2}
My question is:
Given an array, a[1...n+m] of size=n+m. Find the number of permutations in which relative order of elements in two sub-arrays a[1..n] and a[n+1..n+m] remains the same.
In your example it will be 6 permutations. You've missed the {20,1,2,19}.
It is a pure math(combinatorics) question. The question is equivalent to -
find the number of all possible solutions of the equation :
x0+x1+....+xn = m
The answer is defined by the formula - (n+m)!/(m!n!)
For example in your case it will be n=m=2 and the answer is 4!/2!2! = 6
Suppose you have m+n empty positions. You have to fill these positions such that the relative ordering is correct.
Choose n positions. The number of ways to choose n positions is (m+n)!/n!m!.
Now there is only one way to fill the elements of the subarray1 to those n locations such that the relative ordering is preserved.
The rest m positions are empty. There is only one way to place the elements from subarray2 to the rest m positions such that relative ordering is preserved.
Hence the final answer becomes (m+n)!/m!n!.

Resources