An array with O(n) inversions - arrays

Trying to figure out what type of an array consists of at most n inversions with n being the array size. I was thinking an array that is nearly sorted would fall under this case and also an array that is almost completely sorted with the max element and min element switched, for instance..
9 2 3 4 5 6 7 8 1
So my thinking is that when an array has at most n inversions, is it safe to say that the array is nearly sorted? Or are there other cases where the array would have at most n inversions and not be nearly sorted.

The 'least' sorted array (i.e. reverse sorted) has 1 + 2 + 3 + ... + n-1 = n(n-1)/2 inversions.
The fewer inversions an array has, the 'more' sorted it is.
And, since n is quite a bit smaller than n(n-1)/2, one can probably call an array with n inversions 'nearly sorted'.
This array has n-1 inversions:
9 1 2 3 4 5 6 7 8
In response to your comment, insertion sort's complexity is O(n + d), where d is the number of inversions, thus it will run in O(n) for O(n) inversions.
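To make the O(n + d) claim concrete, here is a minimal sketch (the function name is made up for illustration) of insertion sort instrumented to count element shifts. Each shift removes exactly one inversion, so the shift count equals d:

```python
from typing import List, Sequence, Tuple


def insertion_sort_count_shifts(values: Sequence[int]) -> Tuple[List[int], int]:
    """Sort a copy of values with insertion sort and count element shifts.

    Every shift moves one larger element past the item being inserted,
    which removes exactly one inversion, so shifts == number of inversions.
    """
    arr = list(values)
    shifts = 0
    for i in range(1, len(arr)):
        item = arr[i]
        j = i
        while j > 0 and arr[j - 1] > item:
            arr[j] = arr[j - 1]  # shift the larger element one slot right
            j -= 1
            shifts += 1
        arr[j] = item
    return arr, shifts


print(insertion_sort_count_shifts([9, 1, 2, 3, 4, 5, 6, 7, 8])[1])
print(insertion_sort_count_shifts([9, 2, 3, 4, 5, 6, 7, 8, 1])[1])
```

The outer loop contributes the n term and the inner loop contributes the d term, so for an input with O(n) inversions the total work is linear.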

Related

For each element of an unordered array output the number of greater elements

I guess the question is quite straightforward, so let me explain with an example
Input Array = {3 1 8 2 5 3 6 7};
Output Required = {4,7,0,6,3,4,2,1};
Greater than 3 are 4 elements in array (5,6,7,8)
Greater than 1 are 7 elements in array (2,3,3,5,6,7,8)
Greater than 8 are 0 elements in array ()
Greater than 2 are 6 elements in array (3,3,5,6,7,8)
Greater than 5 are 3 elements in array (6,7,8)
Greater than 3 are 4 elements in array (5,6,7,8)
Greater than 6 are 2 elements in array (7,8)
Greater than 7 are 1 elements in array (8)
So one approach will be just to run two nested for loops and be done with it,
time complexity O(N^2), space complexity O(1)
How can this be optimized further?
If you create a copy of the list and sort it, then (assuming unique elements), the 'greater element count' for a value is just
(total number of elements - 1 - position of value in sorted_list),
where we subtract 1 since indices start at 0 and we only want strictly greater elements.
When elements can be repeated, we should now find the unique elements of the original list and sort them, but also keep track of how many times each element appeared. Then, we need the 'weighted position' of each value in the sorted list, which is the sum of counts of all values at or before that index.
After creating a mapping from each unique value to the count of strictly greater elements, iterate over the original list, replacing each element with the count it's been mapped to.
Since we can convert between 'greater element counts' and the full sorted list in linear time, this method is asymptotically optimal, as it finds greater element counts in O(n log n) time.
Here's a short Python implementation of that idea.
import collections
from typing import List

def greater_element_counts(arr: List[int]) -> List[int]:
    """Return a list with the number of strictly larger elements
    in arr for each position in arr."""
    element_to_counts = collections.Counter(arr)
    unique_sorted_elements = sorted(element_to_counts.keys())
    # Start with every element counted, then peel off each value's
    # occurrences in ascending order; what remains is strictly greater.
    greater_element_count = len(arr)
    answer_by_element = {}
    for unique_element in unique_sorted_elements:
        greater_element_count -= element_to_counts[unique_element]
        answer_by_element[unique_element] = greater_element_count
    return [answer_by_element[element] for element in arr]
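As a cross-check on the sorted-copy idea, here is a sketch of a variant that handles duplicates with a binary search instead of a counter. The function name is hypothetical. `bisect.bisect_right` returns the number of elements less than or equal to the value, so subtracting it from the length leaves the strictly greater ones:

```python
import bisect
from typing import List


def greater_element_counts_bisect(arr: List[int]) -> List[int]:
    """For each element, count the strictly larger elements in arr."""
    ordered = sorted(arr)
    n = len(ordered)
    # bisect_right(ordered, x) == number of elements <= x
    return [n - bisect.bisect_right(ordered, x) for x in arr]


print(greater_element_counts_bisect([3, 1, 8, 2, 5, 3, 6, 7]))
```

This is still O(n log n) overall: one sort plus one O(log n) search per element.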

2D array minimum sum of Y elements and just two rows that we can chose to get minimum

Given a 2D array[X][Y], I have to find the smallest possible sum of Y elements, but:
the sum must be created by using just 2 rows,
each value must be from different index
Example:
for array
7 3 7 9
2 20 10 6
8 8 8 8
Result should be 18, as we get 3 + 7 from 1st row and 2 + 6 from 2nd.
I've been thinking about it for a few hours but I can't figure out how to deal with it.
Try this one here.
Method 1 (Naive Approach): Check every possible submatrix in given 2D
array. This solution requires 4 nested loops and time complexity of
this solution would be O(n^4).
Method 2 (Efficient Approach): Kadane’s algorithm for 1D array can be
used to reduce the time complexity to O(n^3).

2sum with duplicate values

The classic 2sum question is simple and well-known:
You have an unsorted array, and you are given a value S. Find all pairs of elements in the array that add up to value S.
And it's always been said that this can be solved with the use of HashTable in O(N) time & space complexity or O(NlogN) time and O(1) space complexity by first sorting it and then moving from left and right,
well these two solution are obviously correct BUT I guess not for the following array :
{1,1,1,1,1,1,1,1}
Is it possible to print ALL pairs which add up to 2 in this array in O(N) or O(NlogN) time complexity ?
No, printing out all pairs (including duplicates) takes O(N^2). The reason is that the output size is O(N^2), so the running time cannot be less than that (since it takes some constant amount of time to print each element in the output, simply printing the output would take C*N^2 = O(N^2) time).
If all the elements are the same, e.g. {1,1,1,1,1}, every possible pair would be in the output:
1. 1 1
2. 1 1
3. 1 1
4. 1 1
5. 1 1
6. 1 1
7. 1 1
8. 1 1
9. 1 1
10. 1 1
This is (N-1) + (N-2) + ... + 2 + 1 (by taking each element with all elements to the right), which is
N(N-1)/2 = O(N^2), which is more than O(N) or O(N log N).
However, you should be able to simply count the pairs in expected O(N) by:
Creating a hash-map map mapping each element to the count of how often it appears.
Looping through the hash-map and summing, for each element x up to S/2 (if we go up to S we'll include the pair x and S-x twice, let map[x] == 0 if x doesn't exist in the map):
map[x]*map[S-x] if x != S-x (which is the number of ways to pick x and S-x)
map[x]*(map[x]-1)/2 if x == S-x (from N(N-1)/2 above).
Of course you can also print the distinct pairs in O(N) by creating a hash-map similar to the above and looping through it, and only outputting x and S-x the value if map[S-x] exists.
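The counting steps above can be sketched as follows. This is a minimal illustration, not a definitive implementation; the function name is made up, and instead of looping "up to S/2" it uses the equivalent test x < S - x so that each unordered pair of values is visited once:

```python
from collections import Counter
from typing import Sequence


def count_pairs_with_sum(values: Sequence[int], target: int) -> int:
    """Count unordered pairs of distinct positions whose values add up to target."""
    counts = Counter(values)
    total = 0
    for x, count_x in counts.items():
        y = target - x
        if x < y:
            # Each occurrence of x pairs with each occurrence of y.
            total += count_x * counts.get(y, 0)
        elif x == y:
            # Pairs within one value: choose 2 of its occurrences.
            total += count_x * (count_x - 1) // 2
    return total


print(count_pairs_with_sum([1, 1, 1, 1, 1, 1, 1, 1], 2))
print(count_pairs_with_sum([2, 4, 3, 2, 9, 3, 3], 6))
```

Building the counter and looping over it are both expected O(N), even though the number of pairs being counted can be quadratic.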
Displaying or storing the results is O(N^2) on its own. The worst case you highlighted clearly has on the order of N^2 pairs, and writing them to the screen or storing them in a result array requires at least that much time. In short, you are right!
No
You can pre-compute them in O(n log n) using sorting, but printing them may take more than O(n log n). In the worst case it can be O(N^2).
Let's modify the algorithm to find all duplicate pairs.
As an example:
a[ ]={ 2 , 4 , 3 , 2 , 9 , 3 , 3 } and sum =6
After sorting:
a[ ] = { 2 , 2 , 3 , 3 , 3 , 4 , 9 }
Suppose you found the pair {2,4}. Now find the counts of 2 and 4 and multiply them to get the number of duplicate pairs. Here 2 occurs 2 times and 4 occurs 1 time, hence {2,4} will appear 2*1 = 2 times in the output. Now consider the special case when both numbers are the same: a value that occurs c times forms c(c-1)/2 pairs of distinct elements. Here {3,3} sums to 6 and 3 occurs 3 times in the array, hence {3,3} will appear 3*2/2 = 3 times in the output.
In your array {1,1,1,1,1} only the pair {1,1} sums to 2 and the count of 1 is 5, hence there are going to be 5*4/2 = 10 pairs of {1,1} in the output.

Maximizing a particular sum over all possible subarrays

Consider an array like this one below:
{1, 5, 3, 5, 4, 1}
When we choose a subarray, we reduce it to the lowest number in the subarray. For example, the subarray {5, 3, 5} becomes {3, 3, 3}. Now, the sum of the subarray is defined as the sum of the resultant subarray. For example, {5, 3, 5} the sum is 3 + 3 + 3 = 9. The task is to find the largest possible sum that can be made from any subarray. For the above array, the largest sum is 12, given by the subarray {5, 3, 5, 4}.
Is it possible to solve this problem in time better than O(n^2)?
I believe that I have an algorithm for this that runs in O(n) time. I'll first describe an unoptimized version of the algorithm, then give a fully optimized version.
For simplicity, let's initially assume that all values in the original array are distinct. This isn't true in general, but it gives a good starting point.
The key observation behind the algorithm is the following. Find the smallest element in the array, then split the array into three parts - all elements to the left of the minimum, the minimum element itself, and all elements to the right of the minimum. Schematically, this would look something like
+-----------------------+-----+-----------------------+
|      left values      | min |     right values      |
+-----------------------+-----+-----------------------+
Here's the key observation: if you take the subarray that gives the optimum value, one of two things must be true:
That array contains the minimum element. Its value is then min times its length, so (assuming the values are non-negative) the best such subarray is the whole array, with total value min * n, where n is the number of elements.
That array does not include the minimum element. In that case, the subarray has to be purely to the left or purely to the right of the minimum value.
This gives a nice initial recursive algorithm for solving this problem:
If the sequence is empty, the answer is 0.
If the sequence is nonempty:
Find the minimum value in the sequence.
Return the maximum of the following:
The best answer for the subarray to the left of the minimum.
The best answer for the subarray to the right of the minimum.
The number of elements times the minimum.
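A direct transcription of this recursive outline might look like the sketch below. The function name and the lo/hi parameters are illustrative, and non-negative values are assumed:

```python
from typing import Optional, Sequence


def best_sum_recursive(values: Sequence[int], lo: int = 0, hi: Optional[int] = None) -> int:
    """Best (minimum * length) over all subarrays of values[lo:hi].

    Assumes non-negative values. Uses a linear scan to find each minimum.
    """
    if hi is None:
        hi = len(values)
    if lo >= hi:
        return 0  # empty sequence
    # Index of the minimum within the current range (linear scan).
    m = min(range(lo, hi), key=lambda i: values[i])
    return max(
        best_sum_recursive(values, lo, m),      # best to the left of the minimum
        best_sum_recursive(values, m + 1, hi),  # best to the right of the minimum
        values[m] * (hi - lo),                  # whole range times its minimum
    )


print(best_sum_recursive([1, 5, 3, 5, 4, 1]))
```

On sorted input the recursion depth grows linearly, so this version is only suitable for small arrays.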
So how efficient is this algorithm? Well, that really depends on where the minimum elements are. If you think about it, we do linear work to find the minimum, then divide the problem into two subproblems and recurse on each. This is the exact same recurrence you get when considering quicksort. This means that in the best case it will take Θ(n log n) time (if we always have the minimum element in the middle of each half), but in the worst case it will take Θ(n^2) time (if we always have the minimum value on the far left or the far right).
Notice, however, that all of the effort we're spending is being used to find the minimum value in each of the subarrays, which takes O(k) time for k elements. What if we could speed this up to O(1) time? In that case, our algorithm would do a lot less work. More specifically, it would do only O(n) work. The reason for this is the following: each time we make a recursive call, we do O(1) work to find the minimum element, then remove that element from the array and recursively process the remaining pieces. Each element can therefore be the minimum element of at most one of the recursive calls, and so the total number of recursive calls can't be any greater than the number of elements. This means that we make at most O(n) calls that each do O(1) work, which gives a total of O(n) work.
So how exactly do we get this magical speedup? This is where we get to use a surprisingly versatile and underappreciated data structure called the Cartesian tree. A Cartesian tree is a binary tree created out of a sequence of elements that has the following properties:
Each node is smaller than its children, and
An inorder walk of the Cartesian tree gives back the elements of the sequence in the order in which they appear.
For example, the sequence 4 6 7 1 5 0 2 8 3 has this Cartesian tree:
        0
       / \
      1   2
     / \   \
    4   5   3
     \     /
      6   8
       \
        7
And here's where we get the magic. We can immediately find the minimum element of the sequence by just looking at the root of the Cartesian tree - that takes only O(1) time. Once we've done that, when we make our recursive calls and look at all the elements to the left of or to the right of the minimum element, we're just recursively descending into the left and right subtrees of the root node, which means that we can read off the minimum elements of those subarrays in O(1) time each. Nifty!
The real beauty is that it is possible to construct a Cartesian tree for a sequence of n elements in O(n) time. This algorithm is detailed in this section of the Wikipedia article. This means that we can get a super fast algorithm for solving your original problem as follows:
Construct a Cartesian tree for the array.
Use the above recursive algorithm, but use the Cartesian tree to find the minimum element rather than doing a linear scan each time.
Overall, this takes O(n) time and uses O(n) space, which is a time improvement over the O(n^2) algorithm you had initially.
At the start of this discussion, I made the assumption that all array elements are distinct, but this isn't really necessary. You can still build a Cartesian tree for an array with non-distinct elements in it by changing the requirement that each node is smaller than its children to be that each node is no bigger than its children. This doesn't affect the correctness of the algorithm or its runtime; I'll leave that as the proverbial "exercise to the reader." :-)
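Here is one way the two steps could be sketched in Python. This is an illustration under stated assumptions rather than a reference implementation: the function names are made up, values are assumed non-negative, the tree is built with the standard stack-based linear-time construction, and the recursion is replaced by an explicit work list to avoid deep call stacks. Equal values are placed as right children, which matches the "no bigger than its children" relaxation.

```python
from typing import List, Sequence, Tuple


def build_cartesian_tree(values: Sequence[int]) -> Tuple[int, List[int], List[int]]:
    """Return (root, left, right) child-index arrays of a min-rooted Cartesian tree.

    Missing children (and the root of an empty input) are marked with -1.
    """
    n = len(values)
    left = [-1] * n
    right = [-1] * n
    stack: List[int] = []  # indices on the rightmost path, values non-decreasing
    for i in range(n):
        last_popped = -1
        while stack and values[stack[-1]] > values[i]:
            last_popped = stack.pop()
        left[i] = last_popped        # everything popped hangs to the left of i
        if stack:
            right[stack[-1]] = i     # i becomes the new right child
        stack.append(i)
    root = stack[0] if stack else -1
    return root, left, right


def best_min_times_length(values: Sequence[int]) -> int:
    """Largest (minimum * length) over all subarrays, in O(n) time."""
    root, left, right = build_cartesian_tree(values)
    best = 0
    # Each entry is (node, lo, hi): node is a minimum of values[lo:hi].
    work = [(root, 0, len(values))]
    while work:
        node, lo, hi = work.pop()
        if node == -1:
            continue
        best = max(best, values[node] * (hi - lo))
        work.append((left[node], lo, node))
        work.append((right[node], node + 1, hi))
    return best


print(best_min_times_length([1, 5, 3, 5, 4, 1]))
```

Each index is pushed and popped at most once during construction, and each node is visited once during the traversal, which is where the linear bound comes from.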
This was a cool problem! I hope this helps!
Assuming that the numbers are all non-negative, isn't this just the "maximize the rectangle area in a histogram" problem? which has now become famous...
O(n) solutions are possible. This site: http://blog.csdn.net/arbuckle/article/details/710988 has a bunch of neat solutions.
To elaborate what I am thinking (it might be incorrect) think of each number as histogram rectangle of width 1.
By "minimizing" a subarray [i,j] and adding up, you are basically getting the area of the rectangle in the histogram which spans from i to j.
This has appeared before on SO: Maximize the rectangular area under Histogram, you find code and explanation, and a link to the official solutions page (http://www.informatik.uni-ulm.de/acm/Locals/2003/html/judge.html).
The following algorithm I tried will have the order of the algorithm which is initially used to sort the array. For example, if the initial array is sorted with binary tree sort, it will have O(n) in best case and O(n log n) as an average case.
Gist of algorithm:
The array is sorted. The sorted values and the corresponding old indices are stored. A binary search tree is created from the corresponding old indices; it is used to determine how far we can go forwards and backwards without encountering a value less than the current value, which gives the maximum possible sub array.
I will explain the method with the array in the question [1, 5, 3, 5, 4, 1]
                    1   5   3   5   4   1
                  -------------------------
array indices =>    0   1   2   3   4   5
                  -------------------------
This array is sorted. Store the value and their indices in ascending order, which will be as follows
                             1   1   3   4   5   5
                           -------------------------
original array indices =>    0   5   2   4   1   3
(referred as old_index)    -------------------------
It is important to keep a reference to both the values and their old indices, like an associative array.
A few terms to be clear:
old_index refers to the original index of an element (that is, its index in the original array);
current_index refers to the index of the element in the sorted array;
for example, for element 4, old_index is 4 and current_index is 3;
current_array_value refers to the current element value in the sorted array;
pre refers to the inorder predecessor; succ refers to the inorder successor.
Also, the min and max values can be read directly from the first and last elements of the sorted array; they are min_value and max_value respectively.
Now, the algorithm is as follows which should be performed on sorted array.
Algorithm:
Proceed from the left most element.
For each element from the left of the sorted array, apply this algorithm
if (element == min_value) {
    max_sum = element * array_length;
    if (max_sum > current_max)
        current_max = max_sum;
    push old_index into the BST;
} else if (element == max_value) {
    // here current_index is the index in the sorted array
    max_sum = element * (array_length - current_index);
    if (max_sum > current_max)
        current_max = max_sum;
    push old_index into the BST;
} else {
    // pseudo code steps to determine the maximum possible sub array with the current element
    // pre is the inorder predecessor and succ is the inorder successor of old_index
    push old_index into the BST;
    get the inorder predecessor and successor from the BST;
    if (pre == NULL) {
        max_sum = succ * current_array_value;
        if (max_sum > current_max)
            current_max = max_sum;
    } else if (succ == NULL) {
        max_sum = ((array_length - pre) - 1) * current_array_value;
        if (max_sum > current_max)
            current_max = max_sum;
    } else {
        // find the maximum possible sub array streak from the values
        max_sum = [((succ - old_index) - 1) + ((old_index - pre) - 1) + 1] * current_array_value;
        if (max_sum > current_max)
            current_max = max_sum;
    }
}
For example,
original array is
                    1   5   3   5   4   1
                  -------------------------
array indices =>    0   1   2   3   4   5
                  -------------------------
and the sorted array is
                             1   1   3   4   5   5
                           -------------------------
original array indices =>    0   5   2   4   1   3
(referred as old_index)    -------------------------
After first element:
max_sum = 6 [it will reduce to 1*6]
0
After second element:
max_sum = 6 [it will reduce to 1*6]
0
 \
  5
After third element:
0
 \
  5
 /
2
inorder traversal results in: 0 2 5
applying the algorithm,
max_sum = [((succ - old_index) - 1) + ((old_index - pre) - 1) + 1] * current_array_value;
max_sum = [((5-2)-1) + ((2-0)-1) + 1] * 3
        = 12
current_max = 12 [the maximum possible value]
After fourth element:
0
 \
  5
 /
2
 \
  4
inorder traversal results in: 0 2 4 5
applying the algorithm,
max_sum = 8 [which is discarded since it is less than 12]
After fifth element:
max_sum = 10 [reduces to 2 * 5, discarded since it is less than 12]
After last element:
max_sum = 5 [reduces to 1 * 5, discarded since it is less than 12]
To summarize, the running time is dominated by the algorithm used to sort the array at the start. For example, if the initial array is sorted with binary tree sort, it will have O(n) in the best case and O(n log n) in the average case.
The space complexity will be O(3n) [O(n + n + n), n for sorted values, another n for old indices, and another n for constructing the BST]. However, I'm not sure about this. Any feedback on the algorithm is appreciated.

Is there a more elegant way of doing this?

Given an array of positive integers a, I want to output an array of integers b so that b[i] is the closest number to a[i] that is smaller than a[i] and is in {a[0], ... a[i-1]}. If no such number exists, then b[i] = -1.
Example:
a = 2 1 7 5 7 9
b = -1 -1 2 2 5 7
b[0] = -1 since there is no number that is smaller than 2
b[1] = -1 since there is no number that is smaller than 1 from {2}
b[2] = 2, closest number to 7 that is smaller than 7 from {2,1} is 2
b[3] = 2, closest number to 5 that is smaller than 5 from {2,1,7} is 2
b[4] = 5, closest number to 7 that is smaller than 7 from {2,1,7,5} is 5
I was thinking about implementing balanced binary tree, however it will require a lot of work. Is there an easier way of doing this?
Here is one approach:
b[0] = -1 // nothing precedes the first element
for i ← 1 to i ← (length(A)-1) {
    // A[i] is added to the sorted sequence A[0, .. i-1]; save A[i] to make a hole at index j
    item = A[i]
    j = i
    // keep moving the hole to the next smaller index until A[j - 1] is <= item
    while j > 0 and A[j - 1] > item {
        A[j] = A[j - 1] // move hole to next smaller index
        j = j - 1
    }
    A[j] = item // put item in the hole
    // skip entries equal to item so that duplicates don't hamper the result
    k = j - 1
    while k >= 0 and A[k] == item {
        k = k - 1
    }
    // if a strictly smaller element exists to the left in the sorted sequence, store it in b
    if k >= 0
        b[i] = A[k]
    else
        b[i] = -1
}
Dry run:
a = 2 1 7 5 7 9
a[0] = 2
    it's straightforward, set b[0] to -1
a[1] = 1
    insert into subarray : [1, 2]
    any elements before 1 in sorted array? no.
    So set b[1] to -1. b: [-1, -1]
a[2] = 7
    insert into subarray : [1, 2, 7]
    any elements before 7 in sorted array? yes, it's 2
    So set b[2] to 2. b: [-1, -1, 2]
a[3] = 5
    insert into subarray : [1, 2, 5, 7]
    any elements before 5 in sorted array? yes, it's 2
    So set b[3] to 2. b: [-1, -1, 2, 2]
and so on.
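For comparison, the same insertion idea can be sketched in Python with the standard `bisect` module. The function name is hypothetical. `bisect_left` returns the position of the first element greater than or equal to the item, so the element just before that position is the largest strictly smaller one, which also takes care of duplicates:

```python
import bisect
from typing import List, Sequence


def closest_smaller_previous(a: Sequence[int]) -> List[int]:
    """b[i] = largest earlier value strictly smaller than a[i], or -1 if none."""
    seen: List[int] = []  # earlier values, kept sorted
    b: List[int] = []
    for x in a:
        pos = bisect.bisect_left(seen, x)  # first index with seen[pos] >= x
        b.append(seen[pos - 1] if pos > 0 else -1)
        seen.insert(pos, x)  # O(n) shift, so O(n^2) overall in the worst case
    return b


print(closest_smaller_previous([2, 1, 7, 5, 7, 9]))
```

The search is O(log n) per element, but inserting into a list still shifts elements, so the worst case remains quadratic.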
Here's a sketch of a (nearly) O(n log n) algorithm that's somewhere in between the difficulty of implementing an insertion sort and balanced binary tree: Do the problem backwards, use merge/quick sort, and use binary search.
Pseudocode:
let c be a copy of a
let b be an array sized the same as a
sort c using an O(n log n) algorithm
for i from a.length-1 to 1
    binary search over c for key a[i] // O(log n) time
    remove the item found // Could take O(n) time
    if there exists an item to the left of that position, b[i] = that item
    otherwise, b[i] = -1
b[0] = -1
return b
There are a few implementation details that can give this a poor runtime.
For instance, since you have to remove items, doing this on a regular array and shifting things around will make this algorithm still take O(n^2) time. So, you could store key-value pairs instead. The first item of the pair would be the key, and the second would be the number of remaining copies of that key (kind of like a multiset implemented on an array). "Removing" one would just mean decrementing the count in the pair, and so on.
Eventually you will be left with a bunch of 0-value keys. This would eventually make the if there exists an item to the left take roughly O(n) time, and therefore, the entire algorithm would degrade to a O(n^2) for that reason. So another optimization might be to batch remove all of them periodically. For instance, when 1/2 of them are 0-values, perform a pruning.
The ideal option might be to implement another data structure that has a much more favorable remove time. Something along the lines of a modified unrolled linked list with indices could work, but it would certainly increase the implementation complexity of this approach.
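As a rough sketch of the backwards approach, the version below keeps the key/count pairs described above but swaps the periodic pruning for a different technique: a union-find style "skip pointer" array that jumps over keys whose count has dropped to zero. That substitution is my own assumption for illustration, not the implementation benchmarked below, and the function name is made up.

```python
import bisect
from typing import List, Sequence


def closest_smaller_previous_offline(a: Sequence[int]) -> List[int]:
    """b[i] = largest value among a[0..i-1] strictly smaller than a[i], or -1."""
    keys = sorted(set(a))
    counts = [0] * len(keys)
    for x in a:
        counts[bisect.bisect_left(keys, x)] += 1

    # Slot s stands for key index s - 1; slot 0 is a sentinel meaning "no key".
    # parent[s] == s while the key in slot s still has copies left.
    parent = list(range(len(keys) + 1))

    def find(slot: int) -> int:
        """Largest live slot <= slot, with path compression."""
        root = slot
        while parent[root] != root:
            root = parent[root]
        while parent[slot] != root:
            parent[slot], slot = root, parent[slot]
        return root

    b = [-1] * len(a)
    for i in range(len(a) - 1, -1, -1):
        k = bisect.bisect_left(keys, a[i])
        counts[k] -= 1                 # "remove" a[i] from the multiset
        if counts[k] == 0:
            parent[k + 1] = k          # key k is exhausted: skip past it from now on
        slot = find(k)                 # live key with the largest index < k
        if slot > 0:
            b[i] = keys[slot - 1]
    return b


print(closest_smaller_previous_offline([2, 1, 7, 5, 7, 9]))
```

Because counts only ever decrease, a key that has been skipped never needs to come back, which is what makes the skip pointers safe here.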
I've actually implemented this. I used the first two optimizations above (storing key-value pairs for compression, and pruning when 1/2 of them are 0s). Here's some benchmarks to compare using an insertion sort derivative to this one:
a.length    This method    Insertion sort method
100         0.0262 ms      0.0204 ms
1000        0.2300 ms      0.8793 ms
10000       2.7303 ms      75.7155 ms
100000      32.6601 ms     7740.36 ms
300000      98.9956 ms     69523.6 ms
1000000     333.501 ms     ????? (not patient enough)
So, as you can see, this algorithm grows much, much slower than the insertion sort method I posted before. However, it took 73 lines of code vs 26 lines of code for the insertion sort method. So in terms of simplicity, the insertion sort method might still be the way to go if you don't have time requirements/the input is small.
You could treat it like an insertion sort.
Pseudocode:
let arr be one array with enough space for every item in a
let b be another array with, again, enough space for all elements in a
For each item in a:
    perform insertion sort on item into arr
    After performing the insertion, if there exists a number to the left, append that to b.
    Otherwise, append -1 to b
return b
The main thing you have to worry about is making sure that you don't make the mistake of reallocating arrays (because it would reallocate n times, which would be extremely costly). This will be an implementation detail of whatever language you use (std::vector's reserve for C++ ... arr.reserve(n) for D ... ArrayList's ensureCapacity in Java...)
A potential downfall with this approach compared to using a binary tree is that it's O(n^2) time. However, the constant factors using this method vs binary tree would make this faster for smaller sizes. If your n is smaller than 1000, this would be an appropriate solution. However, O(n log n) grows much slower than O(n^2), so if you expect a's size to be significantly higher and if there's a time limit that you are likely to breach, you might consider a more complicated O(n log n) algorithm.
There are ways to slightly improve the performance (such as using a binary insertion sort: using binary search to find the position to insert into), but generally they won't improve performance enough to matter in most cases since it's still O(n^2) time to shift elements to fit.
Consider this:
a = 2 1 7 5 7 9
b = -1 -1 2 2 5 7
c   0   1   2   3   4   5   6   7   8   9
0   -   -   -   -   -   -   -   -   -   -
Where the index of c is the value of a[i], so that 0, 3, 4, 6, 8 would have null values,
and the 1st dimension of c contains the highest closest value to a[i] found to date.
So by step a[3] we have the following
c   0   1   2   3   4   5   6   7   8   9
0   -  -1  -1   -   -   2   -   2   -   -
and by step a[5] we have the following
c   0   1   2   3   4   5   6   7   8   9
0   -  -1  -1   -   -   2   -   5   -   7
This way when we get to the 2nd 7 at a[4] we know that 2 is the largest value to date and all we need to do is loop back through a[i-1] until we encounter a 7 again comparing the a[i] value to that in c[7] if bigger, replace c[7]. Once a[i-1] = the 7 we put c[7] into b[i] and move on to next a[i].
The main downfalls to this approach that I can see are:
footprint size depending on how big the c[] needs to be dimensioned..
the fact that you have to revisit elements of a[] that you've already touched. If the distribution of data is such that there are significant spaces between the two 7's then keeping track of the highest value as you go would presumably be faster. Alternatively it might be better to gather statistics on the a[i] up front to know what distributions exist and then use a hybrid method maintaining the max until such time that no more instances of that number are in the statistics.

Resources