Holding a sorted array - the reverse-sorted input case

I get integers from the user (one by one) and insert each into a sorted vec at its right place by running a binary search to find the insertion index.
The problem is that when the user decides to provide reverse-sorted input (one by one), insertion becomes expensive, O(n^2), since on each insertion all of the current elements in the vec have to be shifted to the right. Is there an algorithm that can handle this in less time?
Example:
[] <- 10
[10] <- 9 // Shift x1
[9, 10] <- 8 // Shift x2
[8, 9, 10] <- 7 // Shift x3
[7, 8, 9, 10] <- 6 // Shift x4
.
.
.

The problem is that when the user decides to provide reverse-sorted input (one by one), insertion becomes expensive, O(n^2), since on each insertion all of the current elements in the vec have to be shifted to the right.
The Vec implementation will shift all the contents at once (using a memcpy), so shifting 20 items and shifting 1 doesn't really make any difference. If the collection is huge, memory traffic will start being a concern, but at low arities you can treat it as a constant.
Is there an algorithm that can handle this with less time?
An intrinsically sorted tree-based data structure. But the Rust standard library is somewhat limited on that front, and a BTreeSet will only work if you're deduplicating anyway. Not sure it will beat a regular Vec though, as it'll have a higher number of allocations.
And while a LinkedList theoretically provides O(1) insertion, Rust doesn't provide an insertion API because there's no Cursor, so you'd be paying O(n-i) to look for the insertion index, following which insert() would be paying that again to traverse to the index in question and insert the new item.
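For reference, here is a minimal sketch of the approach the question describes, written in Python for brevity (the Rust code would be analogous): bisect performs the binary search, and list.insert does the shifting that dominates the cost.

import bisect

def insert_sorted(values, x):
    # bisect_left finds the insertion index in O(log n);
    # list.insert then shifts the tail to the right in O(n) - the cost under discussion.
    i = bisect.bisect_left(values, x)
    values.insert(i, x)

values = []
for x in [10, 9, 8, 7, 6]:   # reverse-sorted input: every insert shifts everything
    insert_sorted(values, x)
print(values)                 # [6, 7, 8, 9, 10]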

Related

Array Moves Chunks Of Neighboring Elements To A New Index

The Question
The question is simple. Imagine I have a list or an array (not a linked list):
list = [1, 2, ... 999999]
Now I want to move the elements from index 3000 to 600000 to index 100. The result should be easy to imagine:
[1, 2, ... 99, 100, 3000, 3001, ... 600000, 101, 102, ... 2999, 600001, 600002, ... 999999]
How to do those operations efficiently?
Judge My Thinking
Disclaimer: You can say this operation is exactly the same as moving 101 to 2999 to index 600000, which seems a more efficient operation. But that's not the algorithm my question is about; my question is about how to move more efficiently, so let's stick with the original question.
I can think of several ways:
1. Just do delete and insert for elements 3000 to 600000. (What's the time complexity?)
2. Save the elements from [3000, 600000] to a temporary space, then use delete/insert to move everything from [101, 2999] down to [597192, 600000] to make space to transfer [3000, 600000] back to index 100. Temporarily holding the data from [3000, 600000] will cost some memory, but does the copying make the whole operation slower? (Time complexity?)
3. An attempt to improve 2. Same idea, but the move operation is done not by delete/insert but by manually copying [101, 2999] to [597192, 600000]. (Time complexity? Does it improve speed compared to delete/insert?)
4. An attempt to improve 2 or 3. Same idea, but no delete/insert, only copying; and not copying everything from [3000, 600000] at once, but holding only 1 element at a time in temporary memory and moving/copying everything in a complicated way. (Is this faster than the others? Is it possible to implement? Can you show me the code / pseudo-code?)
5. Are there better ideas?
Thank you for reading and thinking.
The algorithm you are after is known as rotate. There are two common ways to implement it. Both are running in O(length) time and O(1) space.
One, which is attributed to Dijkstra, is ultimately efficient in a sense that every element is moved just once. It is kind of tricky to implement, and requires a non-obvious setup. Besides, it may behave in a very cache-unfriendly manner. For details, see METHOD 3 (A Juggling Algorithm).
Another is very simple, cache-friendly, but moves each element twice. To rotate a range [l, r) around a midpoint m do
reverse(l, m);
reverse(m, r);
reverse(l, r);
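For example, here is a small Python sketch of the reversal-based rotate; slice reversals stand in for an in-place reverse(l, m), and the function and parameter names are just illustrative.

def rotate(a, l, m, r):
    # Rotate a[l:r) so that a[m] becomes the first element of the range,
    # using the three-reversal trick (each element is moved twice).
    a[l:m] = a[l:m][::-1]
    a[m:r] = a[m:r][::-1]
    a[l:r] = a[l:r][::-1]

a = list(range(10))
rotate(a, 2, 5, 8)   # bring a[5:8] in front of a[2:5]
print(a)             # [0, 1, 5, 6, 7, 2, 3, 4, 8, 9]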
I split the list at the "breakpoints", and reassembled the pieces. Creating a slice should be O(n), with n being the length of the slice. The longest slice is up to len(a) elements, so the storing of the list pieces should be O(n), with n being len(a). Reassembling the list pieces is O(n) as well, so this should be O(n) in total. The memory requirement is 2*len(a), since we store the slices as well, which sum up to the same length as a.
def truck(a, start, end, to):
    a = list(a)
    diff = end - start
    to_left = to < start
    split_save = a[to:to+diff]
    split_take = a[start:end]
    if to_left:
        split_first = a[:to]
        split_second = a[to+diff:start]
        split_third = a[end:]
        res = split_first + split_take + split_second + split_save + split_third
    else:
        split_first = a[:start]
        split_second = a[end:to]
        split_third = a[to+diff:]
        res = split_first + split_save + split_second + split_take + split_third
    return res
print(truck(range(10), 5, 8, 2))
# [0, 1, 5, 6, 7, 2, 3, 4, 8, 9]
print(truck(range(10), 2, 5, 8))
# [0, 1, 8, 9, 5, 6, 7, 2, 3, 4]
Let [l, r] be the segment you want to move, and L = r-l+1 the length of the segment. N is the total number of elements.
1. Delete and insert at an arbitrary position in an array each take O(N), and the deletes and inserts occur O(L) times. So the total time complexity is O(NL).
2. Same as #1. It takes O(NL) because of the deletes and inserts.
3, 4. Copy and move take O(L). Simply, we can say it takes O(N).
Now, some fancy data structure for better complexity. You can use tree data structure to store linear array.
Especially, self-balancing binary search tree(BBST). It takes O(logN) to insert and delete one element.
Time complexity of moving segment to arbitrary position could be O(L logN), delete and insert each element.
You can find this data structure as std::map in C++.
O(L log N) does not seem to be better. But it can be improved, with a BBST, to amortized O(log N).
You can gather elements in [l, r] to one subtree. Then cut this subtree from BBST. It is delete.
Insert this subtree at position you want.
For example, gather [3000, 600000] to one subtree and cut it from its parent. (Delete segment at once)
Make this subtree the right child of the 100th element in inorder, or the left child of the 101st element in inorder. (Insert the segment at once.)
Then the tree contains the elements in the order you want.
Splay Tree would be good choice.

Sort an array so the difference of elements a[i]-a[i+1]<=a[i+1]-a[i+2]

My mind has been blown since I began, last week, trying to sort an array of N elements by this condition: the difference between two consecutive elements must always be less than or equal to the difference between the next two. For example:
A[4] = { 10, 2, 7, 4 }
It is possible to rearrange that array this way:
{2, 7, 10, 4} because (2 - 7 = -5) < (7 - 10 = -3) < (10 - 4 = 6)
{4, 10, 7, 2} because (4 - 10 = -6) < (10 - 7 = 3) < (7 - 2 = 5)
One solution I considered was just shuffling the array and checking each time if it agreed with the conditions, an efficient method for a small number of elements, but time consuming or even impossible for a larger number of elements.
Another was trying to move elements around the array with loops, hoping again to meet the requirements, but again this method is very time consuming and also sometimes not possible.
Trying to find an algorithm doesn't seem to have any result but there must be something.
Thank you very much in advance.
I normally don't just provide code, but this question intrigued me, so here's a brute-force solution, that might get you started.
The concept will always be slow because the individual elements in the list to be sorted are not independent of each other, so they cannot be sorted using traditional O(N log N) algorithms. However, the differences can be sorted that way, which simplifies checking for a solution, and permutations could be checked in parallel to speed up the processing.
import itertools

def is_diff_sorted(qa):
    diffs = [qa[i] - qa[i+1] for i in range(len(qa)-1)]
    for i in range(len(diffs)-1):
        if diffs[i] > diffs[i+1]:
            return False
    return True

a = [2,4,7,10]
#a = [1,4,6,7,20]
a.sort()
for perm in itertools.permutations(a):
    if is_diff_sorted(perm):
        print("Solution:", perm)
        break
This condition is related to differentiation. The (negative) difference between neighbouring elements has to be steady or increasing with increasing index. Multiply the condition by -1 and you get
a[i+1] - a[i] >= a[i+2] - a[i+1]
or
0 >= (a[i+2] - a[i+1]) - (a[i+1] - a[i])
So the 2nd derivative has to be 0 or negative, which is the same as having the first derivative stay the same or change downwards, like e.g. portions of the upper half of a circle. That does not mean that the first derivative itself has to start out positive or negative, just that it never changes upward.
The problem algorithmically is that it can't be a simple sort, since you never compare just 2 elements of the list, you'll have to compare three at a time (i,i+1,i+2).
So the only thing you know apart from random permutations is given in Klas' answer (values first rising if at all, then falling if at all), but his is not a sufficient condition, since you can have a positive 2nd derivative in his two sets (rising/falling).
So is there a solution much faster than the random shuffle? I can only think of the following argument (similar to Klas' answer). For a given vector the solution is more likely if you separate the data into a 1st segment that is rising or steady (not falling) and a 2nd that is falling or steady (not rising) and neither is empty. Likely an argument could be made that the two segments should have approximately equal size. The rising segment should have the data that are closer together and the falling segment should contain data that are further apart. So one could start with the mean, and look for data that are close to it, move them to the first set,then look for more widely spaced data and move them to the 2nd set. So a histogram might help.
[4 7 10 2] --> diff [ 3 3 -8] --> 2diff [ 0 -11]
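A quick Python check of that reformulation on the example above; the arrangement is valid exactly when all second differences are less than or equal to zero.

def diffs(a):
    return [a[i+1] - a[i] for i in range(len(a) - 1)]

a = [4, 7, 10, 2]
d1 = diffs(a)                      # first differences:  [3, 3, -8]
d2 = diffs(d1)                     # second differences: [0, -11]
print(all(x <= 0 for x in d2))     # True -> the arrangement is valid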
Here is a solution based on backtracking algorithm.
1. Sort the input array in non-increasing order.
2. Start dividing the array's values into two subsets: put the largest element into both subsets (this will be the "middle" element), then place the second largest one into an arbitrary subset.
3. Sequentially put the remaining elements into either subset. If this cannot be done without violating the "difference" condition, use the other subset. If neither subset is acceptable, roll back and change preceding decisions.
4. Reverse one of the arrays produced in step 3 and concatenate it with the other array.
Below is a Python implementation (it is not perfect; the worst defect is the recursive implementation: while recursion is quite common for backtracking algorithms, this particular algorithm seems to work in linear time, and recursion is not good for very large input arrays).
def is_concave_end(a, x):
    return a[-2] - a[-1] <= a[-1] - x

def append_element(sa, halves, labels, which, x):
    labels.append(which)
    halves[which].append(x)
    if len(labels) == len(sa) or split_to_halves(sa, halves, labels):
        return True
    if which == 1 or not is_concave_end(halves[1], halves[0][-1]):
        halves[which].pop()
        labels.pop()
        return False
    labels[-1] = 1
    halves[1].append(halves[0][-1])
    halves[0].pop()
    if split_to_halves(sa, halves, labels):
        return True
    halves[1].pop()
    labels.pop()

def split_to_halves(sa, halves, labels):
    x = sa[len(labels)]
    if len(halves[0]) < 2 or is_concave_end(halves[0], x):
        return append_element(sa, halves, labels, 0, x)
    if is_concave_end(halves[1], x):
        return append_element(sa, halves, labels, 1, x)

def make_concave(a):
    sa = sorted(a, reverse=True)
    halves = [[sa[0]], [sa[0], sa[1]]]
    labels = [0, 1]
    if split_to_halves(sa, halves, labels):
        return list(reversed(halves[1][1:])) + halves[0]

print(make_concave([10, 2, 7, 4]))
It is not easy to produce a good data set to test this algorithm: plain set of random numbers either is too simple for this algorithm or does not have any solutions. Here I tried to generate a set that is "difficult enough" by mixing together two sorted lists, each satisfying the "difference" condition. Still this data set is processed in linear time. And I have no idea how to prepare any data set that would demonstrate more-than-linear time complexity of this algorithm...
Note that since the differences must never decrease, any solution will have its elements first in rising order and then in falling order. The length of either of the two "suborders" may be 0, so a solution could consist of a strictly rising or strictly falling sequence.
The following algorithm will find any solutions:
1. Divide the set into two sets, A and B. Empty sets are allowed.
2. Sort A in rising order and B in falling order.
3. Concatenate the two sorted sets: AB.
4. Check if you have a solution.
Do this for all possible divisions into A and B.
Expanding on roadrunner66's analysis: the solution is to take the two smallest elements of the original array and make them first and last in the target array; take the two next smallest elements and make them second and next-to-last; keep going until all the elements are placed into the target. Notice that which one goes to the left and which one to the right doesn't matter.
Sorting the original array facilitates the process (finding smallest elements becomes trivial), so the time complexity is O(n log n). The space complexity is O(n), because it requires a target array. I don't know off-hand if it is possible to do it in-place.
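A minimal Python sketch of that construction (with a final verification step added, since not every input admits a valid arrangement):

def pair_sort(a):
    # Sort, then place the smallest pair at the two ends, the next pair
    # one step in, and so on, as described above.
    s = sorted(a)
    res = [None] * len(s)
    lo, hi = 0, len(s) - 1
    for i, x in enumerate(s):
        if i % 2 == 0:
            res[lo] = x
            lo += 1
        else:
            res[hi] = x
            hi -= 1
    # Verify the "difference" condition on the result.
    ok = all(res[i] - res[i+1] <= res[i+1] - res[i+2] for i in range(len(res) - 2))
    return res, ok

print(pair_sort([10, 2, 7, 4]))   # ([2, 7, 10, 4], True)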

Checking if two substring overlaps in O(n) time

I have a string S of length n, and a list of tuples (a, b), where a specifies the starting position of a substring of S and b is the length of the substring. To check whether any substrings overlap, we can, for example, mark each position in S whenever it is touched. However, I think this will take O(n^2) time if the list of tuples has a size of n (looping over the tuple list, then looping over S).
Is it possible to check if any substring actually overlaps with the other in O(n) time?
Edit:
For example, S = "abcde" and Tuples = [(1,2),(3,3),(4,2)], representing "ab", "cde" and "de". I want to know that an overlap is discovered when (4,2) is read.
I was thinking it is O(n^2) because you get a tuple every time, and then you need to loop through the substring of S to see if any character is marked dirty.
Edit 2:
I cannot exit once a collision is detected. Imagine I need to report all the subsequent tuples that collide, so I have to loop through the whole tuple list.
Edit 3:
A high level view of the algorithm:
for each tuple (a,b)
    for (int i=a; i <= a+b; i++)
        if S[i] is dirty
            then report tuple and break // break inner loop only
Your basic approach is correct, but you could optimize your stopping condition, in a way that guarantees bounded complexity in the worst case. Think about it this way - how many positions in S would you have to traverse and mark in the worst case?
If there is no collision, then at worst you'll visit length(S) positions (and run out of tuples by then, since any additional tuple would have to collide). If there is a collision - you can stop at the first marked object, so again you're bounded by the max number of unmarked elements, which is length(S)
EDIT: since you added a requirement to report all colliding tuples, let's calculate this again (extending my comment) -
Once you marked all elements, you can detect collision for every further tuple with a single step (O(1)), and therefore you would need O(n+n) = O(n).
This time, each step would either mark an unmarked element (overall n in the worst case), or identify a colliding tuple (worst O(tuples) which we assume is also n).
The actual steps may be interleaved, since the tuples may be organized in any way without colliding first, but once they do (after at most n tuples which cover all n elements before colliding for the first time), you have to collide every time on the first step. other arrangements may collide earlier even before marking all elements, but again - you're just rearranging the same number of steps.
Worst case example: one tuple covering the entire array, then n-1 tuples (doesn't matter which) -
[(1,n), (n,1), (n-1,1), ...(1,1)]
First tuple would take n steps to mark all elements, the rest would take O(1) each to finish. overall O(2n)=O(n). Now convince yourself that the following example takes the same number of steps -
[(1,n/2-1), (1,1), (2,1), (3,1), (n/2,n/2), (4,1), (5,1) ...(n,1)]
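A small Python sketch of the marking approach with the early stop described above (tuples use 1-based starting positions, as in the question):

def report_collisions(n, tuples):
    # Mark positions of S as they are covered; for each tuple, stop at the
    # first already-marked position and report the tuple.
    # Total work is O(n + number of tuples).
    marked = [False] * n
    colliding = []
    for (a, b) in tuples:               # a = 1-based start, b = length
        for i in range(a - 1, a - 1 + b):
            if marked[i]:
                colliding.append((a, b))
                break                   # break inner loop only
            marked[i] = True
    return colliding

print(report_collisions(5, [(1, 2), (3, 3), (4, 2)]))   # [(4, 2)]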
According to your description and comment, the overlap problem may not really be a string problem; it can be regarded as a "segment overlap" problem.
Using your example, it can be translated into 3 segments: [1, 2], [3, 5], [4, 5]. The question is to check whether any of the 3 segments overlap.
Suppose we have m segments, each of the form [start, end] (segment start position and end position). One efficient algorithm to detect overlap is to sort them by start position in ascending order, which takes O(m * lgm). Then iterate over the sorted m segments; for each segment i, you only need to check whether its start position falls at or before the maximum end position seen so far:
if (start[i] <= maxEnd[i-1]) {
    // segment i overlaps
}
maxEnd[i] = max(maxEnd[i-1], end[i]); // update max end position of segments 1 to i
Each check operation takes O(1), so the total time complexity is O(m*lgm + m), which can be regarded as O(m*lgm). The time to produce each output is related to the corresponding tuple's length, which is also related to n.
This is a segment overlap problem and the solution should be possible in O(n) itself if the list of tuples has been sorted in ascending order wrt the first field. Consider the following approach:
1. Transform the intervals from (start, number of characters) to (start, inclusive_end). Hence the above example becomes: [(1,2),(3,3),(4,2)] ==> [(1, 2), (3, 5), (4, 5)]
2. The tuples are valid if transformed consecutive tuples (a, b) and (c, d) always satisfy b < c. Otherwise there is an overlap in the tuples mentioned above.
Each of 1 and 2 can be done in O(n) if the array is sorted in the form mentioned above.
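A short Python sketch combining the two answers above: transform to inclusive segments, sort by start, and compare each start against the largest end seen so far.

def find_overlaps(tuples):
    # Convert (start, length) tuples to inclusive (start, end) segments,
    # sort by start, and flag any segment whose start is not past the
    # largest end seen so far. O(m log m) for the sort, O(m) for the scan.
    segments = sorted((a, a + b - 1) for (a, b) in tuples)
    overlapping = []
    max_end = float("-inf")
    for (start, end) in segments:
        if start <= max_end:
            overlapping.append((start, end))
        max_end = max(max_end, end)
    return overlapping

print(find_overlaps([(1, 2), (3, 3), (4, 2)]))   # [(4, 5)]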

Extracting 2 numbers n times and placing back the addition in O(n) instead of O(n*log(n))

I'm presenting a problem my professor showed in class, with my O(n*log(n)) solution:
Given a list of n numbers we'd like to perform the following n-1 times:
Extract the two minimal elements x,y from the list and present them
Create a new number z , where z = x+y
Put z back into the list
Suggest a data structure and algorithm for O(n*log(n)) , and O(n)
Solution:
We'll use a minimal heap:
Creating the heap one time only would take O(n). After that, extracting the two minimal elements would take O(log(n)). Placing z into the heap would take O(log(n)).
Performing the above n-1 times would take O(n*log(n)), since:
O(n) + O(n·(log n + log n)) = O(n) + O(n·log n) = O(n·log n)
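For concreteness, a minimal sketch of this heap-based O(n·log n) solution in Python, using heapq:

import heapq

def present_pairs(nums):
    heap = list(nums)
    heapq.heapify(heap)                 # O(n) heap construction
    while len(heap) > 1:                # n-1 iterations
        x = heapq.heappop(heap)         # O(log n)
        y = heapq.heappop(heap)         # O(log n)
        print(x, y)                     # present the two minimal elements
        heapq.heappush(heap, x + y)     # O(log n)

present_pairs([10, 11, 12, 13, 14, 15])
# prints: 10 11 / 12 13 / 14 15 / 21 25 / 29 46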
But how can I do it in O(n)?
EDIT:
By saying: "extract the two minimal elements x,y from the list and present them ", I mean printf("%d,%d" , x,y), where x and y are the smallest elements in the current list.
This is not a full answer. But if the list were sorted, then your problem would be easily doable in O(n). To do it, arrange all of the numbers in a linked list. Maintain a pointer to the head, and another pointer somewhere in the middle. At each step, take the top two elements off of the head, print them, advance the middle pointer until it is where the sum should go, and insert the sum.
The head pointer will move close to 2n times and the middle pointer will move about n times, with n inserts. All of those operations are O(1), so the sum total is O(n).
In general you cannot sort in time O(n), but there are a number of special cases in which you can. So in some cases it is doable.
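Here is a sketch of an equivalent way to realize the same O(n) idea, using two queues instead of a linked list with a middle pointer (assuming the input list is already sorted): one queue holds the original sorted numbers, the other receives the sums, which are produced in non-decreasing order, so both queues stay sorted.

from collections import deque

def present_pairs_sorted(sorted_nums):
    q1 = deque(sorted_nums)   # original numbers, already sorted
    q2 = deque()              # sums, appended in non-decreasing order

    def pop_min():
        # Take the smaller of the two queue fronts.
        if q2 and (not q1 or q2[0] <= q1[0]):
            return q2.popleft()
        return q1.popleft()

    while len(q1) + len(q2) > 1:
        x = pop_min()
        y = pop_min()
        print(x, y)           # present the two minimal elements
        q2.append(x + y)

present_pairs_sorted([10, 11, 12, 13, 14, 15])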
The general case is, of course, not solvable in time O(n). Why not? Because given your output, in time O(n) you can run through the output of the program, build up the list of pairwise sums in order as you go, and filter them out of the output. What is left is the elements of the original list in sorted order. This would give an O(n) general sorting algorithm.
Update: I was asked to show how you could go from the output (10, 11), (12, 13), (14, 15), (21, 25), (29, 46) back to the input list. The trick is that you always keep everything in order, so you know where to look. With positive integers, the next upcoming sum to use will always be at the start of that list.
Step 0: Start
input_list: (empty)
upcoming sums: (empty)
Step 1: Grab output (10, 11)
input_list: 10, 11
upcoming_sums: 21
Step 2: Grab output (12, 13)
input_list: 10, 11, 12, 13
upcoming_sums: 21, 25
Step 3: Grab output (14, 15)
input_list: 10, 11, 12, 13, 14, 15
upcoming_sums: 21, 25, 29
Step 4: Grab output (21, 25)
input_list: 10, 11, 12, 13, 14, 15
upcoming_sum: 29, 46
Step 5: Grab output (29, 46)
input_list: 10, 11, 12, 13, 14, 15
upcoming_sum: 75
This isn't possible in the general case.
Your problem statement says that you must reduce your array to a single element, performing a total of n-1 reduction operations. Therefore, the number of reduction operations performed is on the order of O(n). To achieve an overall running time of O(n), each reduction operation must run in O(1).
You have clearly defined your reduction operation:
remove the 2 minimal elements in the array and print them, then
insert the sum of those elements into the array.
If your data structure were a sorted list, it would be trivial to remove the two minimal elements in O(1) time (pop them off the end of the list). However, reinserting an element in O(1) is not possible in the general case. As SteveJessop pointed out, if you could insert into a sorted list in O(1) time, the resulting operations would constitute an O(n) sorting algorithm. But there is no such known algorithm.
There are some exceptions here. If your numbers are integers, you may be able to use "radix insert" to achieve O(1) inserts. If your array of numbers are sufficiently sparse in the number line, you may be able to deduce insert points in O(1). There are numerous other exceptions, but they are all exceptions.
This answer doesn't answer your question, per se, but I believe it's relevant enough to warrant an answer.
If the range of values is less than n, then this can be solved in O(n).
1. Create an array mk of size equal to the range of values and initialize it to all zeros.
2. Traverse the array and increment the value of mk at the position given by each array element, i.e. if the array element is a[i], then increment mk[a[i]].
3. To present the answers after each of the n-1 operations, follow these steps.
There are two cases:
Case 1: all of the a[i] are positive
Traverse the mk array from 0 up to its size.
With cnt = 0, repeat until cnt equals 2: grab a nonzero entry, decrease its value by 1, and increment cnt by 1.
You can get the two minimum values this way; present them.
Now do mk[sum of the two minima]++.
Case 2: some of the a[i] are negative
<still to update>
O(nlogn) is easy - just use a heap, treap or skiplist.
O(n) sounds tough.
https://en.wikipedia.org/wiki/Heap_%28data_structure%29
https://en.wikipedia.org/wiki/Treap
https://en.wikipedia.org/wiki/Skip_list

efficient sorted Cartesian product of 2 sorted array of integers

Need Hints to design an efficient algorithm that takes the following input and spits out the following output.
Input: two sorted arrays of integers A and B, each of length n
Output: One sorted array that consists of Cartesian product of arrays A and B.
For Example:
Input:
A is 1, 3, 5
B is 4, 8, 10
here n is 3.
Output:
4, 8, 10, 12, 20, 24, 30, 40, 50
Here are my attempts at solving this problem.
1) Given that the output has n^2 elements, an efficient algorithm can't do better than O(n^2) time complexity.
2) First I tried a simple but inefficient approach. Generate the Cartesian product of A and B; this can be done in O(n^2) time. We need to store it so we can sort it, therefore O(n^2) space complexity too. Now we sort n^2 elements, which can't be done better than O(n^2 log n) without making any assumptions about the input.
Finally, I have an O(n^2 log n) time and O(n^2) space complexity algorithm.
There must be a better algorithm, because I've not made use of the sorted nature of the input arrays.
If there's a solution that's better than O(n² log n) it needs to do more than just exploit the fact that A and B are already sorted. See my answer to this question.
Srikanth wonders how this can be done in O(n) space (not counting the space for the output). This can be done by generating the lists lazily.
Suppose we have A = 6,7,8 and B = 3,4,5. First, multiply every element in A by the first element in B, and store these in a list:
6×3 = 18, 7×3 = 21, 8×3 = 24
Find the smallest element of this list (6×3), output it, replace it with that element in A times the next element in B:
7×3 = 21, 6×4 = 24, 8×3 = 24
Find the new smallest element of this list (7×3), output it, and replace:
6×4 = 24, 8×3 = 24, 7×4 = 28
And so on. We only need O(n) space for this intermediate list, and finding the smallest element at each stage takes O(log n) time if we keep the list in a heap.
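A minimal Python sketch of this lazy, heap-based generation (assuming non-negative integers so that each row A[i]*B is itself sorted), shown on the question's example:

import heapq

def sorted_products(A, B):
    # Keep one heap entry per element of A, each paired with its current
    # index into B; pop the smallest product and advance that row.
    n = len(B)
    heap = [(a * B[0], i, 0) for i, a in enumerate(A)]
    heapq.heapify(heap)
    while heap:
        value, i, j = heapq.heappop(heap)
        yield value
        if j + 1 < n:
            heapq.heappush(heap, (A[i] * B[j + 1], i, j + 1))

print(list(sorted_products([1, 3, 5], [4, 8, 10])))
# [4, 8, 10, 12, 20, 24, 30, 40, 50]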
If you multiply a value of A with all values of B, the result list is still sorted. In your example:
A is 1, 3, 5
B is 4, 8, 10
1*(4,8,10) = 4,8,10
3*(4,8,10) = 12,24,30
Now you can merge the two lists (exactly like in merge sort). You just look at both list heads and put the smaller one in the result list. so here you would select 4, then 8 then 10 etc.
result = 4,8,10,12,24,30
Now you do the same for result list and the next remaining list merging 4,8,10,12,24,30 with 5*(4,8,10) = 20,40,50.
As merging is most efficient when both lists have the same length, you can modify that scheme by dividing A into two parts, doing the merging recursively for both parts, and then merging both results.
Note that you can save some time using a merge approach, as it isn't required that A is sorted; just B needs to be sorted.
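A short Python sketch of the merge idea: each row a*B is already sorted (B sorted, values assumed non-negative), and heapq.merge performs the n-way merge of those rows; pairwise recursive merging, as suggested above, works the same way.

import heapq

def sorted_products_merge(A, B):
    # Build one sorted row per element of A, then merge all rows.
    rows = ([a * b for b in B] for a in A)
    return list(heapq.merge(*rows))

print(sorted_products_merge([1, 3, 5], [4, 8, 10]))
# [4, 8, 10, 12, 20, 24, 30, 40, 50]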
