Find all contiguous subarrays of array in O(n) complexity - arrays

I am stuck trying to find all the contiguous subarrays of a given array in the minimum time complexity, ideally O(n).
For example:
[1,2,3,4]
Subarrays are:
[1][2][3][4][1,2][2,3][3,4][1,2,3][2,3,4][1,2,3,4]
I have done it with O(n^2) time complexity, but for large inputs it takes a lot of time and memory.
Are there any specific algorithms for this problem?

There are exactly n(n+1)/2 subarrays, which can be written as A[i..j] for all i and all j ≥ i. The algorithm to generate all pairs is immediate (a double loop) and cannot be improved upon.
If you just need to output the pairs (i, j), O(1) space suffices. If you need to store all pairs, O(n²). And if you need to store all subarrays in full, O(n³); in this case the time also unavoidably grows to O(n³), because copying the elements adds a third nested loop.
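As a concrete illustration, here is a minimal sketch of that double loop (the generator name is just illustrative):

def subarray_indices(a):
    n = len(a)
    for i in range(n):
        for j in range(i, n):
            yield (i, j)          # identifies A[i..j]; O(1) extra space per pair

a = [1, 2, 3, 4]
# Materializing the actual slices, as below, is what pushes time and space to O(n^3).
print([a[i:j + 1] for i, j in subarray_indices(a)])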
Update:
This answer does not take into account the constraint "the sum of those individual subarray results in perfect square root" in the comments, which was added after the fact and cannot be considered part of the question.

Related

Understanding the Big O for squaring elements in an array

I was working on a problem where you have to square the numbers in a sorted array on leetcode. Here is the original problem
Given an array of integers A sorted in non-decreasing order, return an array of the squares of each number, also in sorted non-decreasing order.
I am trying to understand the big O for my code and for the code that was given in the solution.
This is my code
def sortedSquare(A):
    new_A = []
    for num in A:
        num = num*num
        new_A.append(num)
    return sorted(new_A)

print(sortedSquare([-4, -1, 0, 3, 10]))
Here is the code from the solution:
def sortedSquares(self, A):
    return sorted(x*x for x in A)
For the solution, the Big O is
O(N log N)
where N is the length of the array. I don't understand why it would be N log N and not just N for the Big O.
For my solution, I am seeing it as Big O of N because I am just iterating through the entire array.
Also, is my solution a good solution compared to the solution that was given?
Your solution does the exact same thing as the given solution. Both solutions square all the elements and then sort the resultant array, with the leetcode solution being a bit more concise.
The reason why both these solutions are O(NlogN) is because of the use of sorted(). Python's builtin sort is timsort which sorts the array in O(NlogN) time. The use of sorted(), not squaring, is what provides the dominant factor in your time complexity (O(NlogN) + O(N) = O(NlogN)).
Note though that this problem can be solved in O(N) using two pointers or by using the merge step in mergesort.
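For reference, a minimal sketch of the two-pointer approach mentioned above (the function name is just illustrative):

def sorted_squares_two_pointers(A):
    # The largest square is always at one of the two ends of the sorted input,
    # so fill the result array from the back, comparing absolute values.
    n = len(A)
    result = [0] * n
    lo, hi = 0, n - 1
    for out in range(n - 1, -1, -1):
        if abs(A[lo]) > abs(A[hi]):
            result[out] = A[lo] * A[lo]
            lo += 1
        else:
            result[out] = A[hi] * A[hi]
            hi -= 1
    return result

print(sorted_squares_two_pointers([-4, -1, 0, 3, 10]))  # [0, 1, 9, 16, 100]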
Edit:
David Eisenstat brought up a very good point about Timsort. Timsort aggregates ascending and strictly descending runs and merges them. Since the squared array consists of a descending run followed by an ascending run, Timsort will simply reverse the descending run and then merge the two runs, which for this input takes only O(N).
The way complexity works is that the overall complexity for the whole program is the worst complexity for any one part. So, in your case, you have the part that squares the numbers and you have the part that sorts the numbers. So which part is the one that determines the overall complexity?
The squaring part is O(n) because you only touch each element once in order to square it.
What about the sorting part? Generally it depends on what sorting function you use:
Most sort routines have O(n*log(n)) because they use a divide and conquer algorithm.
Some (like bubble sort) have O(n^2)
Some (like the counting sort) have O(n)
In your case, they say that the given solution is O(n*log(n)), and since the squaring part is O(n), the sorting part must be O(n*log(n)). And since your code uses the same sorting function as the given solution, your sort must also be O(n*log(n)).
So your squaring part is O(n) and your sorting part is O(n*log(n)) and the overall complexity is the worst of those: O(n*log(n))
If extra storage space is allowed (as in your solution), the whole process can be performed in O(N) time. The initial array is already sorted, so you can split it into two subsequences holding the negative and the positive values.
Square all elements (O(N)) and reverse the negative subsequence (O(N) at worst), so that both sequences are sorted. If one of the subsequences is empty, you are done.
Otherwise, merge the two sequences, in time O(N) (this is the step that uses extra O(N) space).
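A rough Python sketch of that split/square/reverse/merge idea, using heapq.merge for the final step (a hand-written two-index merge would work just as well):

from heapq import merge

def sorted_squares_split_merge(A):
    # Split the already-sorted input into negatives and non-negatives.
    neg = [x * x for x in A if x < 0]      # decreasing after squaring
    pos = [x * x for x in A if x >= 0]     # increasing after squaring
    neg.reverse()                          # O(N): now both parts are increasing
    return list(merge(neg, pos))           # O(N) merge of two sorted sequences

print(sorted_squares_split_merge([-4, -1, 0, 3, 10]))  # [0, 1, 9, 16, 100]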

Time complexity of finding kth largest element in n sorted arrays using quickselect

If you have J sorted arrays of length N, find the kth smallest element among them.
There are a few potential solutions here, some involving a min-heap or binary search, but I want to know what the time complexity would be if we simply concatenated the arrays and used quickselect on the combined array.
Quickselect runs in linear time in the average case. Combining the arrays does expand the search space, but it still seems more efficient than a merging strategy, because quickselect gets to ignore some elements entirely if you choose good pivots.
Quickselect is very similar to quicksort because of its divide-and-conquer strategy. The difference between the two is that quickselect only recurses into the part of the data that holds the kth smallest element, and it continues until the remaining interval pins down that value, which means it has found the kth smallest element (https://www.geeksforgeeks.org/quickselect-algorithm/). In regards to your question, I think you are right when you say it depends on the pivots you get while recursing. In the best case the time would still be O(n) (because quickselect only searches one part of the split data, and the size of n depends on your dataset), and in the worst case it is O(n^2), when bad pivots are selected. For this algorithm to be generalized, it would be difficult to find the "best possible" pivot for each grouping of data.
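For concreteness, here is a rough quickselect sketch over the concatenated input, assuming a 1-based k and a random pivot (all names are illustrative):

import random

def quickselect_kth_smallest(arrays, k):
    data = [x for arr in arrays for x in arr]   # concatenation is O(n)

    def select(items, k):
        pivot = random.choice(items)            # random pivot to avoid consistently bad splits
        lows = [x for x in items if x < pivot]
        highs = [x for x in items if x > pivot]
        pivots = [x for x in items if x == pivot]
        if k <= len(lows):
            return select(lows, k)              # recurse only into the side holding the answer
        if k <= len(lows) + len(pivots):
            return pivot
        return select(highs, k - len(lows) - len(pivots))

    return select(data, k)

print(quickselect_kth_smallest([[1, 4, 7], [2, 5, 8], [3, 6, 9]], 4))  # 4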
When looking at using a heap instead, given the data it would take O(n) time to build the heap in the first place. Then, in order to find the kth smallest item, the minimum element in the heap (the root) must be extracted k times. Reading the min of a min-heap takes only O(1), but restoring the heap after the root is removed takes O(log n) time, which can add up depending on the input size you are discussing. So the total time for the heap method would be O(n) + O(k log n), where k denotes that we want the kth smallest number.
Basically, the difference in efficiency comes down to how the pivots are selected in quickselect, and to how big the input and the value of k are when considering the heap method.
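For reference, a minimal sketch of the heap method described above (heapify everything, then pop k-1 times); the function name is just illustrative:

import heapq

def kth_smallest_via_heap(arrays, k):
    heap = [x for arr in arrays for x in arr]
    heapq.heapify(heap)                  # O(n) to build the heap
    for _ in range(k - 1):               # discard the k-1 smallest elements
        heapq.heappop(heap)              # O(log n) each
    return heap[0]                       # the k-th smallest is now at the root

print(kth_smallest_via_heap([[1, 4, 7], [2, 5, 8], [3, 6, 9]], 4))  # 4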

Efficient way to compute sum of k largest numbers in a list?

I was reading some practice interview questions and I have a question about this one. Assume a list of random integers each between 1 & 100, compute the sum of k largest integers? Discuss space and time complexity and whether the approach changes if each integer is between 1 & m where m varies?
My first thought is to sort the array and compute the sum of the largest k numbers. Then I thought about using a binary tree structure, where I could start looking from the bottom right of the tree. I am not sure whether my approach would change depending on whether the numbers are 1 to 100 or 1 to m. Any thoughts on the most efficient approach?
The most efficient way might be to use something like randomized quickselect. It doesn't run the sorting step to completion and instead does just the partition step from quicksort. If you don't need the k largest integers in any particular order, this is the way I'd go. It takes linear time, but the analysis is not very straightforward. m would have little impact on this. Also, you can write the code so that the sum is computed as you partition the array.
Time: O(n)
Space: O(1)
The alternative is sorting using something like counting sort which has a linear time guarantee. As you say the values are integers in a fixed range, it would work quite well. As m increases the space requirement goes up, but computing the sum is quite efficient within the buckets.
Time: O(m) in the worst case (see comments for the argument)
Space: O(m)
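A minimal sketch of that counting-sort variant, assuming the values are known to lie in 1..m (here m defaults to 100; the name is illustrative):

def sum_k_largest_counting(nums, k, m=100):
    counts = [0] * (m + 1)
    for x in nums:
        counts[x] += 1                  # O(n) tally
    total, remaining = 0, k
    for value in range(m, 0, -1):       # O(m) walk from the top value down
        take = min(counts[value], remaining)
        total += take * value
        remaining -= take
        if remaining == 0:
            break
    return total

print(sum_k_largest_counting([5, 1, 100, 42, 42, 7], 3))  # 100 + 42 + 42 = 184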
I'd say sorting is probably unnecessary. If k is small, then all you need to do is maintain a sorted list that truncates elements beyond the kth largest element.
Each step in this should be O(k) in the worst possible case, where the added element is a new maximum. However, the average case is much better: after a certain number of elements, most candidates will simply be smaller than the smallest element kept in the list, and the insertion check will be O(log(k)).
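A rough sketch of that truncated-sorted-list idea using bisect.insort (names are illustrative; the O(k) worst case hides in the list shifts):

import bisect

def sum_k_largest_sorted_list(nums, k):
    top = []                              # ascending, length <= k
    for x in nums:
        if len(top) < k:
            bisect.insort(top, x)
        elif x > top[0]:                  # only bother if x beats the smallest kept value
            bisect.insort(top, x)
            top.pop(0)                    # evict the smallest to keep the list at size k
    return sum(top)

print(sum_k_largest_sorted_list([5, 1, 100, 42, 42, 7], 3))  # 184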
One way is to use a min-heap (implemented as a binary tree) of maximum size k. Checking whether a new element belongs in the heap is only O(1), since it's a min-heap and retrieval of the minimum element is a constant-time operation. Each insertion step (or non-insertion, in the case of an element too small to be inserted) while walking the n-element list is O(log k). The final tree traversal and summation step is O(k).
Total complexity:
O(n log k + k) = O(n log k)
Unless you have multiple cores in your computer (in which case parallel computing is an option), the summation should only be done at the end. Computing the sum on the fly adds extra computation steps without reducing your time complexity at all (you will actually have more computations to do). You will always have to sum k elements anyway, so why not avoid the additional addition and subtraction steps?
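For reference, a minimal sketch of this size-k min-heap approach with heapq, summing only at the end:

import heapq

def sum_k_largest_heap(nums, k):
    heap = []
    for x in nums:                        # one pass over the n elements
        if len(heap) < k:
            heapq.heappush(heap, x)       # O(log k)
        elif x > heap[0]:                 # O(1) peek: heap[0] is the smallest of the kept k
            heapq.heapreplace(heap, x)    # pop-and-push in O(log k)
    return sum(heap)                      # final O(k) summation

print(sum_k_largest_heap([5, 1, 100, 42, 42, 7], 3))  # 184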

Searching algorithm with complexity O(log n), UNSORTED list/array

I had this exercise in an exam, which stated:
Find an algorithm which can search for the highest number in an
unsorted list and have a Big-Oh complexity of O(log(N)).
The only searching algorithm with log n complexity that I have found is binary search, but that one requires my list/array to be sorted.
Is there such an algorithm?
This is a trick question. It has not been stated that the list has N elements. So you can use a change of variable and replace N with 2^K. Now solve the problem with a linear algorithm on a list with K elements: the O(K) scan is O(log N), since K = log2(N).
If we assume there are N elements in the list, a possible solution would be to use N parallel computing elements [CE_0 .. CE_N]. In the base case of the algorithm, we let each computing element CE_i in [CE_(N/2) .. CE_N] compare list values x_(2i-N) and x_(2i-N+1). Each computing element reports the larger of its two assigned values to CE_(i/2). The iterative step of the algorithm is that each computing element CE_k that receives two reported values reports the larger to CE_(k/2). This iterative logic continues until CE_0 processes a report from itself. Instead of reporting to itself again, it outputs the result.
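A sequential simulation of that tournament (not real parallel code) just to illustrate that the number of rounds, which is what the parallel machine pays for, is ceil(log2 N):

import math

def tournament_max(values):
    # Each "round" below is the work the computing elements would do in parallel;
    # the total work is still O(N), only the round count is O(log N).
    rounds = 0
    while len(values) > 1:
        if len(values) % 2:
            values = values + [values[-1]]   # pad odd rounds by repeating the last value
        values = [max(values[i], values[i + 1]) for i in range(0, len(values), 2)]
        rounds += 1
    return values[0], rounds

v = [7, 3, 9, 1, 4, 8, 2, 6]
print(tournament_max(v), math.ceil(math.log2(len(v))))  # (9, 3) 3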
If parallel computation is ruled out, then there is no solution to the problem.
No, there is no such algorithm. In an unsorted list, finding the highest number requires examining all the elements.
So, no algorithm better than O(n) exists!
The best one can do is O(n) time in an unsorted array.
But instead of simply scanning the whole list, you can apply a partition() routine (from the quicksort algorithm) and, instead of recursing on the lower half of the partition, recurse on the upper half and keep partitioning until the largest element is found. This still takes O(n) time.
Check these out for a detailed explanation:
http://en.wikipedia.org/wiki/Quickselect
How to find the kth largest element in an unsorted array of length n in O(n)?
Hope it helped! :)

Complexity for finding one of many elements in an array

The question is pretty much what the title says, with a slight variation. If I remember correctly, finding an entry in an array of size 'n' has an average-case complexity of O(n).
I assume that is also the case if there is a fixed number of elements in the vector, of which we want to find one.
But what if the number of such entries, of which we still only try to find one, is somehow related to the size of the vector, i.e. grows with it?
I have such a case at hand, but I don't know the exact relation between array size and the number of searched-for entries. It might be linear, it might be logarithmic. Is the average case still O(n)?
I would be grateful for any insights.
edit: an example
array size: 100
array content: at each position, a number of 1-10, completely random which one.
what we seek: the first occurrence of "1"
From a naive point of view, we should on average find an entry after about 10 lookups with any kind of linear search (which we have to do, as the content is not sorted).
As constant factors are usually omitted in big-O, does that mean we still need O(n) time, even though on average far fewer than n lookups are needed?
It is O(n) anyway.
Think about finding 1 here:
[9,9,9,9,9,1]
If you're doing a linear search through the array, then the average time complexity of finding one of M elements in an array with N elements will be O(I) where I is the average index of the first of the sought elements. If the array is randomly ordered, then I will be O(N/M) on average, and so the time complexity will also be O(N/M) on average and O(N-M) in the worst case.
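A quick simulation of the setup from the question (each slot uniform in 1..10, scan for the first 1) illustrates the point; the average number of lookups comes out around 10, even though the worst case is still O(N):

import random

def lookups_until_first(arr, target):
    for i, x in enumerate(arr):   # plain linear search
        if x == target:
            return i + 1          # number of comparisons performed
    return len(arr)               # not found: every slot was checked

N, trials = 100, 10_000
total = sum(
    lookups_until_first([random.randint(1, 10) for _ in range(N)], 1)
    for _ in range(trials)
)
print(total / trials)             # comes out close to 10 on average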
I am of two minds about this question.
First, if you consider an unsorted array (which seems to be the case here), the asymptotic complexity for the average case will surely be O(n).
Let's take an example.
We have n elements in the array, or better to say vector. Now, the average case is a linear search, element by element, which works out to about n/2 comparisons on average, i.e. O(n). Note that if more elements are added, the nature of the complexity won't change, but the effect is clear: it is n/2 comparisons on average, which is directly half of n. With m of the sought elements present in the array, the effect becomes O(n-m), or in terms of comparisons, about (n-m)/2.
So we find that as the size of the array (or vector) grows, the nature of the complexity won't change, though the number of comparisons required grows, since it equals about n/2 in the average case.
Second, if the array or vector is sorted, then binary searches have a worst case of order log(n+1), again dependent on n. Also, the average case increases the number of comparisons logarithmically, but the complexity order O(log n) won't change!
