Complexity for finding one of many elements in an array

The question is pretty much what the title says, with a slight variation. If I remember correctly, finding an entry in an array of size n has an average-case complexity of O(n).
I assume that is also the case if there is a fixed number of elements in the vector, of which we want to find any one.
But how is it if the number of entries, of which we still only try to find one, is in some way related to the size of the vector, i.e. grows with it?
I have such a case at hand, but I don't know the exact relation between array size and the number of searched-for entries. It might be linear, it might be logarithmic. Is the average case still O(n)?
I would be grateful for any insights.
edit: an example
array size: 100
array content: at each position, a number from 1 to 10, chosen completely at random.
what we seek: the first occurrence of "1"
from a naive point of view, we should on average find an entry after about 10 lookups with any kind of linear search (which we have to do, as the content is not sorted).
As constant factors are usually omitted in big-O, does that mean that we still need O(n) time, even though on average only about n/10 lookups are needed?

It is O(n) anyway.
Think about finding 1 here:
[9,9,9,9,9,1]

If you're doing a linear search through the array, then the average time complexity of finding one of M elements in an array with N elements will be O(I) where I is the average index of the first of the sought elements. If the array is randomly ordered, then I will be O(N/M) on average, and so the time complexity will also be O(N/M) on average and O(N-M) in the worst case.
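To make the O(N/M) intuition concrete, here is a quick simulation sketch (not from the original answer; the array size and value range come from the question's example, and the helper name is made up):

```python
import random

def lookups_until_first_match(arr, targets):
    """Linear search: count how many elements are examined before
    the first element belonging to `targets` is found."""
    for i, x in enumerate(arr, start=1):
        if x in targets:
            return i
    return len(arr)

# Rough check of the N/M intuition from the question's example
# (N = 100, values 1..10, looking for the first "1"):
random.seed(0)
trials = [lookups_until_first_match([random.randint(1, 10) for _ in range(100)], {1})
          for _ in range(10_000)]
print(sum(trials) / len(trials))   # typically close to 9-10 lookups on average
```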

I am of two minds about this question.
First, if you consider an unsorted array (which seems to be the case here), the asymptotic complexity for the average case will surely be O(n).
Let's take an example.
We have n elements in the array, or better said the Vector. The average case is a linear search from element to element, which takes about n/2 comparisons on average, i.e. O(n) as an average case. If elements are added, the nature of the complexity won't change, but the effect is clear: the average stays at about half the array size in comparisons. With m such elements the effect becomes O(n-m), or roughly (n-m)/2 comparisons, as a result of those elements being in the Vector.
So we find that as the size of the array (or Vector) grows, the nature of the complexity won't change, though the number of comparisons required grows, since it is about n/2 in the average case.
Second, if the array or vector is sorted, then performing binary searches has a worst case on the order of log(n+1) comparisons, again dependent on n. The average number of comparisons also grows logarithmically, but the complexity order O(log n) won't change.
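As an aside, the sorted O(log n) case mentioned above can be sketched with Python's bisect module (just an illustration, not code from the answer):

```python
import bisect

def contains(sorted_arr, target):
    # Binary search on a sorted array: O(log n) comparisons.
    i = bisect.bisect_left(sorted_arr, target)
    return i < len(sorted_arr) and sorted_arr[i] == target

# contains([1, 3, 5, 7, 9], 7) -> True; contains([1, 3, 5, 7, 9], 4) -> False
```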

Related

Find all contiguous subarrays of array in O(n) complexity

I am stuck trying to find all the contiguous subarrays of a given array in the minimum time complexity, ideally O(n).
For example:
[1,2,3,4]
Subarrays are:
[1][2][3][4][1,2][2,3][3,4][1,2,3][2,3,4][1,2,3,4]
I have done it with time complexity O(n^2), but for large inputs it takes a lot of time and memory.
Are there any specific algorithms for this problem?
There are exactly n(n+1)/2 subarrays, which can be written as A[i..j] for all i and all j ≥ i. The algorithm to generate all pairs is immediate (a double loop) and cannot be improved.
If you just need to output the pairs (i, j), space O(1) suffices. If you need to store all pairs, O(n²). And if you need to store all subarrays in full, O(n³); in this case, the time also unavoidably grows to O(n³) and there is another nested loop.
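A minimal sketch of the double loop mentioned above (names are illustrative only):

```python
def all_subarrays(a):
    """Yield every contiguous subarray A[i..j].  There are n*(n+1)//2 of them,
    so merely enumerating the (i, j) pairs is already Theta(n^2)."""
    n = len(a)
    for i in range(n):
        for j in range(i, n):
            yield a[i:j + 1]

# list(all_subarrays([1, 2, 3, 4])) produces the same 10 subarrays
# listed in the question (in a different order).
```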
Update:
This answer does not take into account the constraint "the sum of those individual subarray results in perfect square root" in the comments, which was added after the fact and cannot be considered part of the question.

Time complexity of finding kth largest element in n sorted arrays using quickselect

If you have J sorted arrays of length N, find the kth smallest element among them.
There are a few potential solutions here, some involving a min-heap or binary search, but I want to know what the time complexity would be if we simply concatenated the arrays together and used quickselect on the combined array.
Quickselect runs in linear time in the average case. Combining the arrays does expand the search space, but it should still be more efficient than a merging strategy, because quickselect allows some elements to be ignored entirely if you choose good pivots.
Quickselect is very similar to quicksort because of its divide-and-conquer strategy. The difference between the two is that quickselect only recurses into the part of the data that holds the kth smallest element, and it continues until the remaining interval pins down the kth value, which means it has found the kth smallest value (https://www.geeksforgeeks.org/quickselect-algorithm/). Regarding your question, I think you are right that it depends on the pivots you get while recursing. In the best case the time is still O(n) (because quickselect only searches one part of the split data, and how much gets discarded depends on your dataset), and in the worst case it is O(n^2), when bad pivots are repeatedly selected. To make this algorithm general, it would be difficult to find the "best possible" pivot for each grouping of data.
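Here is a rough quickselect sketch along the lines described above, applied to the concatenated arrays (illustrative code, not the asker's implementation; it uses random pivots and a three-way partition):

```python
import random

def quickselect(arr, k):
    """Return the k-th smallest element (1-based) of arr.
    Average-case O(n) with random pivots, O(n^2) in the worst case."""
    arr = list(arr)
    k -= 1                                  # convert to a 0-based index
    lo, hi = 0, len(arr) - 1
    while True:
        if lo == hi:
            return arr[lo]
        pivot = arr[random.randint(lo, hi)]
        lt, i, gt = lo, lo, hi              # three-way partition around the pivot
        while i <= gt:
            if arr[i] < pivot:
                arr[lt], arr[i] = arr[i], arr[lt]
                lt += 1
                i += 1
            elif arr[i] > pivot:
                arr[i], arr[gt] = arr[gt], arr[i]
                gt -= 1
            else:
                i += 1
        if k < lt:
            hi = lt - 1
        elif k > gt:
            lo = gt + 1
        else:
            return pivot

# For J sorted arrays: concatenate first, then select.
# arrays = [[1, 4, 9], [2, 3, 8], [5, 6, 7]]
# quickselect([x for a in arrays for x in a], 4)  # -> 4
```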
When looking at the heap method: given the data, it would take O(n) time to build the heap in the first place. Then, to find the kth smallest item, the minimum element in the heap (the root) must be removed k times. Peeking at the min of a min-heap takes only O(1), but re-heapifying after the root is removed takes O(log n), which can add up to be quite costly depending on the input size you are discussing. So the total time for the heap method would be O(n) + O(k log n), where k indexes the kth smallest number.
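And a sketch of that heap-based approach using Python's heapq (illustrative only):

```python
import heapq

def kth_smallest_heap(arrays, k):
    # Build one heap from all J arrays: O(n).  Then pop the minimum k-1 times
    # (each pop re-heapifies in O(log n)) and peek at the k-th: O(n + k log n) total.
    heap = [x for a in arrays for x in a]
    heapq.heapify(heap)
    for _ in range(k - 1):
        heapq.heappop(heap)
    return heap[0]

# kth_smallest_heap([[1, 4, 9], [2, 3, 8], [5, 6, 7]], 4)  # -> 4
```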
Basically, the difference in efficiency depends on how the pivots are selected in quickselect, and on how big the input and the k value are when thinking about the heap method.

How many comparisons does insertion sort do in an already-ordered 2-element array?

The best-case scenario of insertion sort is meant to be O(n); however, if you have 2 elements in an array that are already sorted, such as 10 and 11, doesn't it only make one comparison rather than 2?
Time complexity of O(n) does not mean that the number of steps is exactly n, it means that the number of steps is dominated by a linear function. Basically, sorting twice as many elements should take at most twice as much time for large numbers.
The best-case scenario for insertion sort is when you can insert the new element after just one comparison. This can happen in only 2 cases:
You are inserting elements from a reverse-sorted list and you compare the new element with the first element of the target list.
You are inserting elements from a sorted list and you compare the new element with the last one of the target list.
In these 2 cases, each new element is inserted after just one comparison, including in the case you mention.
The time complexity would indeed be O(n) for these very special cases. You do not need such a favorable case for this complexity; the time complexity will be O(n) whenever there is a constant upper bound on the number of comparisons, independent of the list length.
Note that it is a common optimization to try to handle already-sorted lists in a special way. If the optimization mentioned in the second case above is not implemented, sorting an already sorted list would be the worst-case scenario, with n comparisons for the insertion of the (n+1)th element.
In the general case, insertion sort on lists has a time complexity of O(n²), but careful implementation can produce an optimal solution for already sorted lists.
Note that this is true for lists where inserting at any position has a constant cost; insertion sort on arrays does not have this property. It can still be optimized to handle these special cases, but not both at the same time.
Insertion sort does N - 1 comparisons if the input is already sorted.
This is because each element is compared with the previous element, and something is done only if the order is not right (what exactly it does is not important here, because the order is always right). So this comparison happens N - 1 times.
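A small sketch that counts the comparisons, illustrating the N - 1 claim on already-sorted input (the function name is made up):

```python
def insertion_sort_comparisons(a):
    """Sort a copy of `a` and return the number of comparisons made.
    On an already sorted input of length n this returns n - 1."""
    a = list(a)
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1
            if a[j] <= key:          # already in order: one comparison, then stop
                break
            a[j + 1] = a[j]          # shift the larger element to the right
            j -= 1
        a[j + 1] = key
    return comparisons

# insertion_sort_comparisons([10, 11]) == 1
# insertion_sort_comparisons(list(range(100))) == 99
```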
It looks like you need to revisit big-O notation. O(n) does not mean n operations; it does not even mean close to n operations (n/10^9 is O(n) and it is not really close to n). All it means is that the function grows at most linearly (think about it as a bound as n -> inf).

Efficient way to compute sum of k largest numbers in a list?

I was reading some practice interview questions and I have a question about this one. Assume a list of random integers, each between 1 and 100; compute the sum of the k largest integers. Discuss space and time complexity, and whether the approach changes if each integer is between 1 and m, where m varies.
My first thought is to sort the array and compute the sum of the largest k numbers. Then I thought about using a binary tree structure where I could start looking from the bottom right of the tree. I am not sure whether my approach would change if the numbers are 1 to 100 or 1 to m. Any thoughts on the most efficient approach?
The most efficient way might be to use something like randomized quickselect. It doesn't do the sorting step to completion and instead does just the partition step from quicksort. If you don't need the k largest integers in any particular order, this is the way I'd go. It takes linear time, but the analysis is not very straightforward. m would have little impact on this. Also, you can write the code in such a way that the sum is computed as you partition the array.
Time: O(n)
Space: O(1)
The alternative is sorting with something like counting sort, which has a linear-time guarantee. Since, as you say, the values are integers in a fixed range, it would work quite well. As m increases the space requirement goes up, but computing the sum is quite efficient within the buckets.
Time: O(m) in the worst case (see comments for the argument)
Space: O(m)
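A possible sketch of the counting-sort idea for this problem, assuming values in 1..m (illustrative only):

```python
def sum_k_largest_counting(nums, k, m=100):
    # Count occurrences of each value 1..m: O(n + m) time, O(m) extra space.
    counts = [0] * (m + 1)
    for x in nums:
        counts[x] += 1
    total, remaining = 0, k
    for value in range(m, 0, -1):          # walk the buckets from the largest value down
        take = min(counts[value], remaining)
        total += take * value
        remaining -= take
        if remaining == 0:
            break
    return total

# sum_k_largest_counting([5, 1, 100, 42, 100, 7], 3)  # -> 242 (100 + 100 + 42)
```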
I'd say sorting is probably unnecessary. If k is small, then all you need to do is maintain a sorted list that truncates elements beyond the kth largest element.
Each step of this should be O(k) in the worst possible case, where the element added is a new maximum. However, the average case is much better; after a certain number of elements, most new ones will simply be smaller than the last element in the list, and the operation will be O(log(k)).
One way is to use a min-heap (implemented as a binary tree) of maximum size k. Checking whether a new element belongs in the heap or not is only O(1), since it's a min-heap and retrieval of the minimum element is a constant-time operation. Each insertion step (or non-insertion, in the case of an element that is too small to be inserted) over the n-element list is O(log k). The final tree traversal and summation step is O(k).
Total complexity:
O(n log k + k) = O(n log k)
Unless you have multiple cores on your computer (in which case parallel computing is an option), the summation should only be done at the end. On-the-fly computation adds extra steps without actually reducing your time complexity at all (you will in fact have more computations to do). You always have to sum k elements anyway, so why not avoid the additional addition and subtraction steps?
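A sketch of the bounded min-heap approach described above, using Python's heapq with the summation done only at the end (illustrative only):

```python
import heapq

def sum_k_largest_heap(nums, k):
    # Maintain a min-heap holding the k largest values seen so far.
    # Peeking at heap[0] is O(1); each push/replace is O(log k), so the scan is O(n log k).
    heap = []
    for x in nums:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:                  # heap[0] is the smallest of the current top k
            heapq.heapreplace(heap, x)
    return sum(heap)                       # summing at the very end costs O(k)

# sum_k_largest_heap([5, 1, 100, 42, 100, 7], 3)  # -> 242
```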

Find the one non-repeating element in array?

I have an array of n elements in which only one element is not repeated; all the other numbers appear more than once. There is no limit on the range of the numbers in the array.
Some solutions are:
Making use of a hash table, which gives linear time complexity but very poor space complexity
Sorting the list using merge sort in O(n log n) and then finding the element which doesn't repeat
Is there a better solution?
One general approach is to implement a bucketing technique (of which hashing is one example) to distribute the elements into different "buckets" using their identity (say, as an index) and then find the bucket with the smallest size (1 in your case). This problem, I believe, is also known as the minority element problem. There will be as many buckets as there are unique elements in your set.
Doing this by hashing is problematic because of collisions and how your algorithm might handle that. Certain associative array approaches such as tries and extendable hashing don't seem to apply as they are better suited to strings.
One application of the above is to the Union-Find data structure. Your sets will be the buckets and you'll need to call MakeSet() and Find() for each element in your array for a cost of $O(\alpha(n))$ per call, where $\alpha(n)$ is the extremely slow-growing inverse Ackermann function. You can think of it as being effectively a constant.
You'll have to do Union when an element already exists. With some changes to keep track of the set with minimum cardinality, this solution should work. The time complexity of this solution is $O(n\alpha(n))$.
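A rough sketch of this bucketing idea with Union-Find (illustrative only; note it still needs a map from values to their first occurrence, so it does not avoid hashing entirely):

```python
class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

def find_non_repeating(arr):
    """Union equal values into one set; the non-repeating element is the
    one whose set still has cardinality 1 at the end."""
    dsu = DisjointSet(len(arr))
    first_seen = {}                         # value -> index of its first occurrence
    for i, v in enumerate(arr):
        if v in first_seen:
            dsu.union(first_seen[v], i)
        else:
            first_seen[v] = i
    for v, i in first_seen.items():
        if dsu.size[dsu.find(i)] == 1:
            return v
    return None

# find_non_repeating([4, 7, 2, 7, 4, 7])  # -> 2
```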
Your problem also appears to be loosely related to the Element Uniqueness problem.
Try multi-pass scanning if you have a strict space limitation.
Say the input has n elements and you can only hold m elements in memory. If you use a hash-table approach, in the worst case you need to handle n/2 unique numbers, so you want m > n/2. If you don't have that big an m, you can partition the n elements into k = (max(input) - min(input)) / (2m) groups and scan the n input elements k times (in the worst case):
1st run: you only hash-get/put/mark/whatever the elements with value < min(input) + 2m, because in the range (min(input), min(input) + 2m) there are at most m unique elements and you can handle that. If you are lucky, you already find the unique one; otherwise continue.
2nd run: only operate on elements with value in range (min(input)+m*2, min(input)+m*4), and
so on and so forth.
In this way, you compromise on time complexity, going to O(kn), but you get a space complexity bound of O(m).
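A possible sketch of this multi-pass idea (illustrative only; `read_input` is a made-up callable that returns a fresh iterator over the data on each pass):

```python
from collections import Counter

def find_unique_multipass(read_input, m):
    """Multi-pass sketch: each pass only counts values inside a window of
    width 2*m, so the counting table stays within the memory budget."""
    lo = min(read_input())
    hi = max(read_input())
    start = lo
    while start <= hi:
        # Only count elements whose value falls inside the current window.
        window = Counter(x for x in read_input() if start <= x < start + 2 * m)
        for value, count in window.items():
            if count == 1:
                return value                 # found the non-repeating element
        start += 2 * m
    return None

# Example usage:
# data = [4, 7, 2, 7, 4, 7]
# find_unique_multipass(lambda: iter(data), m=2)  # -> 2
```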
Two ideas come to my mind:
A smoothsort may be a better alternative than the cited merge sort for your needs, given it is O(1) in memory usage, O(n log n) in the worst case like merge sort, but O(n) in the best case;
Based on the (reverse) idea of the splay tree, you could make a type of tree that pushes nodes toward the bottom once they are used (instead of upward as in the splay tree). This would still give you an O(n log n) implementation of the sort, but the advantage would be an O(1) step for finding the unique element: it would be the root. The plain sort-then-scan approach costs O(n log n) + O(n), whereas this algorithm would be O(n log n) + O(1).
Otherwise, as you stated, using a hash-based solution (like a hash-implemented set) would result in an O(n) algorithm (O(n) to insert each element and keep a count for it, and O(n) to traverse your set to find the unique element), but you seemed to dislike the memory usage, though I don't know why. Memory is cheap these days...
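For completeness, the hash-based counting solution mentioned here can be sketched with collections.Counter (illustrative only):

```python
from collections import Counter

def find_unique(arr):
    # Hash-based counting: O(n) time, O(u) extra space for u distinct values.
    counts = Counter(arr)
    for value, count in counts.items():
        if count == 1:
            return value
    return None

# find_unique([4, 7, 2, 7, 4, 7])  # -> 2
```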
