Accurate Big-O analysis

Accurate Big-O analysis - arrays

Suppose you need to generate a random permutation of the first N integers. For example, {4, 3,
1, 5, 2} and {3, 1, 4, 2, 5} are legal permutations, but {5, 4, 1, 2, 1} is not, because one number
(1) is duplicated and another (3) is missing. This routine is often used in simulation of
algorithms. We assume the existence of a random number generator, RandInt(i,j), that
generates integers between i and j with equal probability. Here are three algorithms:
(i) Fill the array A from A[0] to A[N-1] as follows: To fill A[i], generate random
numbers until you get one that is not already in A[0], A[1],…, A[i-1].
(ii) Same as algorithm (i), but keep an extra array called the Used array. When a random
number, Ran, is first put in the array A, set Used[Ran] = true. This means that
when filling A[i] with a random number, you can test in one step to see whether the
random number has been used, instead of the (possibly) i steps in the first algorithm.
(iii) Fill the array such that A[i] = i+1. Then
for (i=1; i<n; i++)
swap (A[i], A[RandInt(0,i)]);
Give as accurate (Big-O) an analysis as you can of the expected running time of each algorithm.
Can anyone help with this? I have only just learned this chapter and don't quite understand what the question is asking for.

I:
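You have to fill N slots, but as you keep filling, you are less likely to draw a valid number.
If you have filled M slots, then the chance of drawing an unused number is 1-(M/N), so the expected number of draws for that slot is N/(N-M). In addition, as the list gets longer, each draw has to be checked against everything already in the array. As a loose upper bound: N slots * O(N) draws per slot * O(N) checks per draw = O(N^3)
(O(N) draws per slot is the worst case, the last slot, where there is a 1/N chance of drawing the unused number. Summing the expected draws over all slots gives N * (1 + 1/2 + ... + 1/N) = O(N log N) draws in total, so the tighter expected bound is O(N^2 log N).)
II: Now we don't have to iterate over the entire array for each check, so only O(N^2) by the same loose count, or O(N log N) expected if you use the sum above.
III: Swapping takes constant time, and we iterate over the entire array once, so O(N)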
I think I did all those correctly, but I easily could have missed something.
Hope this helps.
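To make the three algorithms from the question concrete, here is a minimal Python sketch. The helper rand_int is a stand-in for the exercise's RandInt(i,j), implemented here with random.randint; the function names are made up for illustration.

```python
import random


def rand_int(i, j):
    # Stand-in for RandInt(i, j): uniform integer in [i, j], both ends included.
    return random.randint(i, j)


def permutation_scan(n):
    # Algorithm (i): redraw until the number is not already in the array.
    a = []
    while len(a) < n:
        candidate = rand_int(1, n)
        if candidate not in a:  # linear scan over the filled part
            a.append(candidate)
    return a


def permutation_used(n):
    # Algorithm (ii): same draws, but a Used array makes each check one step.
    a = []
    used = [False] * (n + 1)
    while len(a) < n:
        candidate = rand_int(1, n)
        if not used[candidate]:
            used[candidate] = True
            a.append(candidate)
    return a


def permutation_swap(n):
    # Algorithm (iii): fill with 1..n, then swap each slot with a random earlier one.
    a = list(range(1, n + 1))
    for i in range(1, n):
        j = rand_int(0, i)
        a[i], a[j] = a[j], a[i]
    return a
```

All three return a valid permutation of 1..n; they differ only in how much work is spent on rejected draws and membership checks.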

Option 4 : Fill a List / Collection with values from 1 to n, then shuffle the Collection. O(n) + ~O(n) => O(n)

Related

How to find the rightmost number in the array which is greater or equal to current one in O(N) time?

Given an array nums of integers with length n, for each index i, I am trying to find the rightmost index j such that i < j and nums[j] >= nums[i]. Is there an O(N) solution for this problem? I am aware of the monotonic stack, which can be used for this kind of problem, but I am unable to derive an algorithm.
For example, given an array A:
A = [9,8,1,0,1,9,4,0,4,1], the solution should output
[5,5,9,9,9,-1,8,9,-1,-1]. Here -1 indicates no indices satisfy the constraint.
This link asked the same question, and the accepted answer is only for O(NlogN). I'd like to know whether an O(N) solution is possible.
Thank you.
Update
Based on @Aivean's answer, here is an O(N log N) solution in Python.
import bisect

def rightmostGreaterOrEqual(nums):
    n = len(nums)
    indx = [-1] * n
    # stack holds indices whose values strictly increase from bottom to top;
    # stackv mirrors those values so that bisect can search them.
    stack, stackv = [], []
    for i in range(n - 1, -1, -1):
        if not stack or nums[stack[-1]] < nums[i]:
            stack.append(i)
            stackv.append(nums[i])
        else:
            idx = bisect.bisect_left(stackv, nums[i])
            indx[i] = stack[idx]
    return indx

B = [9, 8, 1, 0, 1, 9, 4, 0, 4, 1]
rightGreat = rightmostGreaterOrEqual(B)
print(B)
# [9, 8, 1, 0, 1, 9, 4, 0, 4, 1]
print(rightGreat)
# [5, 5, 9, 9, 9, -1, 8, 9, -1, -1]

There is not going to be an O(N) algorithm for the problem as written. Given a function that solves this problem, you could use it to partition N/2 arbitrary numbers into N/2 arbitrary adjacent ranges.
For example [2532,1463,3264,200,4000,3000,2000,1000] produces [5,6,4,7,-1,-1,-1,-1], identifying the ranges of the first N/2 numbers.
If you can only relate the numbers by comparison, then this will take you N/2 * log(N/2) comparisons, so O(N log N) time.
Without a limit on the size of the numbers, which would let you cheat like a radix sort, there isn't going to be a way that is asymptotically faster than all comparison-based methods.

The two problems of finding leftmost and rightmost j for a given i are not symmetrical, because of the added constraint of i < j. When I'm talking about these two tasks I assume that the constraint i < j is not flipped. This constraint means, that we always look to the right of i when searching for j, whether we're looking for rightmost or leftmost j. Without this constraint two tasks would be symmetrical.
1. Finding rightmost j, such that i < j and nums[i] ≤ nums[j]
One way to solve this task is to traverse nums from right to left and maintain the strictly increasing subsequence of already visited elements (with their indices). The current element is added to the sequence only if it is larger than the largest element already present in the sequence. Adding a new element to the sequence is O(1).
For each element of nums you have to perform a binary search in the subsequence of visited elements to find the first value that is larger than or equal to the current element. Binary search is O(log n).
The total time is O(n log n), the auxiliary space needed is O(n).
Here is the graphical representation of the problem:
Here yellow dots represent the elements that form strictly increasing sequence (and their answer will be -1). Every other element (in blue) picks one of the ranges formed by yellow elements.
2. Finding leftmost j, such that i < j and nums[i] ≤ nums[j]
This problem, as opposed to the previous one, can be solved in O(n) time and O(n) space using a monotonic stack. Similar to the previous problem, as you traverse nums from right to left, you form and maintain a monotonic stack, but, importantly, when a new element is added to the stack, all elements that are smaller are removed from the stack. And instead of using binary search to find the larger element, the answer is right at the new top of the stack (after all smaller elements were removed). This makes updating the stack and finding the answer for each element amortized O(1).
Here yellow elements represent elements with answer = -1, when they were added to the stack, they emptied the stack completely, as they were larger than every other element in the stack.
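The monotonic stack approach for the leftmost j (task 2) can be sketched in Python as follows. This is a minimal illustration under the constraint i < j and nums[i] ≤ nums[j]; the function name is made up for this example.

```python
def leftmost_greater_or_equal(nums):
    n = len(nums)
    result = [-1] * n
    stack = []  # indices of candidates to the right of the current position
    for i in range(n - 1, -1, -1):
        # Elements smaller than nums[i] can never be the answer for anything
        # further left, because nums[i] is closer and at least as large.
        while stack and nums[stack[-1]] < nums[i]:
            stack.pop()
        if stack:
            result[i] = stack[-1]  # nearest index to the right with value >= nums[i]
        stack.append(i)
    return result


print(leftmost_greater_or_equal([9, 8, 1, 0, 1, 9, 4, 0, 4, 1]))
# [5, 5, 4, 4, 5, -1, 8, 8, -1, -1]
```

Each index is pushed once and popped at most once, which is where the amortized O(1) per element comes from.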

Complexity of an algorithm that iterates based on the max number in an array?

so I was doing a competitive programming challenge a friend proposed to me, and I was able to figure it out, but after that, I decided to have a bit of fun with other ways of solving it.
Thing is, I did find a weird way of doing it, but we can't really wrap our head around the complexity, because we think that it might not have a formal complexity?
A brief explanation of the problem is this:
You have an array ( might not be ordered ) of integers, all positive and bigger than 0.
Your task is to tell me the first integer that is missing from the array, so for example:
[1, 4, 5,3, 2, 6, 10] ==> The answer is 7
My fun way of doing it was:
You iterate through the array once, and put all the values in a HashMap, saving the max value in the array in a separate variable.
After that, you create a for loop that goes from 1 to the max value of the array.
Then you check if the for loop index exists in the hashmap, if it doesn't then you have found your answer.
The thing is, this would be O(n), but the for loop is confusing me a bit. Is it O(n)?
It feels wrong to say it's O(n).
Imagine this pseudo-code:
let numArray = {1, 2, 3, 1000};
for i=1 TO max(numArray){
print "Hello there buddy"
}
What would the complexity of this be? Does it even have a formal complexity?
From my understanding, saying it has a complexity doesn't really fit the purpose of Big O notation, since its purpose is to evaluate how long some code takes to run as the SIZE of the input changes. Here the size of the input doesn't matter, just the value of the max number. If the array has 3 elements and the max is 1 billion, it's still going to take 1 billion iterations.
So, is there a specific notation for this?
How would you describe this problem?

In such cases, we assume a new variable m and say that this solution is of O(m), where m will be the highest number in the array; rather than saying O(n).
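For reference, the hash-based approach described in the question can be sketched like this (the function name is made up, and a set stands in for the HashMap). One thing worth noting, as an observation rather than part of the answer above: a loop that always runs up to the maximum value, like the printing pseudo-code, does O(m) work, but a search that stops at the first missing value stops at n+1 at the latest, because n elements cannot cover more than the values 1..n.

```python
def first_missing_positive(nums):
    seen = set(nums)  # one pass over the input, O(n) expected time
    candidate = 1
    # Stops at the first gap; with n elements the gap is at most n + 1.
    while candidate in seen:
        candidate += 1
    return candidate


print(first_missing_positive([1, 4, 5, 3, 2, 6, 10]))  # 7
print(first_missing_positive([1, 2, 3, 1000]))         # 4
```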
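As an additional note, here is another way to solve this problem that needs no hash map:
assume
a = [1, 4, 5, 3, 2, 6, 10]
n = len(a)
for i in range(n):
    v = abs(a[i])  # a[i] may already have been negated as a mark
    if v <= n and a[v - 1] > 0:
        a[v - 1] = -a[v - 1]  # value v is present: mark index v - 1
for i in range(n):
    if a[i] > 0:
        print(i + 1)  # first unmarked index gives the missing value
        break
else:
    print(n + 1)  # every value 1..n is present
The idea is to mark each value that has been seen by flipping the sign of the element at the corresponding index.
On the second pass, the index of the first positive element, plus one, is the missing value.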

Find the longest subarray that contains a majority element

I am trying to solve this algorithmic problem:
https://dunjudge.me/analysis/problems/469/
For convenience, I have summarized the problem statement below.
Given an array of length (<= 2,000,000) containing integers in the range [0, 1,000,000], find the
longest subarray that contains a majority element.
A majority element is defined as an element that occurs > floor(n/2) times in a list of length n.
Time limit: 1.5s
For example:
If the given array is [1, 2, 1, 2, 3, 2],
The answer is 5 because the subarray [2, 1, 2, 3, 2] of length 5 from position 1 to 5 (0-indexed) has the number 2 which appears 3 > floor(5/2) times. Note that we cannot take the entire array because 3 = floor(6/2).
My attempt:
The first thing that comes to mind is an obvious brute force (but correct) solution which fixes the start and end indexes of a subarray and loop through it to check if it contains a majority element. Then we take the length of the longest subarray that contains a majority element. This works in O(n^2) with a small optimization. Clearly, this will not pass the time limit.
I was also thinking of dividing the elements into buckets that contain their indexes in sorted order.
Using the example above, these buckets would be:
1: 0, 2
2: 1, 3, 5
3: 4
Then for each bucket, I would make an attempt to merge the indexes together to find the longest subarray that contains k as the majority element where k is the integer label of that bucket.
We could then take the maximum length over all values of k. I didn't try out this solution as I didn't know how to perform the merging step.
Could someone please advise me on a better approach to solve this problem?
Edit:
I solved this problem thanks to the answers of PhamTrung and hk6279. Although I accepted the answer from PhamTrung because he first suggested the idea, I highly recommend looking at the answer by hk6279 because his answer elaborates the idea of PhamTrung and is much more detailed (and also comes with a nice formal proof!).

Note: attempt 1 is wrong, as @hk6279 has given a counterexample. Thanks for pointing it out.
Attempt 1:
The answer is quite complex, so I will only sketch the idea.
Let's process each unique number one by one.
Process each occurrence of the number x from left to right. At index i, add a segment (i, i) that marks the start and end of the current subarray. Then look at the left neighbour of this segment and try to merge it into (i, i): if the left neighbour is (st, ed), it becomes (st, i) provided the merged segment still satisfies the condition. Keep merging until no further merge is possible or there is no left neighbour.
We keep all those segments in a stack for faster lookup, insertion and removal.
Finally, we try to enlarge each segment as much as possible and keep the biggest result.
Time complexity should be O(n), as each element can be merged only once.
Attempt 2:
Let's process each unique number one by one.
For each unique number x, we maintain an array of counters. Going from index 0 to the end of the array, we increase the count when we encounter the value x and decrease it otherwise. So for this array
[0,1,2,0,0,3,4,5,0,0] and number 0, we have this counter array
[1,0,-1,0,1,0,-1,-2,-1,0]
So, for a subarray that ends at a specific index i to be valid, the value of counter[i] - counter[start - 1] must be greater than 0. (This is easy to see if you view the array as made of 1 and -1 entries, with 1 for an occurrence of x and -1 otherwise; the problem then becomes finding the longest subarray with a positive sum.)
So, with the help of a binary search, the above algorithm still has a complexity of O(n^2 log n) (if we have n/2 unique numbers, we need to repeat the process n/2 times, and each run takes O(n log n)).
To improve it, we observe that we don't need to store the counter values at every index, only those at the occurrences of x. For the counter array above, we can store:
[1,#,#,0,1,#,#,#,-1,0]
This leads to an O(n log n) solution, which only goes through each element once.
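As a small illustration of the counter array used in attempt 2, here is a Python sketch that builds it for one value x and also produces the reduced form that keeps only the positions where x occurs. The function names are made up, and None plays the role of the # placeholder.

```python
def counter_array(nums, x):
    # +1 for every occurrence of x, -1 for anything else (running total).
    total = 0
    counters = []
    for value in nums:
        total += 1 if value == x else -1
        counters.append(total)
    return counters


def blinded_counter_array(nums, x):
    # Keep the counter only where x occurs; None stands for '#'.
    counters = counter_array(nums, x)
    return [c if v == x else None for v, c in zip(nums, counters)]


nums = [0, 1, 2, 0, 0, 3, 4, 5, 0, 0]
print(counter_array(nums, 0))
# [1, 0, -1, 0, 1, 0, -1, -2, -1, 0]
print(blinded_counter_array(nums, 0))
# [1, None, None, 0, 1, None, None, None, -1, 0]
```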

This elaborates on and explains how attempt 2 in @PhamTrung's solution works.
To get the length of the longest subarray, we should:
1. Find the maximum number of occurrences of a majority element in a valid subarray, denoted m. This is done by attempt 2 in @PhamTrung's solution.
2. Return min( 2*m-1, length of given array)
Concept
The attempt stems from a method for solving the longest positive subarray problem.
We maintain an array of counters for each unique number x. We do a +1 when we encounter x. Otherwise, we do a -1.
Take the array [0,1,2,0,0,3,4,5,0,0,1,0] and the unique number 0: we get the counter array [1,0,-1,0,1,0,-1,-2,-1,0,-1,0]. If we blank out the entries that do not belong to the target number, we get [1,#,#,0,1,#,#,#,-1,0,#,0].
We can get a valid subarray from the blanked counter array whenever there exist two counters such that the value of the right counter is greater than or equal to the left one. See the Proof part.
To improve it further, we can ignore all # as they are useless, and we get [1(0),0(3),1(4),-1(8),0(9),0(11)] in count(index) format.
We can improve this further by not recording a counter that is greater than or equal to the previous recorded counter. Take the counters at indices 8 and 9 as an example: if you can form a valid subarray starting at index 9, then you can also form one starting at index 8. So, we only need [1(0),0(3),-1(8)] for the computation.
For the current index, you can find the longest valid subarray over all previous indices by binary searching the counter array for the closest value that is less than or equal to the current counter value (if one exists). Because the recorded counters are strictly decreasing, this is also the leftmost such entry.
Proof
When the right counter is greater than the left counter by r for a particular x, where k,r >= 0, there must be k+r occurrences of x and k occurrences of non-x after the left counter. Thus
The two counters are at index positions i and r+2k+i
The subarray formed by [i, r+2k+i] has exactly k+r+1 occurrences of x
The subarray length is 2k+r+1
The subarray is valid as (2k+r+1) <= 2 * (k+r+1) - 1
Procedure
Let m = 1
Loop over the array from left to right
For each index pi
  If the number is encountered for the first time,
    Create a new counter array [1(pi)]
    Create a new index record storing the current index value (pi) and counter value (1)
  Otherwise, reuse the counter array and index record of the number and perform
    1. Calculate the current counter value ci by cprev+2-(pi - pprev), where cprev, pprev are the counter value and index value in the index record
    2. Perform a binary search to find the longest subarray that can be formed with the current index position and all previous index positions, i.e. find the closest c, cclosest, in the counter array where c<=ci. If not found, jump to step 5
    3. Calculate the number of x in the subarray found in step 2
       r = ci - cclosest
       k = (pi-pclosest-r)/2
       number of x = k+r+1
    4. Update counter m by the number of x if the subarray has number of x > m
    5. Update the counter array by appending the current counter if its value is less than the last recorded counter value
    6. Update the index record with the current index (pi) and counter value (ci)
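The procedure above can be sketched in Python as follows. This is one possible reading of the steps, not the answerer's own code: the names are made up, and the recorded counters are stored negated so that the standard bisect module (which expects ascending order) can search the strictly decreasing counter array.

```python
from bisect import bisect_left


def longest_majority_subarray(nums):
    n = len(nums)
    if n == 0:
        return 0
    neg_counters = {}  # value -> negated recorded counters (ascending)
    positions = {}     # value -> index at which each recorded counter occurred
    last = {}          # value -> (index, counter) of its previous occurrence
    m = 1              # max count of a majority element found so far
    for p, x in enumerate(nums):
        if x not in last:
            c = 1
            neg_counters[x] = [-1]
            positions[x] = [p]
        else:
            p_prev, c_prev = last[x]
            c = c_prev + 2 - (p - p_prev)        # step 1
            negs = neg_counters[x]
            pos = bisect_left(negs, -c)          # step 2: first counter <= c
            if pos < len(negs):
                r = c + negs[pos]                # step 3: r = ci - cclosest
                k = (p - positions[x][pos] - r) // 2
                m = max(m, k + r + 1)            # step 4
            if -c > negs[-1]:                    # step 5: new strict minimum
                negs.append(-c)
                positions[x].append(p)
        last[x] = (p, c)                         # step 6
    return min(2 * m - 1, n)


print(longest_majority_subarray([1, 2, 1, 2, 3, 2]))  # 5
```

On the example from the question this returns 5, matching the subarray [2, 1, 2, 3, 2].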

For completeness, here's an outline of an O(n) theory. Consider the following, where * are characters different from c:
    *  c  *  *  c  *  *  c  c  c
i:  0  1  2  3  4  5  6  7  8  9
A plot for adding 1 for c and subtracting 1 for a character other than c could look like:
sum_sequence
 0     c                       c
-1  *     *     c           c
-2           *     *     c
-3                    *
A plot for the minimum of the above sum sequence, seen for c, could look like:
min_sum
 0     c  *  *
-1  *           c  *  *
-2                       c  c  c
Clearly, for each occurrence of c, we are looking for the leftmost occurrence of c with sum_sequence lower than or equal to the current sum_sequence. A non-negative difference would mean c is a majority, and leftmost guarantees the interval is the longest up to our position. (We can extrapolate a maximal length that is bounded by characters other than c from the inner bounds of c as the former can be flexible without affecting the majority.)
Observe that from one occurrence of c to the next, its sum_sequence can decrease by an arbitrary size. However, it can only ever increase by 1 between two consecutive occurrences of c. Rather than each value of min_sum for c, we can record linear segments, marked by c's occurrences. A visual example:
[start_min
          \
           \
            \
             \
              end_min, start_min
                                \
                                 \
                                  end_min]
We iterate over occurrences of c and maintain a pointer to the optimal segment of min_sum. Clearly we can derive the next sum_sequence value for c from the previous one since it is exactly diminished by the number of characters in between.
An increase in sum_sequence for c corresponds with a shift of 1 back or no change in the pointer to the optimal min_sum segment. If there is no change in the pointer, we hash the current sum_sequence value as a key to the current pointer value. There can be O(num_occurrences_of_c) such hash keys.
With an arbitrary decrease in c's sum_sequence value, either (1) sum_sequence is lower than the lowest min_sum segment recorded so we add a new, lower segment and update the pointer, or (2) we've seen this exact sum_sequence value before (since all increases are by 1 only) and can use our hash to retrieve the optimal min_sum segment in O(1).
As Matt Timmermans pointed out in the question comments, if we were just to continually update the pointer to the optimal min_sum by iterating over the list, we would still only perform O(1) amortized-time iterations per character occurrence. We see that for each increasing segment of sum_sequence, we can update the pointer in O(1). If we used binary search only for the descents, we would add at most (log k) iterations for every k occurrences (this assumes we jump down all the way), which keeps our overall time at O(n).

Algorithm :
Essentially, what Boyer-Moore does is look for a suffix suf of nums where suf[0] is the majority element in that suffix. To do this, we maintain a count, which is incremented whenever we see an instance of our current candidate for majority element and decremented whenever we see anything else. Whenever count equals 0, we effectively forget about everything in nums up to the current index and consider the current number as the candidate for majority element. It is not immediately obvious why we can get away with forgetting prefixes of nums - consider the following examples (pipes are inserted to separate runs of nonzero count).
[7, 7, 5, 7, 5, 1 | 5, 7 | 5, 5, 7, 7 | 7, 7, 7, 7]
Here, the 7 at index 0 is selected to be the first candidate for majority element. count will eventually reach 0 after index 5 is processed, so the 5 at index 6 will be the next candidate. In this case, 7 is the true majority element, so by disregarding this prefix, we are ignoring an equal number of majority and minority elements - therefore, 7 will still be the majority element in the suffix formed by throwing away the first prefix.
[7, 7, 5, 7, 5, 1 | 5, 7 | 5, 5, 7, 7 | 5, 5, 5, 5]
Now, the majority element is 5 (we changed the last run of the array from 7s to 5s), but our first candidate is still 7. In this case, our candidate is not the true majority element, but we still cannot discard more majority elements than minority elements (this would imply that count could reach -1 before we reassign candidate, which is obviously false).
Therefore, given that it is impossible (in both cases) to discard more majority elements than minority elements, we are safe in discarding the prefix and attempting to recursively solve the majority element problem for the suffix. Eventually, a suffix will be found for which count does not hit 0, and the majority element of that suffix will necessarily be the same as the majority element of the overall array.
Here's a Java solution:
Time complexity : O(n)
Space complexity : O(1)
public int majorityElement(int[] nums) {
    int count = 0;
    Integer candidate = null;
    for (int num : nums) {
        if (count == 0) {
            candidate = num;
        }
        count += (num == candidate) ? 1 : -1;
    }
    return candidate;
}
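The voting pass above returns a candidate that is the majority element only if one exists. When that is not guaranteed, a second counting pass is needed to confirm it. Here is a minimal Python sketch of the same idea with that verification added (the function name and the None return for "no majority" are choices made for this example):

```python
def majority_element(nums):
    candidate, count = None, 0
    for num in nums:
        if count == 0:
            candidate = num  # start over with a new candidate
        count += 1 if num == candidate else -1
    # Verification pass: the vote alone does not prove a majority exists.
    occurrences = sum(1 for num in nums if num == candidate)
    if occurrences > len(nums) // 2:
        return candidate
    return None


print(majority_element([7, 7, 5, 7, 5, 1, 5, 7, 5, 5, 7, 7, 7, 7, 7, 7]))  # 7
print(majority_element([7, 7, 5, 7, 5, 1, 5, 7, 5, 5, 7, 7, 5, 5, 5, 5]))  # 5
print(majority_element([1, 2, 3]))  # None
```

Both passes are linear and use constant extra space, so the O(n) time and O(1) space bounds still hold.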

Largest 3 numbers c language [duplicate]

I have an array
A[4]={4,5,9,1}
I need to get the top 3 elements, like 9,5,4
I know how to find the max element, but how do I find the 2nd and 3rd max?
i.e. if
max=A[0];
location=1;
for(i=1;i<4;i++)
{
    if (A[i]>max)
    {
        max=A[i];
        location=i+1;
    }
}
Actually, sorting will not be suitable for my application because
the position is also important to me, i.e. I have to know at which positions the top 3 values occur (here they are at the 0th, 1st and 2nd positions). So I am thinking of the following logic:
after getting the max value, I could put 0 at that location and apply the same steps to the new array, i.e. {4,5,0,1}
But I am a bit confused about how to put my logic into code

Consider using the technique employed in the Python standard library. It uses an underlying heap data structure:
from heapq import heapify, heappushpop
from itertools import islice

def nlargest(n, iterable):
    """Find the n largest elements in a dataset.
    Equivalent to: sorted(iterable, reverse=True)[:n]
    """
    if n < 0:
        return []
    it = iter(iterable)
    result = list(islice(it, n))
    if not result:
        return result
    heapify(result)
    for elem in it:
        heappushpop(result, elem)
    result.sort(reverse=True)
    return result
The steps are:
1. Make an n length fixed array to hold the results.
2. Populate the array with the first n elements of the input.
3. Transform the array into a minheap.
4. Loop over remaining inputs, replacing the top element of the heap if new data element is larger.
5. If needed, sort the final n elements.
The heap approach is memory efficient (not requiring more memory than the target output) and typically has a very low number of comparisons (see this comparative analysis).
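Since the question also needs the positions of the top values, one way to adapt the heap idea is to select indices rather than values. The sketch below uses heapq.nlargest from the standard library with a key function; the wrapper name top_k_with_positions is made up for illustration.

```python
import heapq


def top_k_with_positions(values, k=3):
    # Rank the indices by the value stored at each index.
    indices = heapq.nlargest(k, range(len(values)), key=values.__getitem__)
    return [(values[i], i) for i in indices]


print(top_k_with_positions([4, 5, 9, 1]))
# [(9, 2), (5, 1), (4, 0)]
```

The original array is left untouched, so there is no need to overwrite the maximum with 0 between passes.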

You can use the selection algorithm
Also to mention that the complexity will be O(n) ie, O(n) for selection and O(n) for iterating, so the total is also O(n)

What you're essentially asking is equivalent to sorting your array in descending order. The fastest way to do this is using heapsort or quicksort, depending on the size of your array.
Once your array is sorted, your largest number will be at index 0, your second largest will be at index 1, and in general your nth largest will be at index n-1.

You can follow this procedure:
1. Add the first n elements of A to another array B[n].
2. Sort the array B[n].
3. Then for each element in A[n...m], check whether
A[k]>B[0]
If so, A[k] is among the n largest elements, so
search for the proper position for A[k] in B[n], insert it there, and shift the smaller numbers to the left so that B[n] still contains the n largest elements (the old B[0] is dropped).
4. Repeat this for all remaining elements of A[m].
At the end B[n] will have the n largest elements.

Algorithm Olympiad : conditional minimum in array

I have an array A = [a1, a2, a3, a4, a5...] and I want to find two elements of the array, say A[i] and A[j] such that i is less than j and A[j]-A[i] is minimal and positive.
The runtime has to be O(nlog(n)).
Would this code do the job:
First sort the array and keep track of the original index of each element (i.e. the index of the element in the ORIGINAL (unsorted) array).
Go through the sorted array and calculate the differences between any two successive elements that verify the initial condition that the Original Index of the bigger element is bigger than the original index of the smaller element.
The answer would be the minimum value of all these differences.
Here is how this would work on an example:
A = [0, -5, 10, 1]
In this case the result should be 1 coming from the difference between A[3] and A[0].
sort A : newA=[-5,0,1,10]
since OriginalIndex(-5)>OriginalIndex(0), do not compute the difference
since OriginalIndex(1)>OriginalIndex(0), we compute the difference = 1
since OriginalIndex(10)<OriginalIndex(1), do not compute the difference
The result is the minimal difference, which is 1.

Contrary to the claim made in the other post there wouldn't be any problem regarding the runtime of your algorithm. Using heapsort for example the array could be sorted in O(n log n) as given as an upper bound in your question. An additional O (n) running once along the sorted array couldn't harm this any more, so you would still stay with runtime O (n log n).
Unfortunately your answer still doesn't seem to be correct as it doesn't give the correct result.
Taking a closer look at the example given you should be able to verify that yourself. The array given in your example was: A=[0,-5,10,1]
Counting from 0 choosing indices i=2 and j=3 meets the given requirement i < j as 2 < 3. Calculating the difference A[j] - A[i] which with the chosen values comes down to A[3] - A[2] calculates to 1 - 10 = -9 which is surely less than the minimal value of 1 calculated in the example application of your algorithm.

Since you're minimising the distance between elements, they must be next to each other in the sorted list (if they weren't then the element in between would be a shorter distance to one of them -> contradiction). Your algorithm runs in O(nlogn) as specified so it looks fine to me.
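Here is a minimal Python sketch of the sort-and-compare-neighbours idea, assuming the difference must be strictly positive as the question states. The handling of duplicate values is an addition of this sketch, not something the question or the answers spell out: for each distinct value it keeps the first and the last index, because equal values give a difference of 0 and can hide a valid pair. The function name is made up.

```python
def min_positive_difference(a):
    first = {}  # value -> smallest index where it occurs
    last = {}   # value -> largest index where it occurs
    for idx, value in enumerate(a):
        first.setdefault(value, idx)
        last[value] = idx
    values = sorted(first)  # distinct values, O(n log n)
    best = None
    for lo, hi in zip(values, values[1:]):
        # A pair (i, j) with A[i] = lo, A[j] = hi and i < j exists
        # exactly when the earliest lo comes before the latest hi.
        if first[lo] < last[hi]:
            diff = hi - lo
            if best is None or diff < best:
                best = diff
    return best  # None when no pair with a positive difference exists


print(min_positive_difference([0, -5, 10, 1]))  # 1
```

Only neighbouring distinct values need to be compared: if a valid pair skips over some values, at least one neighbouring pair in between is also valid and has a difference that is no larger.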