Find the maximum sum out of two sorted arrays

There are two sorted arrays A and B of equal length where A is sorted in ascending order and array B is sorted in descending order.
A = {1, 3, 7, 7, 7, 7}
B = {7, 7, 5, 5, 5, 4}
My task is to find two elements, one from A and the other from B, such that their sum is maximum.
There is one constraint: I can choose any element from A, but the element chosen from B must have a greater index than the element chosen from A.
In this case, the maximum sum is 12 (A[2] + B[3] = 7 + 5). I have solved it in O(n) by simply iterating from left to right.
I want to know whether there exists any better and more efficient way to find the sum of those two elements.

We know that the best sum is the largest value in the sequence C obtained by adding elements of A and B pairwise (under the index constraint).
A and B are monotonic, but C can be arbitrary, so there is no shortcut: you need to compute the whole of C, and O(N) is optimal.
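For instance (a minimal sketch; under the index constraint, each A[i] pairs with B[i+1], the largest element of B with a greater index, since B is descending):

A = [1, 3, 7, 7, 7, 7]
B = [7, 7, 5, 5, 5, 4]
C = [x + y for x, y in zip(A, B[1:])]   # C[i] = A[i] + B[i+1]
print(max(C))                           # 12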

As Yves Daoust points out, O(n) is optimal in principle, but you can use some simple tricks to save time in practice.
You can make use of the maximum value of A:
const int maxA = A[sizeA-1];
In your loop, you can check the following two conditions to safely abort the search early (i is your loop variable).
// A is already at its maximum and B only gets smaller, so nothing
// further along can beat the candidates we have already seen.
if ( A[i] >= maxA ) break;
// Even maxA paired with the biggest B we are still about to see is
// not big enough to beat the current maximum, so stop.
if ( maxA + B[i] <= currentMax ) break;
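Putting both tricks together, a sketch in Python (the function name and structure are mine, not from the original answer; assumes at least two elements):

def max_constrained_sum(A, B):
    max_a = A[-1]                             # A is ascending: its max is last
    best = A[0] + B[1]
    for i in range(len(A) - 1):
        best = max(best, A[i] + B[i + 1])     # B[i+1] is the best B with index > i
        if A[i] >= max_a:                     # A is at its max, B only shrinks
            break
        if max_a + B[i + 1] <= best:          # even the best remaining pair loses
            break
    return best

print(max_constrained_sum([1, 3, 7, 7, 7, 7], [7, 7, 5, 5, 5, 4]))  # 12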

Related

Increment two array elements at a time so all equal the max value

Given any array of natural numbers, e.g. [2, 1, 2, 3]:
Find out whether the array can be converted into a Max array (print "YES") or not (print "NO").
To make it a Max array, convert every element of the array so it equals the maximum element; in the example above that would be [3, 3, 3, 3]. But you must follow these rules:
Increment exactly two elements by 1 at a time (you cannot increment only one, or more than two, elements at a time).
Do this as many times as needed until every element equals the maximum (print "YES" if possible, else "NO").
Sample input:
[2, 1, 2, 3]
Expected Output:
"YES"
Explanation:
Step 1: increment first and second element by 1 -
[3, 2, 2, 3]
Step 2: increment second and third element by 1 -
[3, 3, 3, 3]
Can anyone point me to a solution - any link, similar question, or pattern? Thank you
Edit:
I tried this approach to solve it:
Find the max value and remove it.
Find a duplicate pair for each number; after that, the remaining single numbers should contain an equal count of even and odd numbers.
But I can't quite get the correct result.
This is actually a known interview/programming contest question, but it's usually presented as "Given an array of positive integers, can you reduce them all to zero, two (or k) at a time?"
There is a simple solution: we only need to check whether we can reach the desired sum in steps of two (i.e. check parity), and whether the smallest number can reach the maximum by the time all other numbers have reached the maximum.
from typing import List

def is_possible(nums: List[int]) -> bool:
    smallest, largest = min(nums), max(nums)
    total_needed = sum(largest - x for x in nums)
    if total_needed % 2 == 1:
        return False
    return 2 * (largest - smallest) <= total_needed
this gives:
assert is_possible([6, 6, 10]) == True
assert is_possible([2, 1, 2, 3]) == True
assert is_possible([1, 5, 5, 9]) == True
assert is_possible([1, 2, 9]) == False
assert is_possible([1, 4, 9, 10]) == False
assert is_possible([1, 6, 6, 9]) == False
A more specific problem statement
One unfortunate feature of this problem is that despite the intuitively simple solution, a full proof of this solution is rather long.
The original statement of the problem has caused confusion over the meaning of the phrase 'max array', so I'll try to give a precise mathematical description of the problem, and then transform that. This will then explain why the code is implementing the natural 'greedy strategy' for the problem, and why that works.
Original Problem: Given a zero-indexed array of positive integers A of length n > 1, you are allowed to perform the following operation any number of times: Choose two distinct indices i, j with 0 <= i < j < n, such that A[i] < max(A) and A[j] < max(A), and increment A[i] and A[j]. Determine whether you can make all of the array elements equal.
The greedy strategy
The 'greedy' or brute-force solution to this problem, if performance weren't a concern, would be to select the two smallest elements of A and increment them, repeating until either all, or all but one, of the elements of A are equal to max(A). If exactly one element isn't equal to max(A), we failed and the task is impossible (this statement requires a proof); otherwise it is clearly possible.
import bisect
from typing import List

def is_possible_brute_force(nums: List[int]) -> bool:
    largest = max(nums)
    nums.sort()
    while nums[0] != largest:
        first = nums.pop(0)
        second = nums.pop(0)
        if second == largest and first != largest:  # exactly one number not max
            return False
        bisect.insort(nums, first + 1)
        bisect.insort(nums, second + 1)
    return all(x == largest for x in nums)  # always True at this point
Our goal is to simulate the result of this procedure, without actually doing it. We can observe immediately that the task is impossible if the sum of gaps between elements of A and max(A), which we might call total_needed, is odd. It's also true that we can apply the following transformation to the problem without changing the answer:
New Problem: Let M = max(A). Let B be A after the transform A[i] -> M - A[i]. Our allowed operation is now to decrement two distinct indices of B, and our goal is to reach the zero array.
It's easier to think in terms of B and decrements. The first strategy you might think of is: repeatedly decrement the two largest elements of B, i.e. the greedy strategy. This strategy turns out to be optimal, finding a solution whenever it exists.
Let Max_B = max(B) and let Sum_B = sum(B). Since we know that no solution exists if Sum_B is odd, we can assume Sum_B is even from here on. There are two possibilities:
(1) Max_B > Sum_B - Max_B. In this case, no matter what we do, the elements other than the maximum all reach zero after at most Sum_B - Max_B decrements while the maximum is still positive, so no solution is possible.
(2) Max_B <= Sum_B - Max_B. In this case, a solution is always possible.
To prove (2), it suffices to prove two things:
i. If Max_B <= Sum_B - Max_B, then after decrementing the two largest elements, we still have Max_B <= Sum_B - Max_B for the new array.
ii. The only configuration in which no moves are possible yet B is nonzero is when exactly one element of B is nonzero; in this case, Max_B > Sum_B - Max_B.
The proof of the first statement is algebraic manipulation and case analysis that is fairly unsurprising, so I'll omit that from this already lengthy proof. The first Python code snippet can now be understood as checking the parity of total_needed, and whether we are in situation (1) or (2) above.
Edit: The original posted version of the code had a mistake in the final line, using an incorrect variable name and a flipped inequality sign, compared to the equation in the explanation and proof. Credit and thanks goes to user Breaking Not So Bad for catching this.
Here is a simple algorithm that should work. The idea is to increment the lowest values first:
1. Find the maximum value. Let's call it max.
2. Find the minimum value. Let's call it min. If min = max, output YES.
3. Find an element with value min and increment it.
4. Find the minimum value of the other elements. Let's call it min. If min = max, output NO.
5. Find an element (other than the previous one) with value min and increment it.
6. Go to step 2.
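A direct translation of those steps into Python might look like this (a sketch; the function name is mine, and it assumes at least two elements). Note this is a pseudo-polynomial simulation, not the O(n) parity check above:

def is_possible_simulation(nums):
    nums = list(nums)
    target = max(nums)                                      # step 1
    while True:
        lo = min(range(len(nums)), key=lambda i: nums[i])   # step 2: find min
        if nums[lo] == target:
            return True                                     # min = max: YES
        nums[lo] += 1                                       # step 3
        rest = [i for i in range(len(nums)) if i != lo]
        lo2 = min(rest, key=lambda i: nums[i])              # step 4: min of others
        if nums[lo2] == target:
            return False                                    # min = max: NO
        nums[lo2] += 1                                      # step 5
                                                            # loop back = step 6

print(is_possible_simulation([2, 1, 2, 3]))  # True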

Find the longest subarray that contains a majority element

I am trying to solve this algorithmic problem:
https://dunjudge.me/analysis/problems/469/
For convenience, I have summarized the problem statement below.
Given an array of length (<= 2,000,000) containing integers in the range [0, 1,000,000], find the
longest subarray that contains a majority element.
A majority element is defined as an element that occurs > floor(n/2) times in a list of length n.
Time limit: 1.5s
For example:
If the given array is [1, 2, 1, 2, 3, 2],
The answer is 5 because the subarray [2, 1, 2, 3, 2] of length 5 from position 1 to 5 (0-indexed) has the number 2 which appears 3 > floor(5/2) times. Note that we cannot take the entire array because 3 = floor(6/2).
My attempt:
The first thing that comes to mind is an obvious brute force (but correct) solution which fixes the start and end indexes of a subarray and loop through it to check if it contains a majority element. Then we take the length of the longest subarray that contains a majority element. This works in O(n^2) with a small optimization. Clearly, this will not pass the time limit.
I was also thinking of dividing the elements into buckets that contain their indexes in sorted order.
Using the example above, these buckets would be:
1: 0, 2
2: 1, 3, 5
3: 4
Then for each bucket, I would make an attempt to merge the indexes together to find the longest subarray that contains k as the majority element where k is the integer label of that bucket.
We could then take the maximum length over all values of k. I didn't try out this solution as I didn't know how to perform the merging step.
Could someone please advise me on a better approach to solve this problem?
Edit:
I solved this problem thanks to the answers of PhamTrung and hk6279. Although I accepted the answer from PhamTrung because he first suggested the idea, I highly recommend looking at the answer by hk6279 because his answer elaborates the idea of PhamTrung and is much more detailed (and also comes with a nice formal proof!).
Note: attempt 1 is wrong, as @hk6279 has given a counterexample. Thanks for pointing it out.
Attempt 1:
The answer is quite complex, so I will only sketch the idea.
Let's process each unique number one by one.
Processing each occurrence of the number x from left to right: at index i, add a segment (i, i) indicating the start and end of the current subarray. After that, look at the left neighbour of this segment and try to merge it into (i, i) (so if the left neighbour is (st, ed), we try to turn the segment into (st, i)) whenever the majority condition still holds, and continue merging until no merge is possible or there is no left neighbour.
We keep all those segments in a stack for faster look-up/add/remove.
Finally, we try to enlarge each segment as much as possible and keep the biggest result.
Time complexity should be O(n), as each element can only be merged once.
Attempt 2:
Let's process each unique number one by one.
For each unique number x, we maintain an array of counters. Scanning from index 0 to the end of the array, we increase the count when we encounter x and decrease it otherwise. So for the array
[0,1,2,0,0,3,4,5,0,0] and the number 0, we have this counter array:
[1,0,-1,0,1,0,-1,-2,-1,0]
So, for a subarray ending at a specific index i to be valid, the value of counter[i] - counter[start - 1] must be greater than 0. (This is easy to see if you view the array as made of +1 and -1 entries, with +1 for an occurrence of x and -1 otherwise; the problem then becomes finding a subarray with positive sum.)
So, with the help of a binary search, the above algorithm still has a complexity of O(n^2 log n): if we have n/2 unique numbers, we need to run the above process n/2 times, each time taking O(n log n).
To improve it, we observe that we don't actually need to store the counter values at every position, only at the positions where x occurs. For the counter array above, we can store:
[1,#,#,0,1,#,#,#,-1,0]
This leads to an O(n log n) solution which only goes through each element once.
This elaborates on and explains how attempt 2 in @PhamTrung's solution works.
To get the length of the longest subarray, we should:
1. Find the maximum number of occurrences of the majority element over all valid subarrays; denote it m. This is done by attempt 2 in @PhamTrung's solution.
2. Return min(2*m - 1, length of the given array).
Concept
The approach stems from a method for solving the longest positive-sum subarray problem.
We maintain an array of counters for each unique number x. We do +1 when we encounter x; otherwise, we do -1.
Take the array [0,1,2,0,0,3,4,5,0,0,1,0] and the unique number 0: we get the counter array [1,0,-1,0,1,0,-1,-2,-1,0,-1,0]. If we blank out the positions that do not hold the target number, we get [1,#,#,0,1,#,#,#,-1,0,#,0].
We can read a valid subarray off the blanked counter array whenever there exist two counters such that the value of the right counter is greater than or equal to that of the left one. See the Proof part.
To further improve this, we can ignore all # entries, as they are useless, and get [1(0),0(3),1(4),-1(8),0(9),0(11)] in count(index) format.
We can improve this further by not recording a counter that is greater than or equal to its previous recorded counter. Take the counters at indices 8 and 9 as an example: if you can form a subarray starting at index 9, then you must also be able to form one starting at index 8. So we only need [1(0),0(3),-1(8)] for the computation.
You can find the longest valid subarray ending at the current index by binary searching this counter array for the closest value that is less than or equal to the current counter value (if one exists).
Proof
Suppose, for a particular x, the right counter is greater than the left counter by r, where k, r >= 0 and there are k+r occurrences of x and k non-x elements after the left counter. Then:
The two counters are at index positions i and r+2k+i
The subarray formed over [i, r+2k+i] has exactly k+r+1 occurrences of x
The subarray length is 2k+r+1
The subarray is valid since 2k+r+1 <= 2*(k+r+1) - 1
Procedure
Let m = 1. Loop over the array from left to right; for each index p_i:
If the number is encountered for the first time:
Create a new counter array [1(p_i)]
Create a new index record storing the current index value (p_i) and counter value (1)
Otherwise, reuse the counter array and index record of the number and perform:
1. Calculate the current counter value c_i as c_prev + 2 - (p_i - p_prev), where c_prev and p_prev are the counter value and index value in the index record
2. Binary search for the longest subarray that can be formed between the current index position and a previous index position, i.e. find the closest counter c_closest in the counter array with c_closest <= c_i. If none is found, jump to step 5
3. Calculate the number of occurrences of x in the subarray found in step 2:
r = c_i - c_closest
k = (p_i - p_closest - r) / 2
number of x = k + r + 1
4. Update m if this subarray has number of x > m
5. Append the current counter to the counter array if its value is less than the last recorded counter value
6. Update the index record with the current index (p_i) and counter value (c_i)
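Here is a sketch of the whole procedure in Python (my own translation of the steps above; names such as mins, negs and prev are assumptions, not from the answer):

import bisect

def longest_majority_subarray(a):
    n = len(a)                 # assumes a is non-empty
    prev = {}                  # x -> (p_prev, c_prev)
    mins = {}                  # x -> [(c, p)] with strictly decreasing counters
    negs = {}                  # x -> [-c, ...], ascending, parallel to mins
    m = 1                      # best count of x in any valid subarray
    for p, x in enumerate(a):
        if x not in prev:
            c = 1
            mins[x] = [(c, p)]
            negs[x] = [-c]
        else:
            p0, c0 = prev[x]
            c = c0 + 2 - (p - p0)                 # step 1
            j = bisect.bisect_left(negs[x], -c)   # step 2: leftmost counter <= c
            if j < len(mins[x]):
                c_cl, p_cl = mins[x][j]
                r = c - c_cl                      # step 3
                k = (p - p_cl - r) // 2
                m = max(m, k + r + 1)             # step 4
            if c < mins[x][-1][0]:                # step 5
                mins[x].append((c, p))
                negs[x].append(-c)
        prev[x] = (p, c)                          # step 6
    return min(2 * m - 1, n)

print(longest_majority_subarray([1, 2, 1, 2, 3, 2]))  # 5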
For completeness, here's an outline of an O(n) theory. Consider the following, where * are characters different from c:
    * c * * c * * c c c
i:  0 1 2 3 4 5 6 7 8 9
A plot for adding 1 for c and subtracting 1 for a character other than c could look like:
sum_sequence
 0    c               c
-1  *   *   c       c
-2        *   *   c
-3              *
A plot for the minimum of the above sum sequence, seen for c, could look like:
min_sum
 0    c * *
-1  *       c * *
-2              c c c
Clearly, for each occurrence of c, we are looking for the leftmost occurrence of c with sum_sequence lower than or equal to the current sum_sequence. A non-negative difference would mean c is a majority, and leftmost guarantees the interval is the longest up to our position. (We can extrapolate a maximal length that is bounded by characters other than c from the inner bounds of c as the former can be flexible without affecting the majority.)
Observe that from one occurrence of c to the next, its sum_sequence can decrease by an arbitrary amount. However, it can only ever increase by 1 between two consecutive occurrences of c. Rather than each value of min_sum for c, we can record linear segments, marked by c's occurrences. A visual example:
[start_min
  \
   \
    \
     \
      end_min, start_min
       \
        \
         end_min]
We iterate over occurrences of c and maintain a pointer to the optimal segment of min_sum. Clearly we can derive the next sum_sequence value for c from the previous one since it is exactly diminished by the number of characters in between.
An increase in sum_sequence for c corresponds with a shift of 1 back or no change in the pointer to the optimal min_sum segment. If there is no change in the pointer, we hash the current sum_sequence value as a key to the current pointer value. There can be O(num_occurrences_of_c) such hash keys.
With an arbitrary decrease in c's sum_sequence value, either (1) sum_sequence is lower than the lowest min_sum segment recorded so we add a new, lower segment and update the pointer, or (2) we've seen this exact sum_sequence value before (since all increases are by 1 only) and can use our hash to retrieve the optimal min_sum segment in O(1).
As Matt Timmermans pointed out in the question comments, if we were just to continually update the pointer to the optimal min_sum by iterating over the list, we would still only perform O(1) amortized-time iterations per character occurrence. We see that for each increasing segment of sum_sequence, we can update the pointer in O(1). If we used binary search only for the descents, we would add at most (log k) iterations for every k occurrences (this assumes we jump down all the way), which keeps our overall time at O(n).
Algorithm:
Essentially, what Boyer-Moore does is look for a suffix suf of nums in which suf[0] is the majority element of that suffix. To do this, we maintain a count, which is incremented whenever we see an instance of our current candidate for majority element and decremented whenever we see anything else. Whenever count equals 0, we effectively forget about everything in nums up to the current index and consider the current number as the candidate for majority element. It is not immediately obvious why we can get away with forgetting prefixes of nums - consider the following examples (pipes are inserted to separate runs of nonzero count).
[7, 7, 5, 7, 5, 1 | 5, 7 | 5, 5, 7, 7 | 7, 7, 7, 7]
Here, the 7 at index 0 is selected to be the first candidate for majority element. count will eventually reach 0 after index 5 is processed, so the 5 at index 6 will be the next candidate. In this case, 7 is the true majority element, so by disregarding this prefix, we are ignoring an equal number of majority and minority elements - therefore, 7 will still be the majority element in the suffix formed by throwing away the first prefix.
[7, 7, 5, 7, 5, 1 | 5, 7 | 5, 5, 7, 7 | 5, 5, 5, 5]
Now, the majority element is 5 (we changed the last run of the array from 7s to 5s), but our first candidate is still 7. In this case, our candidate is not the true majority element, but we still cannot discard more majority elements than minority elements (this would imply that count could reach -1 before we reassign candidate, which is obviously false).
Therefore, given that it is impossible (in both cases) to discard more majority elements than minority elements, we are safe in discarding the prefix and attempting to recursively solve the majority element problem for the suffix. Eventually, a suffix will be found for which count does not hit 0, and the majority element of that suffix will necessarily be the same as the majority element of the overall array.
Here's a Java solution:
Time complexity: O(n)
Space complexity: O(1)
public int majorityElement(int[] nums) {
    int count = 0;
    Integer candidate = null;
    for (int num : nums) {
        if (count == 0) {
            candidate = num;
        }
        count += (num == candidate) ? 1 : -1;
    }
    return candidate;
}

Accurate Big-O analysis

Suppose you need to generate a random permutation of the first N integers. For example, {4, 3,
1, 5, 2} and {3, 1, 4, 2, 5} are legal permutations, but {5, 4, 1, 2, 1} is not, because one number
(1) is duplicated and another (3) is missing. This routine is often used in simulation of
algorithms. We assume the existence of a random number generator, RandInt(i,j), that
generates an integer between i and j with equal probability. Here are three algorithms:
(i) Fill the array A from A[0] to A[N-1] as follows: To fill A[i], generate random
numbers until you get one that is not already in A[0], A[1],…, A[i-1].
(ii) Same as algorithm (i), but keep an extra array called the Used array. When a random
number, Ran, is first put in the array A, set Used[Ran] = true. This means that
when filling A[i] with a random number, you can test in one step to see whether the
random number has been used, instead of the (possibly) i steps in the first algorithm.
(iii) Fill the array such that A[i] = i+1. Then
for (i = 1; i < n; i++)
    swap(A[i], A[RandInt(0, i)]);
Give as accurate (Big-O) an analysis as you can of the expected running time of each algorithm.
Can anyone help with this? I've just learned this chapter and don't quite understand what the question is asking.
I:
You have to fill N slots, but as you keep filling them, you are less likely to draw a valid number.
If you have filled M slots, the chance of drawing a valid number is 1 - M/N, so you expect about N/(N-M) draws for that slot; summing over all slots gives N*(1/N + 1/(N-1) + ... + 1/1) = O(N log N) expected draws in total. In addition, each draw requires scanning the filled part of the array, an O(N) check to see whether the number is already contained, so algorithm (i) takes O(N^2 log N) expected time.
(The last slot alone has only a 1/N chance per draw of finding the one unused number, hence O(N) expected draws for it.)
II: Now we don't have to iterate over the array for each check (the Used array makes it O(1)), so only the expected draws remain: O(N log N).
III: Swapping takes constant time, and we iterate over the entire array once, so O(N).
I think I did all those correctly, but I easily could have missed something.
Hope this helps.
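For concreteness, the three algorithms might look like this in Python (a sketch, using the values 1..n as in the question; function names are mine):

import random

def perm_i(n):
    # (i): retry until the draw is unused; the membership test scans the list
    a = []
    while len(a) < n:
        r = random.randint(1, n)
        if r not in a:               # O(len(a)) check
            a.append(r)
    return a

def perm_ii(n):
    # (ii): the Used array makes each check O(1)
    a, used = [], [False] * (n + 1)
    while len(a) < n:
        r = random.randint(1, n)
        if not used[r]:
            used[r] = True
            a.append(r)
    return a

def perm_iii(n):
    # (iii): Fisher-Yates shuffle, O(n)
    a = list(range(1, n + 1))
    for i in range(1, n):
        j = random.randint(0, i)
        a[i], a[j] = a[j], a[i]
    return a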
Option 4: fill a List / Collection with the values from 1 to n, then shuffle the Collection. O(n) + ~O(n) => O(n)

How to find duplicates in an array using the XOR method? Algorithm complexity O(n)

How can I find duplicates in an array? The inverse problem, finding the unique element when every other element appears twice, is clear: you just XOR all the elements, and the result is the unique element. For example:
int a[] = {2, 2, 3, 3, 4, 5, 5, 16, 16};
int res = 0;
for (int i = 0; i < 9; ++i)
    res ^= a[i];   // res ends up as 4, the only element that appears once
Now suppose we are given, for example, the array
int a[] = {2, 4, 7, 8, 4, 5};
Here the duplicate is 4, but it is not clear how to find such a duplicate element of the array.
You are describing the Element Distinctness Problem.
Without extra space (and without hashing) there is no O(n) solution to the element distinctness problem, so you cannot adapt the "XOR algorithm for the unique element" to this problem.
The solutions for this problem are:
Sort, then iterate over the sorted array to find dupes (easy in a sorted array). This is O(n log n) time.
Build a histogram of the data (hash-based) and, when done, iterate over the histogram to verify whether all elements have a count of 1 - O(n) average case, O(n) space.
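Both solutions are a few lines in Python (a sketch; function names are mine):

from collections import Counter

def dups_by_sorting(a):
    # O(n log n): equal neighbours in sorted order are duplicates
    s = sorted(a)
    return {x for x, y in zip(s, s[1:]) if x == y}

def dups_by_histogram(a):
    # O(n) average, O(n) space: count occurrences, keep counts > 1
    return {x for x, c in Counter(a).items() if c > 1}

print(dups_by_sorting([2, 4, 7, 8, 4, 5]))    # {4}
print(dups_by_histogram([2, 4, 7, 8, 4, 5]))  # {4}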
We can find the duplicates in an array in O(n) time with the following algorithm, provided every element lies in the range [1, n-1], so that each value can be used as an index and its sign as a "seen" marker:
for (int i = 0; i < n; i++) {
    int v = abs(A[i]);       // the value at i, ignoring any marking
    if (A[v] > 0)
        A[v] = -A[v];        // first occurrence of v: mark slot v negative
    else
        printf("%d is a repetition\n", v);   // slot v already marked
}
Hope it helps!
One solution is to build a hashset. It goes as follows:
1. Initialize an empty hashset.
2. For each element in the array,
   a. check whether it is present in the hashset.
      If yes, you have found a duplicate.
      If not, add it to the hashset.
This way you can find all the duplicates in the array.
Space complexity: O(n); time complexity: O(n)
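In Python, those steps might look like this (a sketch; the function name is mine):

def find_duplicates(a):
    seen, dups = set(), []
    for x in a:
        if x in seen:      # step 2a: already present, so x is a duplicate
            dups.append(x)
        else:
            seen.add(x)
    return dups

print(find_duplicates([2, 4, 7, 8, 4, 5]))  # [4]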

How to determine to which extent/level an array of integers is already sorted

Consider an array of unique integers, e.g. [1,3,2,4,6,5]. How would one determine its level of "sortedness", ranging from 0.0 to 1.0?
One way would be to evaluate the number of items that would have to be moved to make it sorted and then divide that by the total number of items.
As a first approach, I would detect the former as just the number of times a transition occurs from higher to lower value. In your list, that would be:
3 -> 2
6 -> 5
for a total of two movements. Dividing that by six elements gives you 33%.
In a way, this makes sense since you can simply move the 2 to between 1 and 3, and the 5 to between 4 and 6.
Now there may be edge cases where it's more efficient to move things differently but then you're likely going to have to write really complicated search algorithms to find the best solution.
Personally, I'd start with the simplest option that gave you what you wanted and only bother expanding if it turns out to be inadequate.
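A sketch of this simplest option (the function name is mine):

def sortedness_by_descents(a):
    # count transitions from a higher to a lower value
    moves = sum(1 for x, y in zip(a, a[1:]) if x > y)
    return moves / len(a)

print(sortedness_by_descents([1, 3, 2, 4, 6, 5]))  # 0.333..., i.e. 33%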
I would say the number of swaps is not a very good way to determine this, most importantly because you can sort the array using different numbers of swaps. In your case, you could swap 2<-->3 and 6<-->5, but you could also do many more swaps.
How would you sort, say:
1 4 3 2 5
Would you directly swap 2 and 4, or would you swap 3 and 4, then 4 and 2, and then 3 and 2?
I would say a more correct method would be the number of elements in the right place divided by the total number of elements.
In your case, that would be 2/6.
OK, this is just an idea, but what if you actually sort the array, i.e.
1,2,3,4,5,6
then get it as a string
123456
now get your original array as a string
132465
and compare the Levenshtein distance between the two
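A sketch of this idea in Python; to avoid the ambiguity of concatenating multi-digit numbers into one string, this computes the edit distance over the sequences themselves:

def levenshtein(a, b):
    # classic O(len(a) * len(b)) dynamic program
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution
        prev = cur
    return prev[-1]

print(levenshtein([1, 3, 2, 4, 6, 5], sorted([1, 3, 2, 4, 6, 5])))  # 4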
I'll propose a different approach: let's count the number of non-descending runs k in the array, then take its reciprocal: 1/k. For a perfectly sorted array there is only one such run, so 1/k = 1/1 = 1. The measure is lowest when the array is sorted in descending order.
The 0 level is approached only asymptotically, as the size of the array approaches infinity.
This simple approach can be computed in O(n) time.
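A sketch of this measure (the function name is mine):

def sortedness_by_runs(a):
    # k = number of non-descending runs; each descent starts a new run
    k = 1 + sum(1 for x, y in zip(a, a[1:]) if x > y)
    return 1 / k

print(sortedness_by_runs([1, 2, 3, 4, 5, 6]))  # 1.0
print(sortedness_by_runs([6, 5, 4, 3, 2, 1]))  # 0.1666..., approaches 0 as n grows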
In practice, one would measure unsortedness by the amount of work needed to get the array sorted. That depends on what you consider "work". If only swaps are allowed, you could count the number of swaps needed, which has a nice upper bound of (n-1). For a mergesort kind of view you are mostly interested in the number of runs, since you'll need about log(nrun) merge steps. Statistically, you would probably take sum(abs(rank - intended_rank)) as a measure, similar to a K-S test. But at eyesight, sequences like "HABCDEFG" (7 swaps, 2 runs, sub-mean distance) and "HGFEDCBA" (4 swaps, 8 runs, maximal distance) are always showstoppers.
You could sum up the distance of each item to its sorted position and divide by the maximum possible such sum.
public static <T extends Comparable<T>> double sortedMeasure(final T[] items) {
    int n = items.length;

    // Find the sorted positions
    Integer[] sorted = new Integer[n];
    for (int i = 0; i < n; i++) {
        sorted[i] = i;
    }
    Arrays.sort(sorted, new Comparator<Integer>() {
        public int compare(Integer i1, Integer i2) {
            T o1 = items[i1];
            T o2 = items[i2];
            return o1.compareTo(o2);
        }
        public boolean equals(Object other) {
            return this == other;
        }
    });

    // Sum up the distances
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += Math.abs(sorted[i] - i);
    }

    // Calculate the maximum
    int maximum = n*n/2;

    // Return the ratio
    return (double) sum / maximum;
}
Example:
sortedMeasure(new Integer[] {1, 2, 3, 4, 5}) // -> 0.000
sortedMeasure(new Integer[] {1, 5, 2, 4, 3}) // -> 0.500
sortedMeasure(new Integer[] {5, 1, 4, 2, 3}) // -> 0.833
sortedMeasure(new Integer[] {5, 4, 3, 2, 1}) // -> 1.000
One relevant measure of sortedness would be the number of swaps needed to sort the array. In your case that would be 2: swapping the (3,2) and the (6,5). Then remains how to map this to [0,1]. You could calculate the maximum number of swaps needed for an array of that length, a sort of "maximum unsortedness", which should yield a sortedness value of 0. Then take the number of swaps for the actual array, subtract it from the max, and divide by the max.
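The minimum number of swaps can be computed from the cycle structure of the permutation that sorts the array: it equals n minus the number of cycles. A sketch (function names are mine; assumes unique elements, as in the question; for n unique elements the maximum is n-1 swaps, reached by a single n-cycle):

def min_swaps_to_sort(a):
    pos = sorted(range(len(a)), key=lambda i: a[i])  # index holding each rank
    seen, cycles = [False] * len(a), 0
    for i in range(len(a)):
        if not seen[i]:
            cycles += 1
            j = i
            while not seen[j]:
                seen[j] = True
                j = pos[j]
    return len(a) - cycles

def sortedness_by_swaps(a):
    max_swaps = len(a) - 1
    return (max_swaps - min_swaps_to_sort(a)) / max_swaps

print(min_swaps_to_sort([1, 3, 2, 4, 6, 5]))    # 2
print(sortedness_by_swaps([1, 3, 2, 4, 6, 5]))  # 0.6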
