I'm struggling to find an algorithm for this problem that runs in O(n) worst case time.
Let's say you have an array, of n elements, with the value of the elements being either 1 or 0. I need to calculate the number of sub-arrays where there exists more 1's inside the sub-array than there are 0's outside the sub-array.
What I tried to do:
I first built a prefix-sum array of the input array, so that, given a sub-array's start and end indexes, we can look up the corresponding values in the prefix array and find the number of 1s in the sub-array in O(1) time (after O(n) preprocessing).
for example:
input array = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
prefix sum array = [0, 1, 1, 1, 2, 3, 3, 4, 4, 4, 5, 6, 6, 7]
example sub array = [1,0,0,1]
start index = 0, end index = 3
prefix[3+1] - prefix[0] = 2 - 0 = 2 (two 1s inside the sub-array)
I am not sure where to go from here. Help would be much appreciated.
The number of sub-arrays with more ones inside than zeroes outside is the i-th triangular number, where i is the number of ones in the entire input array. Here is a proof.
Let n be the number of elements in the input array, and i the number of ones in the input array.
For any sub-array A, let s(A) be its size (number of elements inside), and let v(A) be its value, that is the number of ones inside minus the number of zeroes outside. The sub-array counts if v(A) > 0.
The only sub-array of size n has value i (if s(A) = n, then v(A) = i), so if i > 0, that sub-array counts.
For any sub-arrays A and B and two constants t and w,
if s(A) = t => v(A) = w then s(B) = t-1 => v(B) = w-1.
In other words, if all sub-arrays of size t have value w, then all sub-arrays of size t-1 have value w-1. This is because any sub-array of size t-1 can be obtained by excluding an element from a sub-array of size t, and whatever that element is, its exclusion decreases the value of the sub-array by exactly 1: it's either 1 more zero outside, or 1 less one inside.
By induction, for any sub-array A, if s(A) = n-t then v(A) = i-t. Additionally, the converse is true: if v(A) = i-t then s(A) = n-t, because if s(A) != s(B) then v(A) != v(B).
For any sub-array A, A counts only if v(A) = i-t where t < i, so only if s(A) = n-t where t < i. There are n-(t-1) sub-arrays of size t (1 of size n, 2 of size n-1, ..., n of size 1), so there are n-(n-t-1) = t+1 sub-arrays of size n-t. And the sum of t+1 from t = 0 to t = i-1 equals i*(i+1)/2, also known as the i-th triangular number.
In conclusion, count the number of ones in the input array and compute the corresponding triangular number. In pseudo-code:
input_array = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
ones_count = 0
for i in input_array
    if i == 1 then ones_count++
answer = ones_count * (ones_count+1) / 2
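As a quick sanity check of the closed form, the sketch below compares it against a quadratic brute-force count. The function names are hypothetical, and the brute force assumes that only non-empty contiguous sub-arrays are counted.

```python
def count_brute_force(arr):
    """Count sub-arrays with more 1s inside than 0s outside, in O(n^2)."""
    n = len(arr)
    total_zeros = arr.count(0)
    count = 0
    for start in range(n):
        ones_inside = 0
        zeros_inside = 0
        for end in range(start, n):
            # Extend the sub-array arr[start..end] by one element.
            if arr[end] == 1:
                ones_inside += 1
            else:
                zeros_inside += 1
            zeros_outside = total_zeros - zeros_inside
            if ones_inside > zeros_outside:
                count += 1
    return count


def count_closed_form(arr):
    """Triangular number of the count of 1s, in O(n)."""
    ones = sum(arr)
    return ones * (ones + 1) // 2


example = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
print(count_brute_force(example), count_closed_form(example))  # 28 28
```

Both functions agree on the example array from the question, which contains seven 1s.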
It is a very strange constraint, but after making the following observation it becomes easy:
Enlarging the sub-array always improves its value. We define the value v = i - o, with i being the number of ones inside the sub-array and o the number of zeros outside the sub-array. Now another way to formulate the problem is: count all sub-arrays with a value greater than zero. Here is why the value always improves when enlarging the array:
adding a 0 to the sub-array: v = i - (o - 1) = i - o + 1 this is one greater compared to i - o.
adding a 1 to the sub-array: v = (i + 1) - o = i - o + 1 this is one greater compared to i - o.
So we can see that enlarging the sub-array not only increases its value: enlarging it by k elements increases the value by exactly k. Likewise, shrinking the sub-array by k elements decreases its value by k. This leads to the conclusion that we can calculate a minimum size for the sub-array and then use that number and the array size to get the number of sub-arrays fulfilling the condition.
So here is an O(n) algorithm:
Count the number of 0s in the array => z
if z = n return 0
The size making the value 0 is: m = z (by counting the zeros we look at it like having a sub-array of size 0, so the number of ones inside is 0 and the number of zeros outside is z, this gives a value v = 0 - z = -z, so if we increase the size by z we get a value of 0)
So finally we can start at index 0 with a sub-array; here we can count all sub-arrays of sizes m < size <= n, so the number of sub-arrays for index 0 is n - m, at index 1 it is n - m - 1 and at index i it is n - m - i (as long as the term is > 0). Because the term decreases by one per index until it reaches zero, summing it gives the following formula: count = (n - m) * (n - m + 1) / 2
And condensed: count the number of 0s z and return (n - z) * (n - z + 1) / 2
Which is the same as counting the number of 1s i and return i * (i + 1) / 2 (because n - z = i)
Before I start: I hope this question is not a duplicate. I've found a couple of similar ones, but none of them seems to describe exactly the same problem. But if it is a duplicate, I will be glad to see a solution (even if it is different from my algorithm.)
I have been trying to answer this question. After a couple of attempts I managed to implement an algorithm that seems to be correct (in C). I've prepared a couple of tests and they all pass.
Now, initially I thought that the task would be easier. Therefore, I would be certain of my solution, and would publish it right after I would see it works. But I'd rather not publish an answer that presents a solution that only seems to be correct. So, I wrote a "proof of correctness", or at least something that looks like that. (I don't remember if I have ever written any proof of correctness for a program, so I'm rather certain its quality can be improved.)
So, I have two questions:
Is the algorithm that I wrote correct?
Is the "proof" that I wrote correct?
Also, I'd love to know if you have any tips on how to improve both the algorithm and the "proof" beside correctness, and maybe even the implementation (though I know C, I can always make a mistake). If either the algorithm formulations, the proof, or the C code seems too complicated to read or check, please give me some tips, and I'll try to simplify them.
And please, don't hesitate to point out that I misunderstood the problem completely if that is the case. After all, it is most important to present the right solution for the author of the original question.
I'm going to wait some time for an answer here before I publish an answer to the original question. But eventually, if there aren't any, I think I will publish it anyway.
The problem
To quote the author of the original question:
Suppose I have an array, arr = [2, 3, 5, 9] and k = 2. I am supposed to find subsequences of length k such that no two elements in each subsequence are adjacent. Then find the maximums of those sequences. Finally, find the minimum of the maximums. For example, for arr, the valid subsequences are [2,5], [3,9], [2,9] with maximums 5, 9, and 9 respectively. The expected output would be the minimum of the maximums, which is 5.
My algorithm
I've made two assumptions not stated in the original question:
the elements of the input sequence are unique,
for the input sequence S and k, 2 <= k <= (length(S) + 1) / 2.
They may look a bit arbitrary, but I think that they simplify the problem a bit. When it comes to the uniqueness, I think I could remove this assumption (so that the algorithm will suit for more use cases). But before, I need to know whether the current solution is correct.
Pseudocode, version 1
find_k_length_sequence_maxes_min (S, k)
    if k < 2 or length(S) < 2 * k - 1
        return NO_SUCH_MINIMUM
    sorted = copy(S)
    sort_ascending(sorted)
    for t from 1 to length(S)
        current_length = 1    // sorted[t] itself is always part of the subsequence
        index = find_index(S, sorted[t])
        last_index = index
        for u descending from index to 1
            if u < last_index - 1 and S[u] <= sorted[t]
                current_length += 1
                last_index = u
            if current_length >= k
                return sorted[t]
        last_index = index
        for u ascending from index to length(S)
            if u > last_index + 1 and S[u] <= sorted[t]
                current_length += 1
                last_index = u
            if current_length >= k
                return sorted[t]
Pseudocode, version 2
(This is the same algorithm as in version 1, only written using more natural language.)
(1) Let S be a sequence of integers such that all of its elements are unique.
(2) Let a "non-contiguous subsequence of S" mean such a subsequence of S that any two elements of it are non-adjacent in S.
(3) Let k be an integer such that 2 <= k <= (length(S) + 1) / 2.
(4) Find the minimum of maximums of all the non-contiguous subsequences of S of length k.
(4.1) Find the minimal element of S such that it is the maximum of a non-contiguous subsequence of S of size k.
(4.1.1) Let sorted be a permutation of S such that its elements are sorted in ascending order.
(4.1.2) For every element e of sorted, check whether it is a maximum of a non-contiguous subsequence of S of length k. If it is, return it.
(4.1.2.1) Let x and y be integers such that 1 <= x <= index(e) and index(e) <= y <= length(S).
(4.1.2.2) Let all(x, y) be the set of all the non-contiguous subsequences of S between S[x] (including) and S[y] (including) such that e is the maximum of each of them.
(4.1.2.3) Check whether the length of the longest sequence of all(1, index(e)) is greater than or equal to k. If it is, return e.
(4.1.2.4) Check whether the sum of the lengths of the longest subsequence of all(1, index(e)) and the length of the longest subsequence of all(index(e), length(S)) is greater than or equal to k. If it is, return e.
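The procedure above can be sketched in Python and compared with an exhaustive search. This is only an illustration under the same assumption of unique elements. The function names are hypothetical, a dictionary replaces find_index, and the length check happens after both walks rather than inside them.

```python
from itertools import combinations


def min_of_maxes_greedy(seq, k):
    """Return the minimum of maximums, or None if no valid subsequence exists.

    Assumes the elements of seq are unique, as the pseudocode does.
    """
    n = len(seq)
    if k < 2 or n < 2 * k - 1:
        return None
    position = {value: i for i, value in enumerate(seq)}
    for candidate in sorted(seq):
        index = position[candidate]
        size = 1  # the candidate itself
        last = index
        for u in range(index - 2, -1, -1):  # greedy walk to the left
            if u < last - 1 and seq[u] <= candidate:
                size += 1
                last = u
        last = index
        for u in range(index + 2, n):  # greedy walk to the right
            if u > last + 1 and seq[u] <= candidate:
                size += 1
                last = u
        if size >= k:
            return candidate
    return None


def min_of_maxes_brute_force(seq, k):
    """Reference answer: try every set of k pairwise non-adjacent indexes."""
    best = None
    for idxs in combinations(range(len(seq)), k):
        if all(b - a > 1 for a, b in zip(idxs, idxs[1:])):
            m = max(seq[i] for i in idxs)
            best = m if best is None else min(best, m)
    return best


print(min_of_maxes_greedy([2, 3, 5, 9], 2))       # 5
print(min_of_maxes_brute_force([2, 3, 5, 9], 2))  # 5
```

The brute force is exponential in k, so it is only meant for cross-checking small inputs.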
Proof of correctness
(1) Glossary:
by "observation" I mean a statement not derived from any observation or conclusion, not demanding a proof,
by "conclusion" I mean a statement derived from at least one observation or conclusion, not demanding a proof,
by "theorem" I mean a statement not derived from any observation or conclusion, demanding a proof.
(2) Let S be a sequence of integers such that all of its elements are unique.
(3) Let a "non-contiguous subsequence of S" mean such a subsequence of S that any two elements of it are non-adjacent in S.
(4) Let k be an integer such that 2 <= k <= (length(S) + 1) / 2.
(5) Let minmax(k) be an element of S such that it is the minimum of maximums of all the non-contiguous subsequences of S of length k.
(6) (Theorem) minmax(k) is a minimal element of S such that it is a maximum of a non-contiguous subsequence of S of length k.
(7) In other words, there is no element in S less than minmax(k) that is a maximum of a non-contiguous subsequence of S of length k.
(8) (Proof of (6)) (Observation) Since minmax(k) is the minimum of maximums of all the non-contiguous subsequences of S of length k, there is no non-contiguous subsequence of S of length k such that its maximum is less than minmax(k).
(9) (Proof of (6)) (Conclusion) If (8), then any element of S less than minmax(k) cannot be a maximum of any non-contiguous subsequence of S of length k.
(10) (Proof of (6)) QED
(11) Let x and y be integers such that 1 <= x <= index(minmax(k)) and index(minmax(k)) <= y <= length(S).
(12) Let all(x, y) be the set of all the non-contiguous subsequences of S between S[x] (including) and S[y] (including) such that minmax(k) is the maximum of each of them.
(13) (Observation) minmax(k) is the maximum of the longest sequence of all(1, length(S)).
(14) This observation may seem too trivial to note. But, apparently it was easier for me to write the algorithm, and prove it, with the longest subsequence in mind, instead of a subsequence of length k. Therefore I think this observation is worth noting.
(15) (Theorem) One can produce the longest sequence of all(1, index(minmax(k))) by:
starting from minmax(k),
moving to S[1],
taking always the next element that is both less than or equal to minmax(k), and non-adjacent to the last taken one.
(16) (Proof of (15)) Let a "possible element" of S mean an element that is both less than or equal to minmax(k), and non-adjacent to the last taken one.
(16a) (Proof of (15)) Let C be the subsequence produced in (15).
(17) (Proof of (15)) (Observation)
Before the first taken element, there is exactly 0 possible elements,
between any two taken elements (excluding them), there is exactly 0 or 1 possible elements,
after the last taken element, there is exactly 0 or 1 possible elements.
(18) (Proof of (15)) Let D be a sequence of all(1, index(minmax(k))) such that length(D) > length(C).
(19) (Proof of (15)) At least one of the following conditions is fulfilled:
before the first taken element, there is less than 0 possible elements in D,
between two taken elements (excluding them) such that there is 1 possible elements between them in C, there is 0 possible elements in D,
after the last taken element, there is less than 1 possible element in D.
(20) (Proof of (15)) (Observation)
There cannot be less than 0 possible elements before the first taken element,
if there is less than 1 possible elements between two taken elements (excluding them) in D, where in C there is 1, it means that we have taken either an element greater than minmax(k), or an element adjacent to the last taken one, which contradicts (12),
if there is less than 1 possible element after the last taken element in D, where in C there is 1, it means that we have taken either an element greater than minmax(k), or an element adjacent to the last taken one, which contradicts (12).
(21) (Proof of (15)) QED
(22) (Observation) (15) applies also to all(index(minmax(k)), length(S)).
(23) (Observation) length(all(1, length(S))) = length(all(1, index(minmax(k)))) + length(all(index(minmax(k)), length(S))).
Implementation
All the tests pass if none of the assert calls aborts the program.
#include <limits.h> // For INT_MAX
#include <assert.h> // For assert
#include <string.h> // For memcpy
#include <stdlib.h> // For qsort
int compar (const void * first, const void * second) {
    if (* (int *)first < * (int *)second) return -1;
    else if (* (int *)first == * (int *)second) return 0;
    else return 1;
}
void find_k_size_sequence_maxes_min (int array_length, int array[], int k, int * result_min) {
    if (k < 2 || array_length < 2 * k - 1) return;
    int sorted[array_length];
    memcpy(sorted, array, sizeof (int) * array_length);
    qsort(sorted, array_length, sizeof (int), compar);
    for (int t = 0; t < array_length; ++t) {
        // Find the position of the candidate sorted[t] in the original array
        int index = -1;
        while (array[++index] != sorted[t]);
        int size = 1;
        int last_index = index;
        // Greedily extend the subsequence to the left of the candidate
        for (int u = index; u >= 0; --u) {
            if (u < last_index - 1 && array[u] <= sorted[t]) {
                ++size;
                last_index = u;
            }
            if (size >= k) {
                * result_min = sorted[t];
                return;
            }
        }
        last_index = index;
        // Greedily extend the subsequence to the right of the candidate
        for (int u = index; u < array_length; ++u) {
            if (u > last_index + 1 && array[u] <= sorted[t]) {
                ++size;
                last_index = u;
            }
            if (size >= k) {
                * result_min = sorted[t];
                return;
            }
        }
    }
}
int main (void) {
    // Test case 1
    int array1[] = { 6, 3, 5, 8, 1, 0, 9, 7, 4, 2, };
    int array1_length = (int)(sizeof array1 / sizeof array1[0]);
    int k = 2;
    int min = INT_MAX;
    find_k_size_sequence_maxes_min(array1_length, array1, k, & min);
    assert(min == 2);
    // Test case 2
    int array2[] = { 1, 7, 2, 3, 9, 11, 8, 14, };
    int array2_length = (int)(sizeof array2 / sizeof array2[0]);
    k = 2;
    min = INT_MAX;
    find_k_size_sequence_maxes_min(array2_length, array2, k, & min);
    assert(min == 2);
    // Test case 3
    k = 3;
    min = INT_MAX;
    find_k_size_sequence_maxes_min(array2_length, array2, k, & min);
    assert(min == 8);
    // Test case 4
    k = 4;
    min = INT_MAX;
    find_k_size_sequence_maxes_min(array2_length, array2, k, & min);
    assert(min == 9);
    // Test case 5
    int array3[] = { 3, 5, 4, 0, 8, 2, };
    int array3_length = (int)(sizeof array3 / sizeof array3[0]);
    k = 3;
    min = INT_MAX;
    find_k_size_sequence_maxes_min(array3_length, array3, k, & min);
    assert(min == 3);
    // Test case 6
    int array4[] = { 18, 21, 20, 6 };
    int array4_length = (int)(sizeof array4 / sizeof array4[0]);
    k = 2;
    min = INT_MAX;
    find_k_size_sequence_maxes_min(array4_length, array4, k, & min);
    assert(min == 18);
    // Test case 7 (static storage: a 1,000,000-element array may not fit on the stack)
    enum { array5_length = 1000000 };
    static int array5[array5_length];
    for (int m = array5_length - 1; m >= 0; --m) array5[m] = m;
    k = 100;
    min = INT_MAX;
    find_k_size_sequence_maxes_min(array5_length, array5, k, & min);
    assert(min == 198);
}
Edit: Thanks to @user3386109, the number of iterations on sorted may be reduced in some cases. There need to be at least k - 1 elements less than sorted[t] to form a subsequence of size k or greater together with sorted[t]. Therefore, in the for loop, it should be int t = k - 1 instead of int t = 0.
Edit: Now that a week has passed, I have published my solution as an answer to the original question: Minimum of maximums for k-size nonconsecutive subsequence of array. If you happen to have any further tips on how to improve it, you can share them either here or in the original question (as comments on my answer).
I have an array [a0, a1, ..., an]. I want to calculate the sum of the distances between every pair of equal elements.
1) The first element of the array will always be zero.
2) The second element of the array will be greater than zero.
3) No two consecutive elements can be the same.
4) The size of the array can be up to 10^5+1 and the elements of the array can be from 0 to 10^7.
For example, if the array is [0,2,5,0,5,7,0], then the distance (the number of elements in between) between the first 0 and the second 0 is 2, the distance between the first 0 and the third 0 is 5, and the distance between the second 0 and the third 0 is 2. The distance between the first 5 and the second 5 is 1. Hence the sum of distances between equal elements is 2 + 5 + 2 + 1 = 10.
For this I tried to build a formula: for every element occurring more than once (0-based indexing, and the first element is always zero), sum = sum + (lastIndex - firstIndex - 1) * (NumberOfOccurrences - 1).
If the number of occurrences of the element is odd, subtract 1 from sum; otherwise leave it as it is. But this approach does not work in every case.
It does work if the array is [0,5,7,0] or if the array is [0,2,5,0,5,7,0,1,2,3,0].
Can you suggest another efficient approach or formula?
Edit :- This problem is not a part of any coding contest, it's just a little part of a bigger problem
My method requires space that scales with the number of possible values for elements, but has O(n) time complexity.
I've made no effort to check that the sum doesn't overflow an unsigned long, I just assume that it won't. Same for checking that any input values are in fact no more than max_val. These are details that would have to be addressed.
For each possible value, total_distance keeps track of how much would be added to the sum when that value is encountered, that is, the sum of the distances from its latest occurrence to all of its earlier occurrences. instances_so_far keeps track of how many instances of a value have already been seen; this is how much total_distance grows with each index step. To make this efficient, the last index at which a value was encountered is tracked, so that total_distance only needs to be updated when that particular value is encountered, instead of using nested loops that update every value at every step.
#include <stdio.h>
#include <stddef.h>
// enum { max_val = 15 };
// An enum constant (unlike a const variable) is a constant expression in C,
// so it can size arrays with static storage.
enum { max_val = 10000000 };
unsigned long instances_so_far[max_val + 1] = {0};
unsigned long total_distance[max_val + 1] = {0};
unsigned long last_index_encountered[max_val + 1];
// void print_array(unsigned long *array, size_t len) {
//     printf("{");
//     for (size_t i = 0; i < len; ++i) {
//         printf("%lu,", array[i]);
//     }
//     printf("}\n");
// }
unsigned long get_sum(unsigned long *array, size_t len) {
    unsigned long sum = 0;
    for (size_t i = 0; i < len; ++i) {
        if (instances_so_far[array[i]] >= 1) {
            // Every earlier gap grows by the step since the last occurrence;
            // the -1 is because the last occurrence itself is not "in between".
            total_distance[array[i]] += (i - last_index_encountered[array[i]]) * instances_so_far[array[i]] - 1;
        }
        sum += total_distance[array[i]];
        instances_so_far[array[i]] += 1;
        last_index_encountered[array[i]] = i;
        // printf("inst ");
        // print_array(instances_so_far, max_val + 1);
        // printf("totd ");
        // print_array(total_distance, max_val + 1);
        // printf("encn ");
        // print_array(last_index_encountered, max_val + 1);
        // printf("sums %lu\n", sum);
        // printf("\n");
    }
    return sum;
}
unsigned long test[] = {0,1,0,2,0,3,0,4,5,6,7,8,9,10,0};
int main(void) {
    printf("%lu\n", get_sum(test, sizeof(test) / sizeof(test[0])));
    return 0;
}
I've tested it with a few of the examples here, and gotten the answers I expected.
I had to use static storage for the arrays because they overflowed the stack if put there.
I've left in the commented-out code I used for debugging, it's helpful to understand what's going on, if you reduce max_val to a smaller number.
Please let me know if you find a counter-example that fails.
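For readers who prefer to experiment without the large static arrays, the same bookkeeping can be sketched in Python with dictionaries. This is an illustrative port rather than the code above: the function and variable names are hypothetical, and space then scales with the number of distinct values instead of with max_val.

```python
from collections import defaultdict


def sum_of_gaps(values):
    """Sum, over every pair of equal elements, of the number of elements between them."""
    seen = defaultdict(int)         # occurrences of each value so far
    to_previous = defaultdict(int)  # gaps from the latest occurrence to all earlier ones
    last_index = {}
    total = 0
    for i, v in enumerate(values):
        if seen[v]:
            step = i - last_index[v]
            # Each of the seen[v] - 1 older gaps grows by step, and the gap to the
            # latest occurrence is step - 1; together that is step * seen[v] - 1.
            to_previous[v] += step * seen[v] - 1
        total += to_previous[v]
        seen[v] += 1
        last_index[v] = i
    return total


print(sum_of_gaps([0, 2, 5, 0, 5, 7, 0]))  # 10
```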
Here is Python 3 code for your problem. This works on all the examples given in your question and in the comments--I included the test code.
This works by looking at how each consecutive pair of repeated elements adds to the overall sum of distances. If the list has 6 elements, the pair distances are:
x  x  x  x  x  x     The repeated element's locations in the array
 --                  First, consecutive pairs
    --
       --
          --
             --
 -----               Now, pairs that have one element inside
    -----
       -----
          -----
 --------            Now, pairs that have two elements inside
    --------
       --------
 -----------         Now, pairs that have three elements inside
    -----------
 --------------      Now, pairs that have four elements inside
If we look down between each consecutive pair, we see that it adds to the overall sum of all pairs:
5 8 9 8 5
And if we look at the differences between those values we get
3 1 -1 -3
Now if we use my preferred definition of "distance" for a pair, namely the difference of their indices, we can use those multiplicities for consecutive pairs to calculate the overall sum of distances for all pairs. But since your definition is not mine, we calculate the sum for my definition and then adjust it for your definition.
This code makes one pass through the original array to get the occurrences for each element value in the array, then another pass through those distinct element values. (I used the pairwise routine to avoid another pass through the array.) That makes my algorithm O(n) in time complexity, where n is the length of the array. This is much better than the naive O(n^2). Since my code builds an array of the repeated elements, once per unique element value, this has space complexity of at worst O(n).
import collections
import itertools
def pairwise(iterable):
    """s -> (s0,s1), (s1,s2), (s2, s3), ..."""
    a, b = itertools.tee(iterable)
    next(b, None)
    return zip(a, b)

def sum_distances_of_pairs(alist):
    # Make a dictionary giving the indices for each element of the list.
    element_ndxs = collections.defaultdict(list)
    for ndx, element in enumerate(alist):
        element_ndxs[element].append(ndx)
    # Sum the distances of pairs for each element, using my def of distance
    sum_of_all_pair_distances = 0
    for element, ndx_list in element_ndxs.items():
        # Filter out elements not occurring more than once and count the rest
        if len(ndx_list) < 2:
            continue
        # Sum the distances of pairs for this element, using my def of distance
        sum_of_pair_distances = 0
        multiplicity = len(ndx_list) - 1
        delta_multiplicity = multiplicity - 2
        for ndx1, ndx2 in pairwise(ndx_list):
            # Update the contribution of this consecutive pair to the sum
            sum_of_pair_distances += multiplicity * (ndx2 - ndx1)
            # Prepare for the next consecutive pair
            multiplicity += delta_multiplicity
            delta_multiplicity -= 2
        # Adjust that sum of distances for the desired definition of distance
        cnt_all_pairs = len(ndx_list) * (len(ndx_list) - 1) // 2
        sum_of_pair_distances -= cnt_all_pairs
        # Add that sum for this element into the overall sum
        sum_of_all_pair_distances += sum_of_pair_distances
    return sum_of_all_pair_distances

assert sum_distances_of_pairs([0, 2, 5, 0, 5, 7, 0]) == 10
assert sum_distances_of_pairs([0, 5, 7, 0]) == 2
assert sum_distances_of_pairs([0, 2, 5, 0, 5, 7, 0, 1, 2, 3, 0]) == 34
assert sum_distances_of_pairs([0, 0, 0, 0, 1, 2, 0]) == 18
assert sum_distances_of_pairs([0, 1, 0, 2, 0, 3, 4, 5, 6, 7, 8, 9, 0, 10, 0]) == 66
assert sum_distances_of_pairs([0, 1, 0, 2, 0, 3, 0, 4, 5, 6, 7, 8, 9, 10, 0]) == 54
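When hunting for counter-examples, a quadratic brute-force reference can be useful to compare against. The sketch below is such a reference, using the question's definition of distance (the number of elements strictly between the pair); the function name is hypothetical.

```python
from itertools import combinations


def sum_of_gaps_brute_force(values):
    """O(n^2) reference: count the elements strictly between each equal pair."""
    return sum(
        j - i - 1
        for (i, a), (j, b) in combinations(enumerate(values), 2)
        if a == b
    )


print(sum_of_gaps_brute_force([0, 2, 5, 0, 5, 7, 0]))  # 10
```

It is far too slow for arrays of 10^5 elements, but fine for checking a faster implementation on small random inputs.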
I am solving this question which requires some optimized techniques to
solve it. I can think of the brute force method only which requires
combinatorics.
Given an array A consisting of n integers. We call an integer "good"
if it lies in the range [L,R] (i.e. L≤x≤R). We need to make sure if we
pick up any K integers from the array at least one of them should be a
good integer.
For achieving this, in a single operation, we are allowed to
increase/decrease any element of the array by one.
What will be the minimum number of operations we will need for a
fixed k? The answer is required for every k from 1 to n.
input:
L R
1 2
A=[ 1 3 3 ]
output:
for k=1 : 2
for k=2 : 1
for k=3 : 0
For k=1, you have to convert both the 3s into 2s to make sure that if
you select any one of the 3 integers, the selected integer is good.
For k=2, one of the possible ways is to convert one of the 3s into 2.
For k=3, no operation is needed as 1 is a good integer.
As burnpanck has explained in his answer, to make sure that when you pick any k elements in the array, and at least one of them is in range [L,R], we need to make sure that there are at least n - k + 1 numbers in range [L,R] in the array.
So, first , for each element, we calculate the cost to make this element be a valid element (which is in range [L,R]) and store those cost in an array cost.
We notice that:
For k = 1, the minimum cost is the sum of array cost.
For k = 2, the minimum cost is the sum of cost, minus the largest element.
For k = 3, the minimum cost is the sum of cost, minus two largest elements.
...
So we need a prefixSum array whose i-th entry is the sum of the sorted cost array from position 0 to i.
After calculating prefixSum, we can answer the query for each k in O(1).
So here is the algorithm in Java; note that the time complexity is O(n log n):
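// Assumes int[] A, int n = A.length, and the bounds L and R are already defined.
int[] cost = new int[n];
for (int i = 0; i < n; i++)
    // Min cost for element i: its distance to the range [L, R] (0 if already inside)
    cost[i] = Math.max(0, Math.max(L - A[i], A[i] - R));
Arrays.sort(cost);
int[] prefix = new int[n];
for (int i = 0; i < n; i++)
    prefix[i] = cost[i] + (i > 0 ? prefix[i - 1] : 0);
for (int i = n - 1; i >= 0; i--)
    System.out.println("Result for k = " + (n - i) + " is " + prefix[i]);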
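The same sort-and-prefix-sum idea can be sketched compactly in Python. The function name and the parameter names low, high and values are hypothetical, and the sketch returns the answers for k = 1 to n as a list instead of printing them.

```python
from itertools import accumulate


def min_ops_for_all_k(low, high, values):
    """Return [answer for k = 1, ..., answer for k = n]."""
    # Cost of moving each element into [low, high]; 0 if it is already inside.
    costs = sorted(max(0, low - v, v - high) for v in values)
    prefix = list(accumulate(costs))  # prefix[i] = sum of the i + 1 cheapest costs
    n = len(values)
    # For a given k we need n - k + 1 good elements, i.e. prefix[n - k].
    return [prefix[n - k] for k in range(1, n + 1)]


print(min_ops_for_all_k(1, 2, [1, 3, 3]))  # [2, 1, 0]
```

The printed list matches the expected output given in the question for k = 1, 2 and 3.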
To be sure that picking any k elements yields at least one valid element, you must have no more than k-1 invalid elements in your set. You therefore need to find the cheapest way to make enough elements valid. I would do this as follows: in a single pass, generate a map that counts how many elements in the set need n operations to be made valid. Then you clearly want to take the elements that need the fewest operations, so take the required number of elements in ascending order of required operations, and sum the number of operations.
In python:
def min_ops(L, R, A_set):
    n_ops = dict()  # create an empty mapping
    for a in A_set:  # loop over all a in the set A_set
        n = max(0, max(a - R, L - a))  # the number of operations required to make a valid
        # in the mapping, increment the element keyed by *n* by one.
        # If it does not exist yet, assume it was 0.
        n_ops[n] = n_ops.get(n, 0) + 1
    allret = []  # create a new list to hold the result for all k
    for k in range(1, len(A_set) + 1):  # iterate over all k in the range [1,N+1) == [1,N]
        n_good_required = len(A_set) - k + 1
        ret = 0
        # iterate over all pairs of keys,values from the mapping, sorted by key.
        # The key is the number of ops required, the value the number of elements available
        for n, nel in sorted(n_ops.items()):
            if n_good_required <= 0:
                break  # enough valid elements collected for this k
            ret += n * min(nel, n_good_required)
            n_good_required -= nel
        allret.append(ret)  # append the answer for this k to the result list
    return allret
As an example:
A_set = [1,3,3,6,8,5,4,7]
L,R = 4,6
For each A, we find how many operations we need to make it valid:
n = [3,1,1,0,2,0,0,1]
(i.e. 1 needs 3 steps, 3 needs one, and so on)
Then we count them:
n_ops = {
0: 3, # we already have three valid elements
1: 3, # three elements that require one op
2: 1,
3: 1, # and finally one that requires 3 ops
}
Now, for each k, we find out how many valid elements we need in the set,
e.g. for k = 4, we need at most 3 invalid in the set of 8, so we need 5 valid ones.
Thus:
ret = 0
n_good_required = 5
with n=0, we have 3 so take all of them
ret = 0
n_good_required = 2
with n=1, we have 3, but we need just two, so take those
ret = 2
we're finished
For the array [4,3,5,1,2],
the prefix of 4 is empty, so the prefix-less of 4 is 0;
prefix of 3 is [4], prefix-less of 3 is 0, because none in prefix is less than 3;
prefix of 5 is [4,3], prefix-less of 5 is 2, because 4 and 3 are both less than 5;
prefix of 1 is [4,3,5], prefix-less of 1 is 0, because none in prefix is less than 1;
prefix of 2 is [4,3,5,1], prefix-less of 2 is 1, because only 1 is less than 2
So for the array [4, 3, 5, 1, 2], we get the prefix-less array [0, 0, 2, 0, 1].
Is there an O(n) algorithm to compute the prefix-less array?
It can't be done in O(n) in the comparison model, for the same reason a comparison sort requires Ω(n log n) comparisons. The number of possible prefix-less arrays is n!, so you need at least log2(n!) bits of information to identify the correct one. By Stirling's approximation, log2(n!) is Θ(n log n).
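For comparison, an O(n log n) running time is achievable. The sketch below is one possible way to do it, using a Fenwick (binary indexed) tree over the ranks of the values; it is not part of the argument above, and the function name is hypothetical.

```python
def prefix_less_fenwick(values):
    """For each element, count the earlier elements that are strictly smaller."""
    # Coordinate-compress the values to ranks 1..m.
    rank = {v: r for r, v in enumerate(sorted(set(values)), start=1)}
    tree = [0] * (len(rank) + 1)  # Fenwick tree indexed by rank
    result = []
    for v in values:
        # Query: how many already-inserted elements have a rank below rank[v]?
        r = rank[v] - 1
        smaller = 0
        while r > 0:
            smaller += tree[r]
            r -= r & -r
        result.append(smaller)
        # Update: insert v at its rank.
        r = rank[v]
        while r < len(tree):
            tree[r] += 1
            r += r & -r
    return result


print(prefix_less_fenwick([4, 3, 5, 1, 2]))  # [0, 0, 2, 0, 1]
```

Sorting for the rank compression and the n tree operations each cost O(n log n).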
Assuming that the input elements are always fixed-width integers you can use a technique based on radix sort to achieve linear time:
L is the input array
X is the list of indexes of L in focus for current pass
n is the bit we are currently working on
Count is the number of 0 bits at bit n left of current location
Y is the list of indexs of a subsequence of L for recursion
P is a zero initialized array that is the output (the prefixless array)
In pseudo-code...
Def PrefixLess(L, X, n)
    if (n == 0)
        return
    // setup prefix less for bit n
    Count = 0
    For I in 1 to |X|
        If (L(X(I))[n] == 0)
            Count++
        Else
            // only an element with bit n set is greater than the
            // earlier elements of X whose bit n is 0
            P(X(I)) += Count
    // go through subsequence with bit n-1 with bit(n) = 1
    Y = []
    For I in 1 to |X|
        If (L(X(I))[n] == 1)
            Y.append(X(I))
    PrefixLess(L, Y, n-1)
    // go through subsequence on bit n-1 where bit(n) = 0
    Y = []
    For I in 1 to |X|
        If (L(X(I))[n] == 0)
            Y.append(X(I))
    PrefixLess(L, Y, n-1)
    return P
and then execute:
PrefixLess(L, 1..|L|, 32)
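A rough Python rendering of this bitwise recursion is given below. It assumes non-negative integers that fit in the given number of bits, counts bits from 0 rather than from 1, and only credits elements whose current bit is set; the names are hypothetical.

```python
def prefix_less_radix(values, bits=32):
    """Prefix-less array for non-negative integers below 2**bits."""
    result = [0] * len(values)

    def recurse(indexes, bit):
        # indexes: positions, in original order, whose higher bits all agree.
        if bit < 0 or len(indexes) < 2:
            return
        zeros, ones = [], []
        for i in indexes:
            if (values[i] >> bit) & 1:
                # Every earlier element of this group with a 0 here is smaller.
                result[i] += len(zeros)
                ones.append(i)
            else:
                zeros.append(i)
        # Elements that agree on this bit are compared on the lower bits.
        recurse(ones, bit - 1)
        recurse(zeros, bit - 1)

    recurse(list(range(len(values))), bits - 1)
    return result


print(prefix_less_radix([4, 3, 5, 1, 2]))  # [0, 0, 2, 0, 1]
```

Each element is visited at most once per bit, so the running time is O(n * bits), which is linear in n for a fixed word width.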
I think this should work, but double check the details. Let's call an element in the original array a[i] and one in the prefix array as p[i] where i is the ith element of the respective arrays.
So, say we are at a[i] and we have already computed the value of p[i]. There are three possible cases. If a[i] == a[i+1], then p[i+1] == p[i]. If a[i] < a[i+1], then p[i+1] >= p[i] + 1. This leaves us with the case where a[i] > a[i+1]. In this situation we only know that p[i+1] <= p[i].
In the naïve case, we go back through the prefix and start counting items less than a[i]. However, we can do better than that. First, recognize that the minimum value for p[i] is 0 and the maximum is i. Next look at the case of an index j, where i > j. If a[i] >= a[j], then p[i] >= p[j]. If a[i] < a[j], then p[i] <= p[j] + (i - j - 1), since only the i - j - 1 elements between them can add to the count. So, we can start going backwards through p, updating the values for p[i]_min and p[i]_max. If p[i]_min equals p[i]_max, then we have our solution.
Doing a back of the envelope analysis of the algorithm, it has O(n) best case performance. This is the case where the list is already sorted. The worst case is where it is reversed sorted. Then the performance is O(n^2). The average performance is going to be O(k*n) where k is how much one needs to backtrack. My guess is for randomly distributed integers, k will be small.
I am also pretty sure there would be ways to optimize this algorithm for cases of partially sorted data. I would look at Timsort for some inspiration on how to do this. It uses run detection to detect partially sorted data. So the basic idea for the algorithm would be to go through the list once and look for runs of data. For ascending runs of data you are going to have the case where p[i+1] >= p[i] + 1. For descending runs, p[i] <= p_run[0], where p_run[0] is the value for the first element in the run.
Possible Duplicate:
Zero sum SubArray
An array contains both positive and negative elements, find the
subarray whose sum equals 0.
This is an interview question.
Unfortunately, I cannot read the accepted answer to this question, so I am asking it again: how to find the minimal integer subarray with zero sum?
Note, this is not a "zero subset problem". The obvious brute-force solution is O(N^2) (loop over all subarrays). Can we solve it in O(N)?
This algorithm will find them all, you can easily modify it to find the minimal subarray.
Given an int[] input array, you can create an int[] tmp array where tmp[i] = tmp[i - 1] + input[i], so that each element of tmp stores the sum of the input up to that element.
Now if you check tmp, you'll notice that there might be values that are equal to each other. Let's say these values are at indexes j and k with j < k; then the subarray with sum 0 runs from index j + 1 to k. NOTE: if j + 1 == k, then input[k] is 0 and that's it! ;)
NOTE: The algorithm should consider a virtual tmp[-1] = 0;
The implementation can be done in different ways including using a HashMap as suggested by BrokenGlass but be careful with the special case in the NOTE above.
Example:
int[] input = {4, 6, 3, -9, -5, 1, 3, 0, 2}
int[] tmp = {4, 10, 13, 4, -1, 0, 3, 3, 5}
Note the value 4 in tmp at indexes 0 and 3 ==> sum of input 1 to 3 = 0, length (3 - 1) + 1 = 3
Note the value 0 in tmp at index 5 ==> sum of input 0 to 5 = 0, length (5 - 0) + 1 = 6
Note the value 3 in tmp at indexes 6 and 7 ==> sum of input 7 to 7 = 0, length (7 - 7) + 1 = 1
An array contains both positive and negative elements, find the
subarray whose sum equals 0.
Yes that can be done in O(n). If the sum of the elements within a subarray equals zero that means the sum of elements up to the first element before the sub array is the same as the sum of elements up to the last element in the subarray.
Go through the array and, for each index K, look up the sum of the elements up to K in a hash table. If that sum already exists, compare its stored index with K; if the delta is lower than the current minimum subarray length, update the minimum. Then update the hash table with (sum, current index K). Seed the table with sum 0 at a virtual index -1 so that zero-sum subarrays starting at the first element are found as well.
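A minimal Python sketch of this hash-table pass follows. The function name is hypothetical, and it returns the inclusive start and end indexes of one shortest zero-sum subarray, or None if there is none.

```python
def shortest_zero_sum_subarray(values):
    """Return (start, end), inclusive, of a shortest zero-sum subarray, or None."""
    last_seen = {0: -1}  # prefix sum -> latest index where it occurred (virtual -1)
    best = None
    running = 0
    for k, v in enumerate(values):
        running += v
        if running in last_seen:
            # Equal prefix sums at j and k mean values[j + 1 .. k] sums to zero.
            start = last_seen[running] + 1
            if best is None or k - start < best[1] - best[0]:
                best = (start, k)
        # Keep only the latest index: it gives the shortest subarray later on.
        last_seen[running] = k
    return best


print(shortest_zero_sum_subarray([4, 6, 3, -9, -5, 1, 3, 0, 2]))  # (7, 7)
```

With average O(1) dictionary operations, the whole pass runs in O(N) expected time.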