Split arrays of natural numbers according to a requirement - arrays

I have two arrays {Ai} and {Bi} of natural numbers. The sums of all elements are equal.
I need to split each element of the two arrays into three natural numbers:
Ai = A1i + A2i + A3i
Bi = B1i + B2i + B3i
such that the sum of all elements of A1 is equal to the sum of all elements of B1 and the same for all the other pairs.
The important part I initially forgot about:
Each of A1j, A2j, A3j should lie between Aj/3-2 and Aj/3+2, inclusive.
Each of B1j, B2j, B3j should lie between Bj/3-2 and Bj/3+2, inclusive.
So the elements of the arrays must be split into almost equal parts.
I am looking for a more elegant solution than just calculating all possible variants for both arrays.

It should be possible to divide them so that the sums of A1, A2 and A3 are each close to a third of the sum of A, and the same for B. It would be easy to just make every value an exact third, but that’s not possible with natural numbers. So we have to floor the results (trivial) and distribute the remainders uniformly over the three arrays (manageable).
I don't know whether it’s the only solution, but it works in O(n), and my intuition says it will satisfy your invariants (though I didn’t prove it):
n = 3
for j = 0 to n-1
    sub[j] = {}                      // sub[0], sub[1], sub[2] become A1, A2, A3
x = 0                                // rotating pointer for the next subarray
for each index i in A
    part = floor(A[i] / n)
    rest = A[i] mod n
    for j = 0 to n-1
        sub[j][i] = part
    // distribute the rest over the subarrays, rotating the pointer
    for j = 1 to rest
        sub[x][i]++
        x = (x + 1) mod n
/* Do the same for B */
One could also formulate the loop without the division, only distributing the single units (1) of each A[i] over the sub[x][i]s:
n = 3
for j = 0 to n-1
    sub[j] = {}
    for k = 0 to |A|-1
        sub[j][k] = 0
x = 0                                // rotating pointer for the next subarray
for each index i in A
    // distribute A[i] one unit at a time over the subarrays, rotating the pointer
    for j = 1 to A[i]
        sub[x][i]++
        x = (x + 1) mod n
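Here is a minimal runnable sketch of the same idea in Python (the function name split_into_three and the explicit modulo wrap of the pointer are my own additions, not part of the pseudocode above):

def split_into_three(A, n=3):
    # Split each A[i] into n near-equal parts, rotating the leftover units.
    sub = [[0] * len(A) for _ in range(n)]
    x = 0                                # rotating pointer for the next subarray
    for i, value in enumerate(A):
        part, rest = divmod(value, n)
        for j in range(n):
            sub[j][i] = part             # everyone gets the floored third
        for _ in range(rest):            # hand out the leftover units one by one
            sub[x][i] += 1
            x = (x + 1) % n
    return sub

A1, A2, A3 = split_into_three([7, 8, 9, 10])
print([sum(A1), sum(A2), sum(A3)])       # [12, 11, 11], each near a third of 34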

You should look up the principle of dynamic programming.
In this case, it seems to be similar to some coin change problems.
As for finding A1_i, A2_i, A3_i you should do it recursively:
def find_numbers(n, a, arr):
    # combinations for n were already computed by an earlier call
    if a is empty and arr[n] is not empty:
        return
    t = n
    for each element of a:
        t -= element                    # t = part of n not yet covered by a
    if a.size() == 3:
        if t == 0:
            arr[n].append(a)            # a is one valid split of n into three parts
        return
    for i = 0 to t:
        find_numbers(n, append(a, i), arr)
We use arr so that we do not need to compute the possible combinations for each number more than once. If you look at the call tree, after a while this function will return the combinations from arr instead of computing them again.
In your main call:
arr = []
for each n in A:
    find_numbers(n, [], arr)
for each n in B:
    find_numbers(n, [], arr)
Now you have all the combinations for each n in arr[n].
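To make the caching idea concrete, here is a small runnable Python sketch; note that restricting the parts to the ±2 window from the question is my addition and is not in the pseudocode above:

def splits_of(n, cache={}):
    # All (p1, p2, p3) of naturals with p1 + p2 + p3 == n and each part in [n/3 - 2, n/3 + 2].
    # Results are memoized in `cache`, playing the role of `arr` above.
    if n in cache:
        return cache[n]
    lo = max(0, (n - 4) // 3)            # ceil((n - 6) / 3), clamped at 0
    hi = (n + 6) // 3                    # floor((n + 6) / 3)
    combos = [(p1, p2, n - p1 - p2)
              for p1 in range(lo, hi + 1)
              for p2 in range(lo, hi + 1)
              if lo <= n - p1 - p2 <= hi]
    cache[n] = combos
    return combos

print(splits_of(10))                     # [(2, 3, 5), (2, 4, 4), ...]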
I know this only covers a subpart of the problem, but selecting the right combinations for each A_i, B_i from arr is really similar to a coin change problem. It is very important to read up on dynamic programming so that you understand the underlying theory.

I add the stipulation that A1, A2, and A3 must be calculated from A without knowledge of B, and, similarly, B1, B2, and B3 must be calculated without knowledge of A.
The requirement that each A1i, A2i, A3i must be in [Ai/3–2, Ai/3+2] implies that the sums of the elements of A1, A2, and A3 must each be roughly one-third that of A. The stipulation compels us to pin these sums down exactly, so that the A side and the B side arrive at the same targets independently.
We will construct the arrays in any serial order (e.g., from element 0 to the last element). As we do so, we will ensure the arrays remain nearly balanced.
Let x be the next element of A to be processed. Let a be round(x/3) and let r = x – 3•a, so r is –1, 0, or +1. To account for x, we must append a total of 3•a+r to the arrays A1, A2, and A3.
Let d be sum(A1) – sum(A)/3, where the sums are of the elements processed so far. Initially, d is zero, since no elements have been processed. By design, we will ensure d is –2/3, 0, or +2/3 at each step.
Append three values as shown below to A1, A2, and A3, respectively:
If r is –1 and d is –2/3, append a+1, a–1, a–1. This changes d to +2/3.
If r is –1 and d is 0, append a–1, a, a. This changes d to –2/3.
If r is –1 and d is +2/3, append a–1, a, a. This changes d to 0.
If r is 0, append a, a, a. This leaves d unchanged.
If r is +1 and d is –2/3, append a+1, a, a. This changes d to 0.
If r is +1 and d is 0, append a+1, a, a. This changes d to +2/3.
If r is +1 and d is +2/3, append a–1, a+1, a+1. This changes d to –2/3.
At the end, the sums of A1, A2, and A3 are uniquely determined by the sum of A. The sum of A1 is (sum(A)–2)/3, sum(A)/3, or (sum(A)+2)/3 according to whether the sum of A is congruent to –1, 0, or +1 modulo three, respectively.
Completing the demonstration:
In any case, a–1, a, or a+1 is appended to an array. a is round(x/3), so it differs from x/3 by less than 1, so a–1, a, and a+1 each differ from x/3 by less than 2, satisfying the constraint that the values must be in [Ai/3–2, Ai/3+2].
When B1, B2, and B3 are prepared in the same way as shown above for A1, A2, and A3, their sums are likewise determined by the sum of B. Since the sum of A equals the sum of B, the sums of A1, A2, and A3 equal the sums of B1, B2, and B3, respectively.
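For reference, a compact runnable sketch of this construction in Python (the function name, the sample arrays, and tracking t = 3·d to keep the bookkeeping in integers are my choices; it assumes every element is at least 2 so that a - 1 never goes negative):

def split_in_three(A):
    # Build A1, A2, A3 from A alone, following the case table above.
    # t tracks 3*d = 3*sum(A1) - sum(A) over the processed elements, so t stays in {-2, 0, +2}.
    A1, A2, A3 = [], [], []
    t = 0
    for x in A:
        a = (x + 1) // 3                 # round(x/3) for naturals; r = x - 3a is -1, 0 or +1
        r = x - 3 * a
        if r == -1:
            triple = (a + 1, a - 1, a - 1) if t == -2 else (a - 1, a, a)
        elif r == 0:
            triple = (a, a, a)
        else:                            # r == +1
            triple = (a - 1, a + 1, a + 1) if t == +2 else (a + 1, a, a)
        A1.append(triple[0]); A2.append(triple[1]); A3.append(triple[2])
        t += 3 * triple[0] - x           # A1 grew by triple[0], sum(A) grew by x
    return A1, A2, A3

# The final sums depend only on sum(A), so two arrays with equal sums split into
# component-wise equal sums without either side seeing the other:
print([sum(p) for p in split_in_three([7, 8, 9, 10])])    # [12, 11, 11]
print([sum(p) for p in split_in_three([6, 6, 11, 11])])   # [12, 11, 11]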

Related

Finding all combinations of elements from two sets such that their geometric mean falls into third set

I have integers from 1 to n. I randomly allot every integer to one of three sets A, B and C (A ∩ B = B ∩ C = C ∩ A = Ø). Every integer belongs to exactly one set. I need to calculate all combinations of elements (a, b) such that a ∈ A, b ∈ B, and the geometric mean of a and b belongs to C. Basically sqrt(a*b) ∈ C.
My solution is to first mark in an array of size n whether each element went into set A, B or C. Then I loop through the array for all elements that belong to A. When I encounter one, I loop through again for all elements that belong to B. If array[sqrt(a*b)] == C, then I add (a, b, sqrt(a*b)) as one possible combination. Doing this for the whole array is O(n^2).
Is there a more optimal solution possible?
It can be done with better complexity than O(n^2). The solution sketched here is in O(n * sqrt(n) * log(n)).
The main idea is the following:
let (a, b, c) be a good solution, i.e. one with sqrt(a * b) = c. We can write a as a = s * t^2, where s is the product of the prime numbers that have odd exponents in a's prime factorization. It's guaranteed that the remaining part of a is a perfect square. Since a * b is a perfect square, then b must be of the form s * k^2. For each a (there are O(n) such numbers), after finding s from the decomposition above (this can be done in O(log(n)), as it will be described next), we can restrict our search for the number b to those of the form b = s * k^2, but there are only O(sqrt(n)) numbers like this smaller than n. For each pair a, b enumerated like this we can test in O(1) whether there is a good c, using the representation you used in the question.
One critical part in the idea above is decomposing a into s * t^2, i.e. finding the primes that have odd power in a's factorization.
This can be done using a pre-processing step that finds the prime factors (but not their powers) of every number in {1, 2, ..., n}, using a slightly modified sieve of Eratosthenes. This modified version would not only mark a number as "not prime" when iterating over the multiples of a prime, but would also append the current prime to the list of factors of the current multiple. The time complexity of this pre-processing step is n * sum{over primes p < n}(1/p) = O(n * log(log(n))), the standard bound for this sieve.
Using the result of the pre-processing, which is the list of primes which divide a, we can find those primes with odd power in O(log(n)). This is achieved by dividing a by each prime in the list until it is no more divisible by that prime. If we made an odd number of divisions, then we use the current prime in s. After all divisions are done, the result will be equal to 1. The complexity of this is O(log(n)) because in the worst case we always divide the initial number by 2 (the smallest prime number), thus it will take at most log2(a) steps to reach value 1.
The complexity of the main step dominates the complexity of the preprocessing, thus the overall complexity of this approach is O(n * sqrt(n) * log(n)).
Remark: in the decomposition a = s * t^2, s is the product of the prime numbers in a with odd exponents, but their exponent is not used in s (i.e. s is just the product of those primes, with exponent 1). Only in this situation it is guaranteed that b should be of the form s * k^2. Indeed, since a * b = c * c, the prime factorization of the right hand side uses only even exponents, thus all primes from s should also appear in b with odd exponents, and all other primes from b's factorization should have even exponents.
Expanding on the following line: "we can restrict our search for the number b to those of the form b = s * k^2, but there are only O(sqrt(n)) numbers like this smaller than n".
Let's consider an example. Imagine that we have something like n = 10,000 and we are currently looking for solutions having a = 360 = 2^3 * 3^2 * 5. The primes with odd exponent in a's factorization are 2 and 5 (thus s = 2 * 5; a = 10 * 6^2).
Since a * b is a perfect square, it means that all primes in the prime factorization of a * b have even exponents. This implies that those two primes (2 and 5) also need to appear in b's factorization with odd exponents, and the rest of the exponents in b's prime factorization need to be even. Thus b is of the form s * k^2 = 10 * k^2.
So we proved that b = 10 * k^2. This is helpful, because we can now enumerate all the b values of this form quickly (in O(sqrt(n))). We only need to consider k = 1, k = 2, ..., k = (int)sqrt(n / 10). Larger values of k result in values of b larger than n. Each of these k values determines one b value, which we need to verify. Note that when verifying one of these b values, it should first be checked whether it indeed is in set B, which can be done in O(1), and whether sqrt(a * b) is in set C, which can also be done in O(1).
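For concreteness, here is a runnable Python sketch of the whole approach; the function name, the 1-indexed label array mapping each value to 'A', 'B' or 'C', and the return format are my own choices:

import math

def find_triples(n, label):
    # Find all (a, b, c) with a in A, b in B and sqrt(a*b) = c in C.
    # label[v] is 'A', 'B' or 'C' for v in 1..n; index 0 is unused.
    # Modified sieve: record the distinct prime factors of every number up to n.
    primes_of = [[] for _ in range(n + 1)]
    for p in range(2, n + 1):
        if not primes_of[p]:             # no factor recorded yet, so p is prime
            for m in range(p, n + 1, p):
                primes_of[m].append(p)
    triples = []
    for a in range(1, n + 1):
        if label[a] != 'A':
            continue
        s, rest = 1, a                   # s = squarefree part: product of primes with odd exponent
        for p in primes_of[a]:
            e = 0
            while rest % p == 0:
                rest //= p
                e += 1
            if e % 2 == 1:
                s *= p
        k = 1                            # any b making a*b a perfect square is b = s * k^2
        while s * k * k <= n:
            b = s * k * k
            if label[b] == 'B':
                c = math.isqrt(a * b)    # exact, since a*b is a perfect square here
                if label[c] == 'C':
                    triples.append((a, b, c))
            k += 1
    return triples

# Example: A = {2, 5, 7}, B = {3, 6, 8}, C = {1, 4}; sqrt(2*8) = 4 lies in C.
label = [None, 'C', 'A', 'B', 'C', 'A', 'B', 'A', 'B']
print(find_triples(8, label))            # [(2, 8, 4)]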

Algorithm for finding n-2 permutation (with repeats) given lexicographic index

I am trying to find a method to determine a permutation of n-2 numbers (repetitions being allowed) from a set of n numbers, given its lexicographic index. One reason we are doing this is to find Prufer codes, given an index.
Considering a set of numbers as [1,2,3,4] we shall get a set of n-2 permutations as [1,1], [1,2], [1,3], [1,4]......[4,3], [4,4].
My question is whether there is a methodology for getting a permutation like this given the index as an input, without enumerating all the permutations. I looked at the methods in this link Finding the index of a given permutation but they may have issues with permutations of n-2 objects. Thank you.
The permutation with a given rank from a set of n numbers can be calculated by converting the rank to a base-n number, and interpreting its digits as 0-based indexes:
set: [1,2,3,4]
0-based rank: 9
9 in base-4: 21
0-based indexes: [2,1]
permutation: [3,2]
set: [a,b,c,d,e]
0-based rank: 64
64 in base-5: 224
0-based indexes: [2,2,4]
permutation: [c,c,e]
Something like this should do the trick, where set is an array of int containing n numbers, and perm is an array of int large enough to hold n-2 numbers:
void permutation(int *set, int n, int rank, int *perm) {
    for (int k = n - 2; k > 0; --k) {
        perm[k - 1] = set[rank % n];
        rank /= n;
    }
}
(The above code assumes valid input)
This is not so much a programming problem as a matter of understanding. A Prufer sequence here is a mapping used to label each possible permutation of a set of numbers, with each label being unique; so you can define your Prufer sequence for [a0, a1, a2, ..., an-1] as, for example
S = [a0,a1, ..., an-1], S in N
S -Prufer-> S x S
0 --------> [a0, a0]
1 --------> [a0, a1]
...
n-1 ------> [a0, an-1]
n --------> [a1, a0]
n+1 ------> [a1, a1]
...
2n-1 -----> [a1, an-1]
2n -------> [a2, a0]
2n+1 -----> [a2, a1]
and so on...
Then, in general, you can determine the permutation with index i as
P(i) = [a_(i div n), a_(i mod n)].
Note that this Prufer sequence is just a trivial example: you can define your own, keeping in mind that it must cover the whole permutation set and make each identification unique (that is, it must be a bijective mapping).
Here's some JavaScript code for m69's idea:
function f(rank, arr){
    var n = arr.length,
        res = new Array(n - 2).fill(arr[0]),
        i = n - 3;
    while (rank){
        res[i--] = arr[rank % n];
        rank = Math.floor(rank / n);
    }
    return res;
}
console.log(f(64, ['a','b','c','d','e']));

How do you partition an array into 2 parts such that the two parts have equal average?

How do you partition an array into 2 parts such that the two parts have equal average? Each partition may contain elements that are non-contiguous in the array.
The only algorithm I can think of is exponential; can we do better?
You can reduce this problem to the subset-sum problem. Here's the idea.
Let A be the array. Compute S = A[0] + ... + A[N-1], where N is the length of A. For k from 1 to N-1, let T_k = S * k / N. If T_k is an integer, then find a subset of A of size k that sums to T_k. If you can do this, then you're done. If you cannot do this for any k, then no such partitioning exists.
Here's the math behind this approach. Suppose there is a partitioning of A such that the two parts have the same average, say X of size x and Y of size y are the partitions, where x+y = N. Then you must have
sum(X)/x = sum(Y)/y = (sum(A)-sum(X)) / (N-x)
so a bit of algebra gives
sum(X) = sum(A) * x / N
Since the array contains integers, the left hand side is an integer, so the right hand side must be as well. This motivates the constraint that T_k = S * k / N must be an integer. The only remaining part is to realize T_k as the sum of a subset of size k.
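Here is a runnable sketch of that reduction, using a small dynamic-programming table over (subset size, subset sum) as the subset-sum solver; the function name and representation are mine, and integer entries are assumed:

def has_equal_average_partition(A):
    # reachable[k] holds every sum achievable by a subset of exactly k elements.
    N, S = len(A), sum(A)
    reachable = [set() for _ in range(N + 1)]
    reachable[0].add(0)
    for a in A:
        for k in range(N - 1, -1, -1):   # go downward so each element is used at most once
            reachable[k + 1] |= {s + a for s in reachable[k]}
    # check whether T_k = S*k/N is an integer and achievable for some 1 <= k <= N-1
    return any((S * k) % N == 0 and (S * k) // N in reachable[k] for k in range(1, N))

print(has_equal_average_partition([1, 2, 3, 4, 5, 6]))   # True: {1, 6} and {2, 3, 4, 5} both average 3.5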

Maximum subset sum with two arrays

I am not even sure if this can be done in polynomial time.
Problem:
Given two arrays of real numbers,
A = (a[1], a[2], ..., a[n]),
B = (b[1], b[2], ..., b[n]), (b[j] > 0, j = 1, 2, ..., n)
and a number k, find a subset A' of A (A' = (a[i(1)],
a[i(2)], ..., a[i(k)])), which contains exactly k elements, such that (sum a[i(j)])/(sum b[i(j)]) is maximized, where j = 1, 2, ..., k.
For example, if k == 3, and {a[1], a[5], a[7]} is the result, then
(a[1] + a[5] + a[7])/(b[1] + b[5] + b[7])
should be larger than any other combination. Any clue?
Assuming that the entries of B are positive (it sounds as though this special case might be useful to you), there is an O(n^2 log n) algorithm.
Let's first solve the problem of deciding, for a particular t, whether there exists a solution such that
(sum a[i(j)])/(sum b[i(j)]) >= t.
Clearing the denominator, this condition is equivalent to
sum (a[i(j)] - t*b[i(j)]) >= 0.
All we have to do is choose the k largest values of a[i(j)] - t*b[i(j)].
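A small sketch of this decision procedure in Python (names are mine; binary searching on t with this test gives a simple approximate alternative to the exact approach described next):

def ratio_at_least(a, b, k, t):
    # Is there a size-k subset with (sum of a's) / (sum of b's) >= t?  Assumes every b[i] > 0.
    # Equivalent to finding k indices with sum(a[i] - t*b[i]) >= 0, so take the k largest margins.
    margins = sorted((ai - t * bi for ai, bi in zip(a, b)), reverse=True)
    return sum(margins[:k]) >= 0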
Now, in order to solve the problem when t is unknown, we use a kinetic algorithm. Think of t as being a time variable; we are interested in the evolution of a one-dimensional physical system with n particles having initial positions A and velocities -B. Each particle crosses each other particle at most one time, so the number of events is O(n^2). In between crossings, the optimum of sum (a[i(j)] - t*b[i(j)]) changes linearly, because the same subset of k is optimal.
If B can contain negative numbers, then this is NP-Hard.
Because of the NP-Hardness of this problem:
Given k and an array B, is there a subset of B of size k which sums to zero?
The A becomes immaterial in that case.
Of course, from your comment it seems like B must contain positive numbers.

Efficient algorithm for shortest distance between two line segments in 1D

I can find plenty formulas for finding the distance between two skew lines. I want to calculate the distance between two line segments in one dimension.
It's easy to do with a bunch of IF statements. But I was wondering if there is a more efficient math formula.
E.g. 1:
----L1x1-------L2x1-------L1x2------L2x2----------------------------
L1 = line segment 1, L2 = line segment 2;
the distance here is 0 because of intersection
E.g. 2:
----L1x1-------L1x2-------L2x1------L2x2----------------------------
the distance here is L2x1 - L1x2
EDIT:
The only assumption is that the line segments are ordered, i.e. x2 is always > x1.
Line segment 1 may be to the left of, to the right of, or equal to line segment 2, and so on. The algorithm has to handle all of these cases.
EDIT 2:
I have to implement this in T-SQL (SQL Server 2008). I just need the logic... I can write the T-SQL.
EDIT 3:
If one line segment is a sub-segment of the other, the distance is 0.
----L1x1-------L2x1-------L2x2------L1x2----------------------------
Line segment 2 is a segment of line segment 1, making the distance 0.
If they intersect or touch, the distance is 0.
This question is the same as the question "Do two ranges intersect, and if not then what is the distance between them?" The answer depends slightly on whether you already know which range is smallest, and whether the points in the ranges are ordered correctly (that is, whether the lines have the same direction).
if (a.start < b.start) {
    first = a;
    second = b;
} else {
    first = b;
    second = a;
}
Then:
distance = max(0, second.start - first.end);
Depending on where you're running this, your compiler should optimise it nicely. In any case, you should probably profile to make sure that your code is a bottleneck before making it less readable for a theoretical performance improvement.
This works in all cases:
d = (s1 max s2 - e1 min e2) max 0
As a bonus, removing max 0 means a negative result indicates exactly how much of the two segments overlap.
Proof
Note that the algorithm is symmetric, so asymmetric cases only need to be covered once. So I'm going to assert s2 >= s1 w.l.o.g. Also note e1 >= s1 and e2 >= s2.
Cases:
L2 starts after L1 ends (s2 >= e1): s1 max s2 = s2, e1 min e2 = e1. Result is s2 - e1, which is non-negative and clearly the value we want (the distance).
L2 inside L1 (s2 <= e1, e2 <= e1): s1 max s2 = s2, e1 min e2 = e2. s2 - e2 is non-positive by s2 <= e2, so the result is 0 as expected during overlap.
L2 starts within L1 but ends after (s2 <= e1, e2 >= e1): s1 max s2 = s2, e1 min e2 = e1. s2 - e1 is non-positive by s2 <= e1, so the result is 0 as expected during overlap.
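As a Python one-liner (the function name is mine; a sketch assuming each segment is given as start <= end):

def segment_distance(s1, e1, s2, e2):
    # Distance between the 1D segments [s1, e1] and [s2, e2]; 0 when they touch or overlap.
    # Dropping the outer max(0, ...) makes a negative result measure the overlap instead.
    return max(0, max(s1, s2) - min(e1, e2))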
I do not think there is a way around the conditions. But this is succinct:
var diff1 = L2x1 - L1x2;
var diff2 = L2x2 - L1x1;
return diff1 > 0 ? max(0, diff1) : -min(0,diff2);
This assumes LNx1 < LNx2.
I think, since all line segments in 1D are of the form (X, 0) or (0, Y),
you could store all these x values in an array, sort the array, and the minimum distance will be the difference between the first two elements of the array.
Here you need to be careful while storing elements in the array so that duplicate elements are not stored.
This formula seems to work in all cases except the one where one line segment lies fully within the other.
return -min(a2-b1,b2-a1)
