Coding the mathematical approach for finding index of a permutation that has repetition - permutation

How can I calculate the index of an element in the list of strings of a given length with a given number of distinct characters, sorted according to the input alphabet?
from itertools import product
def bruteforce_item_index(item, alphabet, length, distinct):
    skipped = 0
    for u in product(alphabet, repeat=length):
        v = ''.join(u)
        if v == item:
            return skipped
        if len(set(u)) == distinct:
            skipped += 1
As an example
bruteforce_item_index('0123456777', alphabet='0123456789', length=10, distinct=8)
Runs in ~1 minute and gives the answer 8245410. The run time here is proportional to the index of the given item.
I want an efficient implementation that is able to calculate that index in a fraction of a second.
In other words: how can this problem be solved efficiently? A mathematical approach has been provided on the same page; I want Python, Java, or C# code as a solution.

In this answer I will explain how to get to a function that enables you to get the index of an element in the sequence, as follows:
print("Item 3749832414 is at (0-based) index %d" %
item_index('3749832414', alphabet='0123456789', length=10, distinct=8))
print("Item 7364512193 is at (0-based) index %d" %
item_index('7364512193', alphabet='0123456789', length=10, distinct=8))
> Item 3749832414 is at (0-based) index 508309342
> Item 7364512193 is at (0-based) index 1005336982
Enumeration method
By the nature of your problem it is interesting to solve it in a recursive manner, adding digits one by one and keeping track of the number of digits used. Python provides generators, so you can produce items one by one without storing the whole sequence.
Basically, all the items can be arranged in a prefix tree, and we walk the tree yielding the leaf nodes.
def iter_seq(alphabet, length, distinct, prefix=''):
    if distinct < 0:
        # the prefix used more than the allowed number of distinct digits
        return
    if length == 0:
        # if distinct > 0 it means that prefix did not use
        # enough distinct digits
        if distinct == 0:
            yield prefix
    else:
        for d in alphabet:
            if d in prefix:
                # the number of distinct digits in prefix + d is the same
                # as in prefix.
                yield from iter_seq(alphabet, length-1, distinct, prefix + d)
            else:
                # the number of distinct digits in prefix + d is one more
                # than the distinct digits in prefix.
                yield from iter_seq(alphabet, length-1, distinct-1, prefix + d)
Let's test it with examples that can be visualized
list(iter_seq('0123', 5, 1))
['00000', '11111', '22222', '33333']
import numpy as np
np.reshape(list(iter_seq('0123', 4, 2)), (12, 7))
array([['0001', '0002', '0003', '0010', '0011', '0020', '0022'],
       ['0030', '0033', '0100', '0101', '0110', '0111', '0200'],
       ['0202', '0220', '0222', '0300', '0303', '0330', '0333'],
       ['1000', '1001', '1010', '1011', '1100', '1101', '1110'],
       ['1112', '1113', '1121', '1122', '1131', '1133', '1211'],
       ['1212', '1221', '1222', '1311', '1313', '1331', '1333'],
       ['2000', '2002', '2020', '2022', '2111', '2112', '2121'],
       ['2122', '2200', '2202', '2211', '2212', '2220', '2221'],
       ['2223', '2232', '2233', '2322', '2323', '2332', '2333'],
       ['3000', '3003', '3030', '3033', '3111', '3113', '3131'],
       ['3133', '3222', '3223', '3232', '3233', '3300', '3303'],
       ['3311', '3313', '3322', '3323', '3330', '3331', '3332']],
      dtype='<U4')
Counting items
As you noticed in your previous question, the number of items in a sequence depends only on the length of each string, the size of the alphabet, and the number of distinct symbols.
If we look at the loop of the above function, we only have two cases: (1) the current digit is in the prefix, or (2) it is not. The number of digits that fall in the first case is exactly the number of distinct digits already in the prefix, so we can add an argument used that keeps track of the number of digits already used, instead of passing the actual prefix. Now the complexity goes from O(length!) to O(2**length).
Additionally we use an lru_cache decorator that memoizes the values and returns them without calling the function if the arguments are repeated; this makes the function run in O(length**2) time and space.
from functools import lru_cache

@lru_cache(maxsize=None)
def count_seq(n_symbols, length, distinct, used=0):
    if distinct < 0:
        return 0
    if length == 0:
        return 1 if distinct == 0 else 0
    else:
        return \
            count_seq(n_symbols, length-1, distinct-0, used+0) * used + \
            count_seq(n_symbols, length-1, distinct-1, used+1) * (n_symbols - used)
We can check that it is consistent with iter_seq:
assert(sum(1 for _ in iter_seq('0123', 4, 2)) == count_seq(4, 4, 2))
We can also test that it agrees with the example you calculated by hand:
assert(count_seq(10, 10, 8) == 1360800000)
Item at index
This part is not necessary to get the final answer, but it is a good exercise. Furthermore, it gives us a way to compute items of larger sequences that would be tedious to write out by hand.
This could be achieved by iterating iter_seq the given number of times. This function does it more efficiently, by comparing the number of leaves in a given subtree (the number of items produced by a specific call) with the distance to the requested index. If the requested index is farther away than the number of items produced by a call, we can skip that call entirely and jump directly to the next sibling in the tree.
def item_at(idx, alphabet, length, distinct, used=0, prefix=''):
    if distinct < 0:
        return
    if length == 0:
        return prefix
    else:
        for d in alphabet:
            if d in prefix:
                branch_count = count_seq(len(alphabet),
                                         length-1, distinct, used)
                if branch_count <= idx:
                    idx -= branch_count
                else:
                    return item_at(idx, alphabet,
                                   length-1, distinct, used, prefix + d)
            else:
                branch_count = count_seq(len(alphabet),
                                         length-1, distinct-1, used+1)
                if branch_count <= idx:
                    idx -= branch_count
                else:
                    return item_at(idx, alphabet,
                                   length-1, distinct-1, used+1, prefix + d)
We can test that it is consistent with iter_seq
for i, w in enumerate(iter_seq('0123', 4, 2)):
    assert w == item_at(i, '0123', 4, 2)
Index of a given item
Remembering that we are walking a prefix tree, given a string we can walk directly to the desired node. The way to find the index is to sum the sizes of all the subtrees that are left behind on this path.
def item_index(item, alphabet, length, distinct, used=0, prefix=''):
    if distinct < 0:
        return 0
    if length == 0:
        return 0
    else:
        offset = 0
        for d in alphabet:
            if d in prefix:
                if d == item[0]:
                    return offset + item_index(item[1:], alphabet,
                                               length-1, distinct, used, prefix + d)
                else:
                    offset += count_seq(len(alphabet),
                                        length-1, distinct, used)
            else:
                if d == item[0]:
                    return offset + item_index(item[1:], alphabet,
                                               length-1, distinct-1, used+1, prefix + d)
                else:
                    offset += count_seq(len(alphabet),
                                        length-1, distinct-1, used+1)
And again we can test the consistency between this and iter_seq
for i, w in enumerate(iter_seq('0123', 4, 2)):
    assert i == item_index(w, '0123', 4, 2)
Or, to query the example items from the beginning of the post, as promised:
print("Item 3749832414 is at (0-based) index %d" %
item_index('3749832414', alphabet='0123456789', length=10, distinct=8))
print("Item 7364512193 is at (0-based) index %d" %
item_index('7364512193', alphabet='0123456789', length=10, distinct=8))
> Item 3749832414 is at (0-based) index 508309342
> Item 7364512193 is at (0-based) index 1005336982
Bonus: Larger sequences
Let's calculate the index of UCP3gzjGPMwjYbYtsFu2sDHRE14XTu8AdaWoJPOm50YZlqI6skNyfvEShdmGEiB0
in the sequence of strings of length 64 with 50 distinct symbols:
item_index('UCP3gzjGPMwjYbYtsFu2sDHRE14XTu8AdaWoJPOm50YZlqI6skNyfvEShdmGEiB0',
alphabet='0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz',
distinct=50, length=64)
Surprisingly it is 10000...000 = 10**110. How could I find that particular string?
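Indeed, item_at gives a way to recover it: since item_at is the inverse of item_index (as tested above), feeding it the index 10**110 should print the string back. A quick check, assuming the functions above are in scope:

print(item_at(10**110,
              alphabet='0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz',
              length=64, distinct=50))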

If we choose 3 symbols from the set {a, b, c, d, e, f}, there are 20 possible combinations. We can record these combinations as an integer, such as:
{a, b, c} => 1
{a, b, d} => 2
{a, b, e} => 3
...
{d, e, f} => 20
Then, after we finish choosing the 3 symbols, there are 3^6 possible strings of length 6 over them, so we can represent each one in 12 bits (2 bits per position).
Take {a, b, c} for example, the representation can be:
aaaaaa => 00 00 00 00 00 00
aaaaab => 00 00 00 00 00 01
aaaaac => 00 00 00 00 00 10
...
cccccb => 10 10 10 10 10 01
cccccc => 10 10 10 10 10 10
Then you can use the pair of one integer and 12 bits of binary to index your permutation.
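A small Python sketch of this scheme (my own illustration, not part of the original answer; combination ranks are 0-based here rather than starting at 1):

from itertools import combinations

SYMBOLS = 'abcdef'

def encode(word, chosen):
    # chosen: the 3 selected symbols in sorted order, e.g. ('a', 'b', 'c')
    combo_rank = list(combinations(SYMBOLS, len(chosen))).index(tuple(chosen))
    bits = 0
    for ch in word:
        # 2 bits per position: the symbol's index within the chosen set
        bits = (bits << 2) | chosen.index(ch)
    return combo_rank, bits

def decode(combo_rank, bits, length=6, k=3):
    chosen = list(combinations(SYMBOLS, k))[combo_rank]
    return ''.join(chosen[(bits >> 2 * (length - 1 - i)) & 3] for i in range(length))

print(encode('cccccb', ('a', 'b', 'c')))           # (0, 2729), i.e. bits 10 10 10 10 10 01
print(decode(*encode('cccccb', ('a', 'b', 'c'))))  # 'cccccb'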

You can try the factorial number system. It is pretty complicated to explain, but it lets you rank and unrank permutations directly, without enumerating them. This is from Project Euler's "Lexicographic permutations" problem.
The code below finds a permutation by its index. You can rewrite it to go the other way and find the index of a given permutation.
public static String lexicographicPermutation(String str, int n) {
    long[] fact = new long[str.length()];
    List<Character> letters = new ArrayList<>(str.length());

    for (int i = 0; i < str.length(); fact[i] = i == 0 ? 1 : i * fact[i - 1], i++)
        letters.add(str.charAt(i));

    letters.sort(Comparator.naturalOrder());
    n--;

    StringBuilder buf = new StringBuilder(str.length());

    for (int i = str.length() - 1; i >= 0; n %= fact[i], i--)
        buf.append(letters.remove((int)(n / fact[i])));

    return buf.toString();
}
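For the reverse direction, here is a sketch in Python (my own addition) of the same idea: the 0-based lexicographic rank of a permutation. Note that, like the Java code above, it assumes all symbols are distinct, so it does not directly handle the repetitions of the original question.

from math import factorial

def permutation_rank(perm):
    # 0-based lexicographic rank of a permutation of distinct symbols
    letters = sorted(perm)
    rank = 0
    for i, ch in enumerate(perm):
        pos = letters.index(ch)
        rank += pos * factorial(len(perm) - 1 - i)
        letters.pop(pos)
    return rank

print(permutation_rank('2013'))  # 12: '2013' is the 13th permutation of '0123'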

Related

Minimize (firstA_max - firstA_min) + (secondB_max - secondB_min)

Given n pairs of integers, split them into two subsets A and B to minimize (maximum difference among first values of A) + (maximum difference among second values of B).
Example : n = 4
{0; 0}; {5; 5}; {1; 1}; {3; 4}
A = {{0; 0}; {1; 1}}
B = {{5; 5}; {3; 4}}
The objective is (maximum difference among first values of A) + (maximum difference among second values of B):
(maximum difference among first values of A) = fA_max - fA_min = 1 - 0 = 1
(maximum difference among second values of B) = sB_max - sB_min = 5 - 4 = 1
Therefore, the answer is 1 + 1 = 2. And this is the best way.
Obviously, the maximum difference among the values equals (maximum value - minimum value). Hence, what we need to do is find the minimum of (fA_max - fA_min) + (sB_max - sB_min).
Suppose the given array is arr[], the first value is arr[].first and the second value is arr[].second.
I think it is quite easy to solve this in quadratic complexity. You just need to sort the array by the first value; then all the elements in subset A should be picked consecutively in the sorted array. So you can loop over all ranges [L;R] of the sorted array: for each range, try to add all elements in that range to subset A and all the remaining ones to subset B.
For more detail, this is my C++ code
int calc(pair<int, int> a[], int n){
    // m, M: min and max of the second values of elements before index l (those go to B)
    int m = 1e9, M = -1e9, res = 2e9;
    for (int l = 1; l <= n; l++){
        // g, G: min and max of the second values of all elements outside [l, r] (subset B)
        int g = m, G = M;
        for (int r = n; r >= l; r--) {
            if (r - l + 1 < n){
                res = min(res, a[r].first - a[l].first + G - g);
            }
            g = min(g, a[r].second);
            G = max(G, a[r].second);
        }
        m = min(m, a[l].second);
        M = max(M, a[l].second);
    }
    return res;
}
Now I want to improve my algorithm down to loglinear complexity. Of course, I sort the array by the first value. After that, if I fix fA_min = a[i].first, then as the index i increases, fA_max increases while (sB_max - sB_min) decreases.
But I am still stuck here: is there any way to solve this problem in loglinear complexity?
The following approach is an attempt to escape the n^2, using an argmin list for the second element of the tuples (let's say the y-part), where the points are sorted with respect to x.
One observation is that there is an optimum solution where A includes index argmin[0] or argmin[n-1] or both.
In get_best_interval_min_max we focus first on including argmin[0], then the next smallest element on y, and so on. Then we do the same from the max element.
We get two dictionaries {(i,j): (profit, idx)}, telling us how much we gain in y when including points[i:j+1] in A, towards the min or the max on y. idx is the index in the argmin array.
We calculate the objective for each dict, assuming the max/min of y is not in A.
Then we combine the results of both dictionaries: entries (i1,j1): (v1, idx1) and (i2,j2): (v2, idx2) merge to the interval (min(i1,i2), max(j1,j2)), with objective (x-spread of the merged interval) + max_y - min_y - v1 - v2.
Constraint: idx1 < idx2, because the indices in the argmin array cannot intersect, otherwise some profit in y might be counted twice.
On average the dictionaries (dmin, dmax) are smaller than n, but in the worst case, when x and y correlate ([(i,i) for i in range(n)]), they are exactly of size n and we gain nothing. Anyhow, on random instances this approach is much faster. Maybe someone can improve upon this.
import numpy as np
from random import randrange
import time

def get_best_interval_min_max(points):  # sorted input according to x dim
    L = len(points)
    argmin_b = np.argsort([p[1] for p in points])
    b_min, b_max = points[argmin_b[0]][1], points[argmin_b[L-1]][1]
    arg = [argmin_b[0], argmin_b[0]]
    res_min = dict()
    for i in range(1, L):
        res_min[tuple(arg)] = points[argmin_b[i]][1] - points[argmin_b[0]][1], i  # the profit in b towards min
        if arg[0] > argmin_b[i]: arg[0] = argmin_b[i]
        elif arg[1] < argmin_b[i]: arg[1] = argmin_b[i]
    arg = [argmin_b[L-1], argmin_b[L-1]]
    res_max = dict()
    for i in range(L-2, -1, -1):
        res_max[tuple(arg)] = points[argmin_b[L-1]][1] - points[argmin_b[i]][1], i  # the profit in b towards max
        if arg[0] > argmin_b[i]: arg[0] = argmin_b[i]
        elif arg[1] < argmin_b[i]: arg[1] = argmin_b[i]
    # return the two dicts and the difference along y
    return res_min, res_max, b_max - b_min

def argmin_algo(points):
    # return the objective value, sets A and B, and the interval for A in points
    points.sort()
    # get the profits for different intervals on the sorted array for max and min
    dmin, dmax, y_diff = get_best_interval_min_max(points)
    key = [None, None]
    res_min = 2e9
    # the best result when only the min/max b value is included in A
    for d in [dmin, dmax]:
        for k, (v, i) in d.items():
            res = points[k[1]][0] - points[k[0]][0] + y_diff - v
            if res < res_min:
                key = k
                res_min = res
    # combine the results for max and min
    for k1, (v1, i) in dmin.items():
        for k2, (v2, j) in dmax.items():
            if i > j: break  # their argmin_b indices can not intersect!
            idx_l, idx_h = min(k1[0], k2[0]), max(k1[1], k2[1])  # index low and index high for the combination
            res = points[idx_h][0] - points[idx_l][0] - v1 - v2 + y_diff
            if res < res_min:
                key = (idx_l, idx_h)  # new merged interval
                res_min = res
    return res_min, points[key[0]:key[1]+1], points[:key[0]] + points[key[1]+1:], key

def quadratic_algorithm(points):
    points.sort()
    m, M, res = 1e9, -1e9, 2e9
    idx = (0, 0)
    for l in range(len(points)):
        g, G = m, M
        for r in range(len(points)-1, l-1, -1):
            if r-l+1 < len(points):
                res_n = points[r][0] - points[l][0] + G - g
                if res_n < res:
                    res = res_n
                    idx = (l, r)
            g = min(g, points[r][1])
            G = max(G, points[r][1])
        m = min(m, points[l][1])
        M = max(M, points[l][1])
    return res, points[idx[0]:idx[1]+1], points[:idx[0]] + points[idx[1]+1:], idx

# let's try it and compare running times to the quadratic_algorithm
# get some "random" points
c1 = 0
c2 = 0
for i in range(100):
    points = [(randrange(100), randrange(100)) for i in range(1, 200)]
    points.sort()  # sorted by x dimension
    s = time.time()
    r1 = argmin_algo(points)
    e1 = time.time()
    r2 = quadratic_algorithm(points)
    e2 = time.time()
    c1 += (e1-s)
    c2 += (e2-e1)
    if not r1[0] == r2[0]:
        print(r1, r2)
        raise Exception("Error, results are not equal")
print("time of argmin_algo", c1, "time of quadratic_algorithm", c2)
UPDATE: @Luka proved that the algorithm described in this answer is not exact. But I will keep it here because it's a good performance heuristic and opens the way to many probabilistic methods.
I will describe a loglinear algorithm. I couldn't find a counter example. But I also couldn't find a proof :/
Let set A be ordered by first element and set B be ordered by second element. They are initially empty. Take floor(n/2) random points of your set of points and put them in set A. Put the remaining points in set B. Define this as a partition.
Let's call a partition stable if you can't take an element of set A, put it in B and decrease the objective function and if you can't take an element of set B, put it in A and decrease the objective function. Otherwise, let's call the partition unstable.
For an unstable partition, the only moves that are interesting are the ones that take the first or the last element of A and move to B or take the first or the last element of B and move to A. So, we can find all interesting moves for a given unstable partition in O(1). If an interesting move decreases the objective function, do it. Go like that until the partition becomes stable. I conjecture that it takes at most O(n) moves for the partition to become stable. I also conjecture that at the moment the partition becomes stable, you will have a solution.
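A minimal sketch of that local search (my own code, illustrating the conjectured procedure rather than a proven algorithm):

from random import shuffle

def spread(vals):
    return max(vals) - min(vals) if vals else 0

def objective(A, B):
    # x-spread of A plus y-spread of B
    return spread([p[0] for p in A]) + spread([p[1] for p in B])

def stable_partition(points):
    pts = points[:]
    shuffle(pts)
    half = len(pts) // 2
    A = sorted(pts[:half])                        # ordered by first element
    B = sorted(pts[half:], key=lambda p: p[1])    # ordered by second element
    changed = True
    while changed:
        changed = False
        base = objective(A, B)
        # interesting moves: first/last of A -> B, first/last of B -> A
        for src, dst, key in ((A, B, lambda p: p[1]), (B, A, lambda p: p[0])):
            if len(src) < 2:
                continue
            for idx in (0, len(src) - 1):
                p = src.pop(idx)
                dst.append(p)
                dst.sort(key=key)
                if objective(A, B) < base:
                    changed = True   # keep the improving move
                    break
                dst.remove(p)        # otherwise revert it
                src.insert(idx, p)
            if changed:
                break
    return objective(A, B), A, B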

Find Minimum Score Possible

Problem statement:
We are given three arrays A1, A2, A3 of lengths n1, n2, n3. Each array contains some (or no) natural numbers (i.e. > 0). These numbers denote program execution times.
At each step you choose the first element of any array, execute that program, and remove it from its array.
For example:
if A1=[3,2] (n1=2),
A2=[7] (n2=1),
A3=[1] (n3=1)
then we can execute programs in various orders like [1,7,3,2] or [7,1,3,2] or [3,7,1,2] or [3,1,7,2] or [3,2,1,7] etc.
Now if we take S=[1,3,2,7] as the order of execution, the waiting times of the programs would be:
for S[0], waiting time = 0, since it is executed immediately,
for S[1], waiting time = 0+1 = 1, taking the previous time into account; similarly,
for S[2], waiting time = 0+1+3 = 4,
for S[3], waiting time = 0+1+3+2 = 6.
Now the score of the order is defined as the sum of all wait times = 0 + 1 + 4 + 6 = 11. This is the minimum score we can get from any order of execution.
Our task is to find this minimum score.
How can we solve this problem? I tried the approach of always picking the minimum of the three front elements, but it is not correct, because it gets stuck when two or three equal elements are encountered.
One more example:
if A1=[23,10,18,43], A2=[7], A3=[13,42], the minimum score would be 307.
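For concreteness, here is a small helper (my own addition, not part of the original question) that computes the score of a given execution order; it reproduces the 11 above:

def score(order):
    # each job waits for the total run time of the jobs before it
    wait = 0
    total = 0
    for t in order:
        total += wait
        wait += t
    return total

print(score([1, 3, 2, 7]))  # 11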
The simplest way to solve this is with dynamic programming (which runs in cubic time).
For each array A: Suppose you take the first element from array A, i.e. A[0], as the next process. Your total cost is the wait-time contribution of A[0] (i.e., A[0] * (total_remaining_elements - 1)), plus the minimal wait time sum from A[1:] and the rest of the arrays.
Take the minimum cost over each possible first array A, and you'll get the minimum score.
Here's a Python implementation of that idea. It works with any number of arrays, not just three.
import functools
from typing import List, Tuple

def dp_solve(arrays: List[List[int]]) -> int:
    """Given list of arrays representing dependent processing times,
    return the smallest sum of wait_time_before_start for all job orders"""
    arrays = [x for x in arrays if len(x) > 0]  # Remove empty

    @functools.lru_cache(100000)
    def dp(remaining_elements: Tuple[int],
           total_remaining: int) -> int:
        """Returns minimum wait time sum when suffixes of each array
        have lengths in 'remaining_elements'"""
        if total_remaining == 0:
            return 0
        rem_elements_copy = list(remaining_elements)
        best = 10 ** 20
        for i, x in enumerate(remaining_elements):
            if x == 0:
                continue
            cost_here = arrays[i][-x] * (total_remaining - 1)
            if cost_here >= best:
                continue
            rem_elements_copy[i] -= 1
            best = min(best,
                       dp(tuple(rem_elements_copy), total_remaining - 1)
                       + cost_here)
            rem_elements_copy[i] += 1
        return best

    return dp(tuple(map(len, arrays)), sum(map(len, arrays)))
Better solutions
The naive greedy strategy of 'smallest first element' doesn't work, because it can be worth it to do a longer job to get a much shorter job in the same list done, as the example of
A1 = [100, 1, 2, 3], A2 = [38], A3 = [34],
best solution = [100, 1, 2, 3, 34, 38]
by user3386109 in the comments demonstrates.
A more refined greedy strategy does work. Instead of the smallest first element, consider each possible prefix of the array. We want to pick the array with the smallest prefix, where prefixes are compared by average process time, and perform all the processes in that prefix in order.
A1 = [ 100, 1, 2, 3]
Prefix averages = [(100)/1, (100+1)/2, (100+1+2)/3, (100+1+2+3)/4]
= [ 100.0, 50.5, 34.333, 26.5]
A2=[38]
A3=[34]
Smallest prefix average in any array is 26.5, so pick
the prefix [100, 1, 2, 3] to complete first.
Then [34] is the next prefix, and [38] is the final prefix.
And here's a rough Python implementation of the greedy algorithm. This code computes subarray averages in a completely naive/brute-force way, so the algorithm is still quadratic (but an improvement over the dynamic programming method). Also, it computes 'maximum suffixes' instead of 'minimum prefixes' for ease of coding, but the two strategies are equivalent.
import math
from typing import List

def greedy_solve(arrays: List[List[int]]) -> int:
    """Given list of arrays representing dependent processing times,
    return the smallest sum of wait_time_before_start for all job orders"""
    def max_suffix_avg(arr: List[int]):
        """Given arr, return value and length of max-average suffix"""
        if len(arr) == 0:
            return (-math.inf, 0)
        best_len = 1
        best = -math.inf
        curr_sum = 0.0
        for i, x in enumerate(reversed(arr), 1):
            curr_sum += x
            new_avg = curr_sum / i
            if new_avg >= best:
                best = new_avg
                best_len = i
        return (best, best_len)

    arrays = [x for x in arrays if len(x) > 0]  # Remove empty
    total_time_sum = sum(sum(x) for x in arrays)
    my_averages = [max_suffix_avg(arr) for arr in arrays]

    total_cost = 0
    while True:
        largest_avg_idx = max(range(len(arrays)),
                              key=lambda y: my_averages[y][0])
        _, n_to_remove = my_averages[largest_avg_idx]
        if n_to_remove == 0:
            break
        for _ in range(n_to_remove):
            total_time_sum -= arrays[largest_avg_idx].pop()
            total_cost += total_time_sum
        # Recompute the changed array's avg
        my_averages[largest_avg_idx] = max_suffix_avg(arrays[largest_avg_idx])
    return total_cost
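Both implementations can be sanity-checked against the examples from the question (assuming the definitions above are in scope):

print(dp_solve([[3, 2], [7], [1]]))                     # 11
print(greedy_solve([[23, 10, 18, 43], [7], [13, 42]]))  # 307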

How to translate a solution into divide-and-conquer (finding a sub array with the largest, smallest value)

I am trying to get better at divide-and-conquer algorithms and am using this one below as an example. Given an array _in and some length l, it finds the start point of a subarray _in[_min_start:_min_start+l] such that the lowest value in that subarray is the highest it could possibly be. I have come up with a non-divide-and-conquer solution and am wondering how I could go about translating it into one which divides the array up into smaller parts (divide and conquer).
def main(_in, l):
    _min_start = 0
    min_trough = None
    for i in range(len(_in)+1-l):
        if min_trough is None:
            min_trough = min(_in[i:i+l])
        if min(_in[i:i+l]) > min_trough:
            _min_start = i
            min_trough = min(_in[i:i+l])
    return _min_start, _in[_min_start:_min_start+l]
e.g. for the array [5, 1, -1, 2, 5, -4, 3, 9, 8, -2, 0, 6] and a subarray of length 3, it would return start position 6 (resulting in the array [3, 9, 8]).
Three O(n) solutions and a benchmark
Note I'm renaming _in and l to clearer-looking names A and k.
Solution 1: Divide and conquer
Split the array in half. Solve left half and right half recursively. The subarrays not yet considered cross the middle, i.e., they're a suffix of the left part plus a prefix of the right part. Compute k-1 suffix-minima of the left half and k-1 prefix-minima of the right half. That allows you to compute the minimum for each middle-crossing subarray of length k in O(1) time each. The best subarray for the whole array is the best of left-best, right-best and crossing-best.
Runtime is O(n), I believe. As Ellis pointed out, in the recursion the subarray can become smaller than k. Such cases take O(1) time to return the equivalent of "there aren't any k-length subarrays in here". So the time is:
T(n) = { 2 * T(n/2) + O(k)   if n >= k
       { O(1)                otherwise
For any 0 <= k <= n we have k = n^c with 0 <= c <= 1. Then the number of calls is Θ(n^(1-c)) and each call's own work takes Θ(n^c) time, for a total of Θ(n) time.
Posted a question about the complexity to be sure.
Python implementation:
from itertools import accumulate, count
from math import inf

def solve_divide_and_conquer(A, k):
    def solve(start, stop):
        if stop - start < k:
            return -inf,
        mid = (start + stop) // 2
        left = solve(start, mid)
        right = solve(mid, stop)
        i0 = mid - k + 1
        prefixes = accumulate(A[mid:mid+k-1], min)
        if i0 < 0:
            prefixes = [*prefixes][-i0:]
            i0 = 0
        suffixes = list(accumulate(A[i0:mid][::-1], min))[::-1]
        crossing = max(zip(map(min, suffixes, prefixes), count(i0)))
        return max(left, right, crossing)
    return solve(0, len(A))[1]
Solution 2: k-Blocks
As commented by @benrg, the above divide-and-conquer is needlessly complicated. We can simply work on blocks of length k. Compute the suffix minima of the first block and the prefix minima of the second block. That allows finding the minimum of each k-length subarray within these two blocks in O(1) time each. Do the same with the second and third block, the third and fourth block, etc. Time is O(n) as well.
Python implementation:
def solve_blocks(A, k):
    return max(max(zip(map(min, prefixes, suffixes), count(mid-k)))
               for mid in range(k, len(A)+1, k)
               for prefixes in [accumulate(A[mid:mid+k], min, initial=inf)]
               for suffixes in [list(accumulate(A[mid-k:mid][::-1], min, initial=inf))[::-1]]
               )[1]
Solution 3: Monoqueue
Not divide & conquer, but the first one I came up with (and knew was O(n)).
Sliding window, represent the window with a deque of (sorted) indexes of strictly increasing array values in the window. When sliding the window to include a new value A[i]:
Remove the first index from the deque if the sliding makes it fall out of the window.
Remove indexes whose array values are larger than A[i]. (They can never be the minimum of the window anymore.)
Include the new index i.
The first index still in the deque is the index of the current window's minimum value. Use that to update overall result.
Python implementation:
from collections import deque

A = [5, 1, -1, 2, 5, -4, 3, 9, 8, -2, 0, 6]
k = 3

I = deque()
for i in range(len(A)):
    if I and I[0] == i - k:
        I.popleft()
    while I and A[I[-1]] >= A[i]:
        I.pop()
    I.append(i)
    curr_min = A[I[0]]
    if i == k-1 or i > k-1 and curr_min > max_min:
        result = i - k + 1
        max_min = curr_min
print(result)
Benchmark
With 4000 numbers from the range 0 to 9999, and k=2000:
80.4 ms 81.4 ms 81.8 ms solve_brute_force
80.2 ms 80.5 ms 80.7 ms solve_original
2.4 ms 2.4 ms 2.4 ms solve_monoqueue
2.4 ms 2.4 ms 2.4 ms solve_divide_and_conquer
1.3 ms 1.4 ms 1.4 ms solve_blocks
Benchmark code (Try it online!):
from timeit import repeat
from random import choices
from itertools import accumulate
from math import inf
from itertools import count
from collections import deque

def solve_monoqueue(A, k):
    I = deque()
    for i in range(len(A)):
        if I and I[0] == i - k:
            I.popleft()
        while I and A[I[-1]] >= A[i]:
            I.pop()
        I.append(i)
        curr_min = A[I[0]]
        if i == k-1 or i > k-1 and curr_min > max_min:
            result = i - k + 1
            max_min = curr_min
    return result

def solve_divide_and_conquer(A, k):
    def solve(start, stop):
        if stop - start < k:
            return -inf,
        mid = (start + stop) // 2
        left = solve(start, mid)
        right = solve(mid, stop)
        i0 = mid - k + 1
        prefixes = accumulate(A[mid:mid+k-1], min)
        if i0 < 0:
            prefixes = [*prefixes][-i0:]
            i0 = 0
        suffixes = list(accumulate(A[i0:mid][::-1], min))[::-1]
        crossing = max(zip(map(min, suffixes, prefixes), count(i0)))
        return max(left, right, crossing)
    return solve(0, len(A))[1]

def solve_blocks(A, k):
    return max(max(zip(map(min, prefixes, suffixes), count(mid-k)))
               for mid in range(k, len(A)+1, k)
               for prefixes in [accumulate(A[mid:mid+k], min, initial=inf)]
               for suffixes in [list(accumulate(A[mid-k:mid][::-1], min, initial=inf))[::-1]]
               )[1]

def solve_brute_force(A, k):
    return max(range(len(A)+1-k),
               key=lambda start: min(A[start : start+k]))

def solve_original(_in, l):
    _min_start = 0
    min_trough = None
    for i in range(len(_in)+1-l):
        if min_trough is None:
            min_trough = min(_in[i:i+l])
        if min(_in[i:i+l]) > min_trough:
            _min_start = i
            min_trough = min(_in[i:i+l])
    return _min_start  # , _in[_min_start:_min_start+l]

solutions = [
    solve_brute_force,
    solve_original,
    solve_monoqueue,
    solve_divide_and_conquer,
    solve_blocks,
]

for _ in range(3):
    A = choices(range(10000), k=4000)
    k = 2000

    # Check correctness
    expect = None
    for solution in solutions:
        index = solution(A.copy(), k)
        assert 0 <= index and index + k-1 < len(A)
        min_there = min(A[index : index+k])
        if expect is None:
            expect = min_there
            print(expect)
        else:
            print(min_there == expect, solution.__name__)
    print()

    # Speed
    for solution in solutions:
        copy = A.copy()
        ts = sorted(repeat(lambda: solution(copy, k), number=1))[:3]
        print(*('%5.1f ms ' % (t * 1e3) for t in ts), solution.__name__)
    print()

Rank and unrank combinations to distribute k balls into n bins of different capacities

I want to distribute k balls into n bins of different capacities. How can I rank and unrank the distributions given n, k, and the bin capacities?
Example:
n := 3
k := 4
bin capacities := 3,2,1
Balls in bins:
(1,2,1), (2,1,1), (2,2,0), (3,0,1), (3,1,0), so the count is 5.
Is there a formula?
I do not know if there is a standard name for this technique, but this is a kind of problem that I have successfully solved many times with a twist on dynamic programming.
What I do is use dynamic programming to build a data structure from which the rank/unrank can happen, and then build the logic to do the rank/unrank.
The dynamic programming piece is the hardest.
import collections

BallSolutions = collections.namedtuple(
    'BallSolutions', 'bin count balls next_bin_solutions next_balls_solutions')

def find_ball_solutions (balls, bin_capacities):
    # How many balls can fit in the remaining bins?
    capacity_sum = [0 for _ in bin_capacities]
    capacity_sum[-1] = bin_capacities[-1]
    for i in range(len(bin_capacities) - 2, -1, -1):
        capacity_sum[i] = capacity_sum[i+1] + bin_capacities[i]

    cache = {}

    def _search (bin_index, remaining_balls):
        if len(bin_capacities) <= bin_index:
            return None
        elif capacity_sum[bin_index] < remaining_balls:
            return None
        elif (bin_index, remaining_balls) not in cache:
            if bin_index + 1 == len(bin_capacities):
                cache[(bin_index, remaining_balls)] = BallSolutions(
                    bin=bin_index, count=1, balls=remaining_balls,
                    next_bin_solutions=None, next_balls_solutions=None)
            else:
                this_solution = None
                for this_balls in range(min([remaining_balls, bin_capacities[bin_index]]), -1, -1):
                    next_bin_solutions = _search(bin_index+1, remaining_balls - this_balls)
                    if next_bin_solutions is None:
                        break  # We already found the fewest balls that can go in this bin.
                    else:
                        this_count = next_bin_solutions.count
                        if this_solution is not None:
                            this_count = this_count + this_solution.count
                        next_solution = BallSolutions(
                            bin=bin_index, count=this_count,
                            balls=this_balls, next_bin_solutions=next_bin_solutions,
                            next_balls_solutions=this_solution)
                        this_solution = next_solution
                cache[(bin_index, remaining_balls)] = this_solution
        return cache[(bin_index, remaining_balls)]

    return _search(0, balls)
Here is code to produce a ranked solution:
def find_ranked_solution (solutions, n):
    if solutions is None:
        return None
    elif n < 0:
        return None
    elif solutions.next_bin_solutions is None:
        if n == 0:
            return [solutions.balls]
        else:
            return None
    elif n < solutions.next_bin_solutions.count:
        return [solutions.balls] + find_ranked_solution(solutions.next_bin_solutions, n)
    else:
        return find_ranked_solution(solutions.next_balls_solutions,
                                    n - solutions.next_bin_solutions.count)
Here is code to produce the rank for a solution. Note that it will blow up if provided with an invalid answer.
def find_solution_rank (solutions, solution):
    n = 0
    while solutions.balls < solution[0]:
        n = n + solutions.next_bin_solutions.count
        solutions = solutions.next_balls_solutions
    if 1 < len(solution):
        n = n + find_solution_rank(solutions.next_bin_solutions, solution[1:])
    return n
And here is some test code:
s = find_ball_solutions(4, [3, 2, 1])
for i in range(6):
    r = find_ranked_solution(s, i)
    print((i, r, find_solution_rank(s, r)))
You can define the number of such distributions recursively: given k balls and bin capacities q_1, ..., q_n, for each j between 0 and q_1, place j balls in the first bin and allocate the remaining k-j balls among the other bins.
Here is a quick Python implementation:
from functools import lru_cache

@lru_cache(None)
def f(n, *qs):
    if not qs:
        return 1 if n == 0 else 0
    q = qs[0]
    return sum(f(n-j, *qs[1:]) for j in range(q+1))

f(4, 3, 2, 1)
# 5
Here's a way (in pseudocode), though it doesn't look very efficient. It would probably be smart to add some short-circuiting in places where the number of balls won't fit in the total remaining capacity. Perhaps some clever caching could help, if a given list of capacities will be used many times.
All numbers are non-negative integers. Function ArrayTail(array a) is the subarray whose elements are all elements of the input array after the first. Function ArrayCon(number head, array a) is the array whose elements are head followed by the elements of a.
function Count(array capacities, number balls) -> number
    If balls == 0:
        return 1
    Else if capacities is empty:
        return 0
    Else:
        Let sum: number
        sum <- 0
        For b from 0 to min(balls, capacities[0]):
            sum <- sum + Count(ArrayTail(capacities), balls - b)
        End For
        return sum
    End If/Else
End function
function Rank(array capacities, array counts) -> number
    Precondition: length(capacities) == length(counts)
    Precondition: counts[i] <= capacities[i] for all i < length(counts)
    If counts is empty:
        return 0
    Else:
        Let total: number
        total <- 0
        For c in counts:
            total <- total + c
        End For
        Let r: number
        r <- Rank(ArrayTail(capacities), ArrayTail(counts))
        For b from 0 to (counts[0]-1):
            r <- r + Count(ArrayTail(capacities), total - b)
        End For
        return r
    End If/Else
End function
function Unrank(array capacities, number balls, number rank) -> array
    Precondition: rank < Count(capacities, balls)
    If capacities is empty:
        return empty array
    Else
        Let c0: number
        c0 <- 0
        Loop until "return":
            Let subcount: number
            subcount <- Count(ArrayTail(capacities), balls - c0)
            If subcount <= rank:
                c0 <- c0 + 1
                rank <- rank - subcount
            Else
                return ArrayCon(c0, Unrank(ArrayTail(capacities), balls - c0, rank))
            End If/Else
        End Loop
    End If/Else
End function
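For convenience, here is a compact Python transcription of the three functions (my own sketch; Count is memoized, and capacities are passed as tuples so they can be cached):

from functools import lru_cache

@lru_cache(None)
def count(capacities, balls):
    if balls == 0:
        return 1
    if not capacities:
        return 0
    return sum(count(capacities[1:], balls - b)
               for b in range(min(balls, capacities[0]) + 1))

def rank(capacities, counts):
    if not counts:
        return 0
    total = sum(counts)
    r = rank(capacities[1:], counts[1:])
    for b in range(counts[0]):
        r += count(capacities[1:], total - b)
    return r

def unrank(capacities, balls, r):
    if not capacities:
        return []
    c0 = 0
    while True:
        sub = count(capacities[1:], balls - c0)
        if sub <= r:
            c0 += 1
            r -= sub
        else:
            return [c0] + unrank(capacities[1:], balls - c0, r)

caps = (3, 2, 1)
for i in range(count(caps, 4)):  # 5 distributions of 4 balls
    d = unrank(caps, 4, i)
    print(i, d, rank(caps, tuple(d)))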

Minimize the number of operation to make all elements of array equal

Given an array of n elements, you are allowed to perform only 2 kinds of operations to make all elements of the array equal:
multiply any element by 2
divide any element by 2 (integer division)
Your task is to minimize the total number of the above operations performed to make all elements of the array equal.
Example
array = [3,6,7]: the minimum number of operations is 2, as 6 and 7 can each be divided by 2 to obtain 3.
I cannot think of even the brute force solution.
Constraints:
1 <= n <= 100000 and
1 <= ai <= 100000,
where ai is the ith element of the array.
View all numbers as strings of 0 and 1, via their binary expansion.
E.g.: 3, 6, 7 are represented as 11, 110, 111, respectively.
Dividing by 2 is equivalent to removing the right most 0 or 1, and multiplying by 2 is equivalent to adding a 0 from the right.
For a string consisting of 0s and 1s, let us define a "head" to be a prefix of the string that ends with 1.
E.g.: 1100101 has heads 1, 11, 11001, 1100101.
The task becomes finding the longest common head of all the given strings, and then determining how many 0's to add after this common head.
An example:
Say you have the following strings:
10101001, 101011, 10111, 1010001
find the longest common head of 10101001 and 101011, which is 10101;
find the longest common head of 10101 and 10111, which is 101;
find the longest common head of 101 and 1010001, which is 101.
Then you are sure that all the numbers should become a number of the form 101 00....
To determine how many 0's to add after 101, find the number of consecutive 0's directly following 101 in every string:
For 10101001: 1
For 101011: 1
For 10111: 0
For 1010001: 3
It remains to find an integer k that minimizes |k - 1| + |k - 1| + |k - 0| + |k - 3|. Here we find k = 1. So every number should become 1010 in the end.
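A quick Python sketch of this procedure (my own illustration; the number of appended zeros k is chosen as the median of the zero counts, which minimizes the sum of absolute differences):

def min_ops(nums):
    bins = [bin(x)[2:] for x in nums]
    # longest common prefix of all the binary strings
    prefix = bins[0]
    for s in bins[1:]:
        n = 0
        while n < min(len(prefix), len(s)) and prefix[n] == s[n]:
            n += 1
        prefix = prefix[:n]
    # the longest common head ends at the last '1' of the common prefix
    head = prefix[:prefix.rindex('1') + 1]
    # number of consecutive 0's directly following the head in each string
    zeros = [len(s) - len(head) - len(s[len(head):].lstrip('0'))
             for s in bins]
    # stripping cost that is paid regardless of k
    base = sum(len(s) - len(head) - z for s, z in zip(bins, zeros))
    k = sorted(zeros)[len(zeros) // 2]  # median
    return base + sum(abs(k - z) for z in zeros)

print(min_ops([3, 6, 7]))  # 2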
As the other answer explains, backtracking is not necessary. For the fun of it, a little implementation of that approach (see the link to run it online at the bottom):
First we need a function that determines the number of binary digits in a number:
def getLength(i: Int): Int = {
  @annotation.tailrec
  def rec(i: Int, result: Int): Int =
    if(i > 0)
      rec(i >> 1, result + 1)
    else
      result
  rec(i, 0)
}
Then we need a function that determines the common prefix of two numbers of equal length
@annotation.tailrec
def getPrefix(i: Int, j: Int): Int =
  if(i == j) i
  else getPrefix(i >> 1, j >> 1)
And of a list of arbitrary numbers:
def getPrefix(is: List[Int]): Int = is.reduce((x, y) => {
  val shift = Math.abs(getLength(x) - getLength(y))
  val x2 = Math.max(x, y)
  val y2 = Math.min(x, y)
  getPrefix(x2 >> shift, y2)
})
Then we need the length of the suffix, not counting the leading zeros of the suffix:
def getSuffixLength(i: Int, prefix: Int) = {
  val suffix = i ^ (prefix << (getLength(i) - getLength(prefix)))
  getLength(suffix)
}
Now we can compute the number of operations we need to synchronize a number i to the prefix with "zeros" zeros appended:
def getOperations(i: Int, prefix: Int, zeros: Int): Int = {
  val length = getLength(i) - getLength(prefix)
  val suffixLength = getSuffixLength(i, prefix)
  suffixLength + Math.abs(zeros - length + suffixLength)
}
Now we can find the minimal number of operations and return it together with the value we will sync to:
def getMinOperations(is: List[Int]) = {
  val prefix = getPrefix(is)
  val maxZeros = getLength(is.max) - getLength(prefix)
  (0 to maxZeros).map{ zeros =>
    (is.map{ getOperations(_, prefix, zeros) }.sum, prefix << zeros)
  }.minBy(_._1)
}
You can try this solution at:
http://goo.gl/lLr5jl
The last step of finding the right number of zeros can be improved: only the length of a suffix without its leading zeros matters, not what it looks like, so we can handle equal suffix lengths together by counting how many of each there are:
def getSuffixLength(i: Int, prefix: Int) = {
  val suffix = i ^ (prefix << (getLength(i) - getLength(prefix)))
  getLength(suffix)
}

def getMinOperations(is: List[Int]) = {
  val prefix = getPrefix(is)
  val maxZeros = getLength(is.max) - getLength(prefix)
  val baseCosts = is.map(getSuffixLength(_, prefix)).sum
  val suffixLengths: List[(Int, Int)] = is.foldLeft(Map[Int, Int]()){
    case (m, i) => {
      val x = getSuffixLength(i, prefix) - getLength(i) + getLength(prefix)
      m.updated(x, 1 + m.getOrElse(x, 0))
    }
  }.toList
  val (minOp, minSol) = (0 to maxZeros).map{ zeros =>
    (suffixLengths.map{
      case (x, count) => count * Math.abs(zeros + x)
    }.sum, prefix << zeros)
  }.minBy(_._1)
  (minOp + baseCosts, minSol)
}
All auxiliary operations take only logarithmic time in the size of the maximal number. We have to go through the whole list to collect the suffix lengths, and then we have to try each possible number of zeros, of which there are at most logarithmically many in the maximal number. So we get a complexity of
O(|list| * ld(maxNum) + (ld(maxNum))^2)
So for your bounds this is basically linear in the input size.
This version can be found here:
http://goo.gl/ijzYik
