How do I find a certain subset of the binomial coefficient - arrays

Let's suppose we have an array with length n. We'll call this array
keys[n] = {...}
What I am looking for is a certain array of subsets given by "n choose 3". We'll call this
combination[?][3] = {...}
This array needs to meet the following criteria:
Each subset of length 2 of the 3 keys in each element in array
combination ("3 choose 2") has to appear at least in one other
element in combination
Every key has to appear in at least one element in combination
(actually in two elements because of the previous criterion)
The length of combination has to be as small as possible (so out of
all solutions that satisfy the above two criteria, we need to pick
one with minimum length)
Optional: combination is random every time but still at minimum length
Optional: No subset of length 2 of the 3 keys in each element in
array combination ("3 choose 2") appears particularly more often than
others.
Here's an example:
Let keys[5] = {1,2,3,4,5};
"5 choose 3" yields the following 10 subsets: {1,2,3}, {1,2,4}, {1,2,5}, {1,3,4}, {1,3,5}, {1,4,5}, {2,3,4}, {2,4,5}, {2,3,5}, {3,4,5}
So one solution would be: {1,2,3}, {1,2,4}, {1,3,4}, {2,3,5}, {2,4,5}, {3,4,5} (at least I didn't find a shorter one)
I've been trying to solve this problem all day. I only managed to come up with one really convoluted algorithm that doesn't even work.
Does anyone know how to solve this or even just what you might call this problem?
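To make the first two criteria concrete, here is a minimal checker sketch. The name check_double_cover is hypothetical (it is not from the post), and the sketch assumes the triples in the candidate list are all distinct.

```python
from itertools import combinations

def check_double_cover(keys, triples):
    """Check the first two criteria for a candidate list of distinct triples."""
    # Count in how many triples each unordered pair occurs.
    pair_count = {}
    for triple in triples:
        for pair in combinations(sorted(triple), 2):
            pair_count[pair] = pair_count.get(pair, 0) + 1
    # Criterion 1: every pair that occurs must occur in at least two triples.
    pairs_ok = all(c >= 2 for c in pair_count.values())
    # Criterion 2: every key occurs in at least one triple.
    used = {x for triple in triples for x in triple}
    keys_ok = set(keys) <= used
    return pairs_ok and keys_ok

example = [(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 5), (2, 4, 5), (3, 4, 5)]
print(check_double_cover([1, 2, 3, 4, 5], example))  # True
```

The checker only validates a candidate; it says nothing about whether the candidate has minimum length.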

Here is the original solution to a single cover (oops).
Some solutions are more likely than others, but it is still pretty random. It is fast if not a lot of backtracking happens. So 13, for example, runs fast. But 8 runs slowly because the shortest solution has 11 triples, and it has to fail at 10 over and over again before it accepts that.
The idea is an A* search for the shortest possible solution, with some technical details explained in the code.
Note that it found the following solution for 5: [[1, 2, 5], [1, 3, 4], [2, 3, 5], [2, 4, 5]] This is shorter than you had.
import math
import heapq
import random

def min_cover (n):
    # Small helper function.
    def next_pair(pair):
        i, j = pair
        if j < n:
            return [i, j+1]
        elif i+1 < n:
            return [i+1, i+2]
        else:
            raise StopIteration

    # Another small helper function.
    def pair_in(pair, groups):
        i, j = pair
        while groups is not None:
            if i in groups[0] and j in groups[0]:
                return True
            groups = groups[1]
        return False

    numbers = [_ for _ in range(1, n+1)]
    # Queue will be a priority queue of:
    # [
    #     min_pairs_at_finish,
    #     neg_pair,
    #     calculation_for_min_pairs_at_finish,
    #     random_to_avoid_more_comparisons,
    #     last_pair,
    #     [triple, [triple, [triple, ...[None], ...]]]
    # ]
    #
    # The difference between min_pairs and its calculation
    # is subtle. In the end, the number of pairs is a
    # multiple of 3. So if we've calculated a minimum
    # of 10 pairs, we ACTUALLY can't finish with less than 12.
    #
    # The reason for the ordering is as follows.
    #
    # min_pairs_at_finish: We want as few as possible.
    # neg_pair: Prefer continuing solutions that are far along.
    # calculation_for_min_pairs_at_finish: Closer to done is better
    # random_to_avoid_more_comparisons: Comparing complex data
    #     is slow. Don't. Bonus, this randomizes the answer!
    # last_pair: Where to continue from.
    # groups: The solution so far. This data structure efficiently
    #     reuses memory between different partial solutions.
    #
    queue = [[0, [-1, -1], n*(n-1)/2, 0, [1, 1], None]]
    while 0 < len(queue):
        min_cost, neg_pair, min_calc_cost, r, pair, groups = heapq.heappop(queue)
        try:
            pair = next_pair(pair)
            while pair_in(pair, groups):
                pair = next_pair(pair)
            for k in numbers:
                if k != pair[0] and k != pair[1]:
                    extra = 0
                    if pair_in([pair[0], k], groups):
                        extra += 1
                    if pair_in([pair[1], k], groups):
                        extra += 1
                    next_item = [
                        3 * math.ceil((min_cost + extra)/3),
                        [-pair[0], -pair[1]],
                        min_cost + extra,
                        random.random(),
                        pair,
                        [sorted([pair[0], pair[1], k]), groups],
                    ]
                    heapq.heappush(queue, next_item)
        except StopIteration:
            answer = []
            while groups is not None:
                answer.append(groups[0])
                groups = groups[1]
            return list(reversed(answer))

print(min_cover(5))
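As a rough sanity check on the single-cover search, the counting argument in the comments (each triple covers at most 3 of the n*(n-1)/2 pairs) gives a simple lower bound on the number of triples. The sketch below is not part of the original answer; single_cover_lower_bound and covers_all_pairs are hypothetical helper names, and math.comb assumes Python 3.8 or later.

```python
from itertools import combinations
from math import comb

def single_cover_lower_bound(n):
    """Counting bound: each triple covers at most 3 of the n*(n-1)/2 pairs."""
    return -(-comb(n, 2) // 3)  # ceiling division without floats

def covers_all_pairs(n, triples):
    """True if every pair from 1..n lies inside at least one triple."""
    covered = {p for t in triples for p in combinations(sorted(t), 2)}
    return covered == set(combinations(range(1, n + 1), 2))

print(single_cover_lower_bound(5))  # 4
print(covers_all_pairs(5, [[1, 2, 5], [1, 3, 4], [2, 3, 5], [2, 4, 5]]))  # True
```

The bound is not always attainable: for n = 8 it gives 10, while the answer reports that the shortest solution has 11 triples, which is exactly the case where the search has to fail at 10 before moving on.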
And here is a solution to the double cover problem that was actually wanted using the same technique.
import math
import heapq
import random
def min_double_cover (n):
# Small helper function.
def next_pair(pair):
i, j = pair
if j < n:
return [i, j+1]
elif i+1 < n:
return [i+1, i+2]
else:
raise StopIteration
# Another small helper function.
def double_pair_in(pair, groups):
i, j = pair
answer = 0
while groups is not None:
if i in groups[0] and j in groups[0]:
answer += 1
if 2 <= answer:
return True
groups = groups[1]
return False
def triple_in(triple, groups):
i, j, k = triple
while groups is not None:
if i in groups[0] and j in groups[0] and k in groups[0]:
return True
groups = groups[1]
return False
numbers = [_ for _ in range(1, n+1)]
# Queue will be a priority queue of:
# [
# min_pairs_at_finish,
# neg_pair,
# calculation_for_min_pairs_at_finish,
# random_to_avoid_more_comparisons,
# last_pair,
# [triple, [triple, [triple, ...[None], ...]]]
# ]
#
# The difference between min_pairs and its calculation
# is subtle. In the end, the number of pairs is a
# multiple of 3. So if we've calculated a minimum
# of 10 pairs, we ACTUALLY can't finish with less than 12.
#
# The reason for the ordering is as follows.
#
# min_pairs_at_finish: We want as few as possible.
# neg_pair: Prefer continuing solutions that are far along.
# calculation_for_min_pairs_at_finish: Closer to done is better
# random_to_avoid_more_comparisons: Comparing complex data
# structures is slow. Don't.
# last_pair: Where to continue from.
# groups: The solution so far. This data structure efficiently
# reuses memory between different partial solutions.
#
queue = [[0, [-1, -2], n*(n-1), 0, [1, 2], None]]
while 0 < len(queue):
min_cost, neg_pair, min_calc_cost, r, pair, groups = heapq.heappop(queue)
try:
while double_pair_in(pair, groups):
pair = next_pair(pair)
for k in numbers:
if k != pair[0] and k != pair[1] and not triple_in([pair[0], pair[1], k], groups):
extra = 0
if double_pair_in([pair[0], k], groups):
extra += 1
if double_pair_in([pair[1], k], groups):
extra += 1
next_item = [
3 * math.ceil((min_cost + extra)/3),
[-pair[0], -pair[1]],
min_cost + extra,
random.random(),
pair,
[sorted([pair[0], pair[1], k]), groups],
]
heapq.heappush(queue, next_item)
except StopIteration:
answer = []
while groups is not None:
answer.append(groups[0])
groups = groups[1]
return list(reversed(answer))
print(min_double_cover(5))
This time I couldn't run it for 8 at all. No clue why not. But lots of other numbers are fast.
At the expense of a potentially incorrect answer, here is a change to make it finish:
...
queue = [[0, [-1, -2], n*(n-1), 0, [1, 2], None]]
min_pairs = 3*math.ceil(n*(n-1)/3)
threshold = 1000000
next_threshold = threshold
while 0 < len(queue):
if next_threshold < len(queue):
print("Threshold reached", next_threshold)
min_pairs += 3
for x in queue:
x[0] = max(min_pairs, x[0])
heapq.heapify(queue)
next_threshold += threshold
min_cost, neg_pair, min_calc_cost, r, pair, groups = heapq.heappop(queue)
...
next_item = [
max(min_pairs, 3 * math.ceil((min_cost + extra)/3)),
[-pair[0], -pair[1]],
min_cost + extra,
random.random(),
pair,
[sorted([pair[0], pair[1], k]), groups],
]
...
And I found a bug. The following should now be correct.
The numbers where it couldn't figure it out fast, like 8, seem to be the ones where it has to backtrack to a larger number of triples. If you get the threshold message once, then the answer is possibly off by 1 but probably is right. If you get it twice, ditto.
import math
import heapq
import random

def min_double_cover (n):
    # Small helper function.
    def next_pair(pair):
        i, j = pair
        if j < n:
            return [i, j+1]
        elif i+1 < n:
            return [i+1, i+2]
        else:
            raise StopIteration

    # Another small helper function.
    def double_pair_in(pair, groups):
        i, j = pair
        answer = 0
        while groups is not None:
            if i in groups[0] and j in groups[0]:
                answer += 1
                if 2 <= answer:
                    return True
            groups = groups[1]
        return False

    def triple_in(triple, groups):
        i, j, k = triple
        while groups is not None:
            if i in groups[0] and j in groups[0] and k in groups[0]:
                return True
            groups = groups[1]
        return False

    numbers = [_ for _ in range(1, n+1)]
    # Queue will be a priority queue of:
    # [
    #     min_pairs_at_finish,
    #     neg_pair,
    #     calculation_for_min_pairs_at_finish,
    #     random_to_avoid_more_comparisons,
    #     last_pair,
    #     [triple, [triple, [triple, ...[None], ...]]]
    # ]
    #
    # The difference between min_pairs and its calculation
    # is subtle. In the end, the number of pairs is a
    # multiple of 3. So if we've calculated a minimum
    # of 10 pairs, we ACTUALLY can't finish with less than 12.
    #
    # The reason for the ordering is as follows.
    #
    # min_pairs_at_finish: We want as few as possible.
    # neg_pair: Prefer continuing solutions that are far along.
    # calculation_for_min_pairs_at_finish: Closer to done is better
    # random_to_avoid_more_comparisons: Comparing complex data
    #     structures is slow. Don't.
    # last_pair: Where to continue from.
    # groups: The solution so far. This data structure efficiently
    #     reuses memory between different partial solutions.
    #
    queue = [[0, [-1, -2], n*(n-1), 0, [1, 2], None]]
    min_pairs = 3*math.ceil(n*(n-1)/3)
    threshold = 100000
    next_threshold = threshold
    while 0 < len(queue):
        if next_threshold < len(queue):
            print("Threshold reached", next_threshold)
            min_pairs += 3
            for x in queue:
                x[0] = max(min_pairs, x[0])
            heapq.heapify(queue)
            next_threshold += threshold
        min_cost, neg_pair, min_calc_cost, r, pair, groups = heapq.heappop(queue)
        try:
            while double_pair_in(pair, groups):
                pair = next_pair(pair)
            for k in numbers:
                if k != pair[0] and k != pair[1] and not triple_in([pair[0], pair[1], k], groups):
                    extra = 0
                    if double_pair_in([pair[0], k], groups):
                        extra += 1
                    if double_pair_in([pair[1], k], groups):
                        extra += 1
                    next_item = [
                        max(min_pairs, 3 * math.ceil((min_calc_cost + extra)/3)),
                        [-pair[0], -pair[1]],
                        min_calc_cost + extra,
                        random.random(),
                        pair,
                        [sorted([pair[0], pair[1], k]), groups],
                    ]
                    heapq.heappush(queue, next_item)
        except StopIteration:
            answer = []
            while groups is not None:
                answer.append(groups[0])
                groups = groups[1]
            return list(reversed(answer))

print(min_double_cover(8))

Related

How to translate a solution into divide-and-conquer (finding a sub array with the largest, smallest value)

I am trying to get better at divide-and-conquer algorithms and am using the one below as an example. Given an array _in and some length l, it finds the start point of a subarray _in[_min_start:_min_start+l] such that the lowest value in that subarray is as high as it could possibly be. I have come up with a non-divide-and-conquer solution and am wondering how I could translate it into one that divides the array into smaller parts (divide-and-conquer).
def main(_in, l):
    _min_start = 0
    min_trough = None
    for i in range(len(_in)+1-l):
        if min_trough is None:
            min_trough = min(_in[i:i+l])
        if min(_in[i:i+l]) > min_trough:
            _min_start = i
            min_trough = min(_in[i:i+l])
    return _min_start, _in[_min_start:_min_start+l]
e.g. for the array [5, 1, -1, 2, 5, -4, 3, 9, 8, -2, 0, 6] and a subarray of length 3 it would return start position 6 (resulting in the subarray [3,9,8]).
Three O(n) solutions and a benchmark
Note I'm renaming _in and l to clearer-looking names A and k.
Solution 1: Divide and conquer
Split the array in half. Solve left half and right half recursively. The subarrays not yet considered cross the middle, i.e., they're a suffix of the left part plus a prefix of the right part. Compute k-1 suffix-minima of the left half and k-1 prefix-minima of the right half. That allows you to compute the minimum for each middle-crossing subarray of length k in O(1) time each. The best subarray for the whole array is the best of left-best, right-best and crossing-best.
Runtime is O(n), I believe. As Ellis pointed out, in the recursion the subarray can become smaller than k. Such cases take O(1) time to return the equivalent of "there aren't any k-length subarrays in here". So the time is:
$$T(n) = \begin{cases} 2\,T(n/2) + O(k) & \text{if } n \ge k \\ O(1) & \text{otherwise} \end{cases}$$
For any 0 <= k <= n we have $k = n^c$ with 0 <= c <= 1. Then the number of calls is $\Theta(n^{1-c})$ and each call's own work takes $\Theta(n^c)$ time, for a total of $\Theta(n)$ time.
Posted a question about the complexity to be sure.
Python implementation:
from itertools import accumulate, count
from math import inf

def solve_divide_and_conquer(A, k):
    def solve(start, stop):
        # Too short to contain any k-length subarray.
        if stop - start < k:
            return -inf,
        mid = (start + stop) // 2
        left = solve(start, mid)
        right = solve(mid, stop)
        # First start index of a subarray that crosses the middle.
        i0 = mid - k + 1
        prefixes = accumulate(A[mid:mid+k-1], min)
        if i0 < 0:
            prefixes = [*prefixes][-i0:]
            i0 = 0
        suffixes = list(accumulate(A[i0:mid][::-1], min))[::-1]
        crossing = max(zip(map(min, suffixes, prefixes), count(i0)))
        return max(left, right, crossing)
    return solve(0, len(A))[1]
Solution 2: k-Blocks
As commented by #benrg, the above dividing-and-conquering is needlessly complicated. We can simply work on blocks of length k. Compute the suffix minima of the first block and the prefix minima of the second block. That allows finding the minimum of each k-length subarray within these two blocks in O(1) time. Do the same with the second and third block, the third and fourth block, etc. Time is O(n) as well.
Python implementation:
from itertools import accumulate, count
from math import inf

def solve_blocks(A, k):
    return max(max(zip(map(min, prefixes, suffixes), count(mid-k)))
               for mid in range(k, len(A)+1, k)
               for prefixes in [accumulate(A[mid:mid+k], min, initial=inf)]
               for suffixes in [list(accumulate(A[mid-k:mid][::-1], min, initial=inf))[::-1]]
              )[1]
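The one-liner above is dense, so here is an unrolled sketch of the same k-blocks idea with explicit loops. It is an illustration, not the author's code: best_window_start is a hypothetical name, and returning None when no window fits is my assumption (it also assumes k >= 1).

```python
from math import inf

def best_window_start(A, k):
    """Start index of a length-k window whose minimum is largest, or None."""
    best_min, best_start = -inf, None
    for mid in range(k, len(A) + 1, k):
        left = A[mid - k:mid]    # full block of k values ending at mid
        right = A[mid:mid + k]   # next block, possibly shorter at the end
        # suffix_min[t] = min(left[t:]), with suffix_min[k] = inf (empty suffix)
        suffix_min = [inf] * (k + 1)
        for t in range(k - 1, -1, -1):
            suffix_min[t] = min(left[t], suffix_min[t + 1])
        # prefix_min[t] = min(right[:t]), with prefix_min[0] = inf (empty prefix)
        prefix_min = [inf]
        for x in right:
            prefix_min.append(min(prefix_min[-1], x))
        # The window starting at mid-k+t is left[t:] followed by right[:t].
        for t in range(len(prefix_min)):
            window_min = min(suffix_min[t], prefix_min[t])
            if window_min > best_min:
                best_min, best_start = window_min, mid - k + t
    return best_start

print(best_window_start([5, 1, -1, 2, 5, -4, 3, 9, 8, -2, 0, 6], 3))  # 6
```

Each block contributes O(k) work and there are about n/k blocks, which is where the O(n) total comes from.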
Solution 3: Monoqueue
Not divide & conquer, but first one I came up with (and knew was O(n)).
Sliding window, represent the window with a deque of (sorted) indexes of strictly increasing array values in the window. When sliding the window to include a new value A[i]:
Remove the first index from the deque if the sliding makes it fall out of the window.
Remove indexes whose array values are larger than or equal to A[i]. (They are never needed as the minimum of the window anymore.)
Include the new index i.
The first index still in the deque is the index of the current window's minimum value. Use that to update overall result.
Python implementation:
from collections import deque

A = [5, 1, -1, 2, 5, -4, 3, 9, 8, -2, 0, 6]
k = 3
I = deque()
for i in range(len(A)):
    if I and I[0] == i - k:
        I.popleft()
    while I and A[I[-1]] >= A[i]:
        I.pop()
    I.append(i)
    curr_min = A[I[0]]
    if i == k-1 or i > k-1 and curr_min > max_min:
        result = i - k + 1
        max_min = curr_min
print(result)
Benchmark
With 4000 numbers from the range 0 to 9999, and k=2000:
80.4 ms 81.4 ms 81.8 ms solve_brute_force
80.2 ms 80.5 ms 80.7 ms solve_original
2.4 ms 2.4 ms 2.4 ms solve_monoqueue
2.4 ms 2.4 ms 2.4 ms solve_divide_and_conquer
1.3 ms 1.4 ms 1.4 ms solve_blocks
Benchmark code:
from timeit import repeat
from random import choices
from itertools import accumulate
from math import inf
from itertools import count
from collections import deque
def solve_monoqueue(A, k):
I = deque()
for i in range(len(A)):
if I and I[0] == i - k:
I.popleft()
while I and A[I[-1]] >= A[i]:
I.pop()
I.append(i)
curr_min = A[I[0]]
if i == k-1 or i > k-1 and curr_min > max_min:
result = i - k + 1
max_min = curr_min
return result
def solve_divide_and_conquer(A, k):
def solve(start, stop):
if stop - start < k:
return -inf,
mid = (start + stop) // 2
left = solve(start, mid)
right = solve(mid, stop)
i0 = mid - k + 1
prefixes = accumulate(A[mid:mid+k-1], min)
if i0 < 0:
prefixes = [*prefixes][-i0:]
i0 = 0
suffixes = list(accumulate(A[i0:mid][::-1], min))[::-1]
crossing = max(zip(map(min, suffixes, prefixes), count(i0)))
return max(left, right, crossing)
return solve(0, len(A))[1]
def solve_blocks(A, k):
return max(max(zip(map(min, prefixes, suffixes), count(mid-k)))
for mid in range(k, len(A)+1, k)
for prefixes in [accumulate(A[mid:mid+k], min, initial=inf)]
for suffixes in [list(accumulate(A[mid-k:mid][::-1], min, initial=inf))[::-1]]
)[1]
def solve_brute_force(A, k):
return max(range(len(A)+1-k),
key=lambda start: min(A[start : start+k]))
def solve_original(_in, l):
_min_start = 0
min_trough = None
for i in range(len(_in)+1-l):
if min_trough is None:
min_trough = min(_in[i:i+l])
if min(_in[i:i+l]) > min_trough:
_min_start = i
min_trough = min(_in[i:i+l])
return _min_start # , _in[_min_start:_min_start+l]
solutions = [
solve_brute_force,
solve_original,
solve_monoqueue,
solve_divide_and_conquer,
solve_blocks,
]
for _ in range(3):
A = choices(range(10000), k=4000)
k = 2000
# Check correctness
expect = None
for solution in solutions:
index = solution(A.copy(), k)
assert 0 <= index and index + k-1 < len(A)
min_there = min(A[index : index+k])
if expect is None:
expect = min_there
print(expect)
else:
print(min_there == expect, solution.__name__)
print()
# Speed
for solution in solutions:
copy = A.copy()
ts = sorted(repeat(lambda: solution(copy, k), number=1))[:3]
print(*('%5.1f ms ' % (t * 1e3) for t in ts), solution.__name__)
print()

Minimum Sum of Absolute Differences between Two Arrays [duplicate]

I have two sorted lists of numbers A and B with B being at least as long as A. Say:
A = [1.1, 2.3, 5.6, 5.7, 10.1]
B = [0, 1.9, 2.4, 2.7, 8.4, 9.1, 10.7, 11.8]
I want to associate each number in A with a different number in B but preserving order. For any such mapping we define the total distance to be the sum of the squared distances between mapped numbers.
For example:
If we map 1.1 to 0, then 2.3 can be mapped to any number from 1.9 onwards. But if we had mapped 1.1 to 2.7, then 2.3 could only be mapped to a number in B from 8.4 onwards.
Say we map 1.1->0, 2.3->1.9, 5.6->8.4, 5.7->9.1, 10.1->10.7. This is a valid mapping and has distance (1.1^2+0.4^2+2.8^2+3.4^2+0.6^2).
Another example to show a greedy approach will not work:
A = [1, 2]
B = [0, 1, 10000]
If we map 1->1 then we have to map 2->10000 which is bad.
The task is to find the valid mapping with minimal total distance.
Is this hard to do? I am interested in a method that is fast when the lists have a few thousand elements.
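For very small inputs the task can be pinned down with an exhaustive reference implementation, which is handy for checking faster methods. This is only a sketch under that assumption: total_distance and brute_force_matching are hypothetical names, and the running time grows as "len(B) choose len(A)", so it is useless at the sizes the question cares about.

```python
from itertools import combinations

def total_distance(A, B, idx):
    """Sum of squared distances when A[i] is mapped to B[idx[i]]."""
    return sum((a - B[j]) ** 2 for a, j in zip(A, idx))

def brute_force_matching(A, B):
    """Try every increasing choice of len(A) indices into B; keep the cheapest."""
    best = min(combinations(range(len(B)), len(A)),
               key=lambda idx: total_distance(A, B, idx))
    return total_distance(A, B, best), list(best)

print(brute_force_matching([1, 2], [0, 1, 10000]))  # (2, [0, 1])
```

combinations yields index tuples in increasing order, which is exactly the order-preserving constraint.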
And here is an O(n) solution! (This is the original attempt, see below for a fixed version.)
The idea is as follows. We first solve the problem for every other element, turn that into a very close solution, then use dynamic programming to find the real solution. This is solving a problem that is half the size first, followed by O(n) work. Using the fact that x + x/2 + x/4 + ... = 2x this turns out to be O(n) work.
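Spelled out as a recurrence, and assuming the extra work per level is at most c·n for some constant c (c is my notation, not the author's), the geometric-series argument reads:

```latex
T(n) = T(n/2) + c\,n
     \le c\,n \left(1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots\right)
     = 2\,c\,n = O(n)
```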
This very, very much requires sorted lists. And doing a band that is 5 across is overkill, it very much looks like a band that is 3 across always gives the right answer, but I wasn't confident enough to go with that.
def improve_matching (list1, list2, matching):
# We do DP forward, trying a band that is 5 across, building up our
# answer as a linked list. If our answer changed by no more than 1
# anywhere, we are done. Else we recursively improve again.
best_j_last = -1
last = {-1: (0.0, None)}
for i in range(len(list1)):
best_j = None
best_cost = None
this = {}
for delta in (-2, 2, -1, 1, 0):
j = matching[i] + delta
# Bounds sanity checks.
if j < 0:
continue
elif len(list2) <= j:
continue
j_prev = best_j_last
if j <= j_prev:
if j-1 in last:
j_prev = j-1
else:
# Can't push back this far.
continue
cost = last[j_prev][0] + (list1[i] - list2[j])**2
this[j] = (cost, [j, last[j_prev][1]])
if (best_j is None) or cost <= best_cost:
best_j = j
best_cost = cost
best_j_last = best_j
last = this
(final_cost, linked_list) = last[best_j_last]
matching_rev = []
while linked_list is not None:
matching_rev.append( linked_list[0])
linked_list = linked_list[1]
matching_new = [x for x in reversed(matching_rev)]
for i in range(len(matching_new)):
if 1 < abs(matching[i] - matching_new[i]):
print "Improving further" # Does this ever happen?
return improve_matching(list1, list2, matching_new)
return matching_new
def match_lists (list1, list2):
if 0 == len(list1):
return []
elif 1 == len(list1):
best_j = 0
best_cost = (list1[0] - list2[0])**2
for j in range(1, len(list2)):
cost = (list1[0] - list2[j])**2
if cost < best_cost:
best_cost = cost
best_j = j
return [best_j]
elif 1 < len(list1):
# Solve a smaller problem first.
list1_smaller = [list1[2*i] for i in range((len(list1)+1)//2)]
list2_smaller = [list2[2*i] for i in range((len(list2)+1)//2)]
matching_smaller = match_lists(list1_smaller, list2_smaller)
# Start with that matching.
matching = [None] * len(list1)
for i in range(len(matching_smaller)):
matching[2*i] = 2*matching_smaller[i]
# Fill in the holes between
for i in range(len(matching) - 1):
if matching[i] is None:
best_j = matching[i-1] + 1
best_cost = (list1[i] - list2[best_j])**2
for j in range(best_j+1, matching[i+1]):
cost = (list1[i] - list2[j])**2
if cost < best_cost:
best_cost = cost
best_j = j
matching[i] = best_j
# And fill in the last one if needed
if matching[-1] is None:
if matching[-2] + 1 == len(list2):
# This will be an invalid matching, but improve will fix that.
matching[-1] = matching[-2]
else:
best_j = matching[-2] + 1
best_cost = (list1[-1] - list2[best_j])**2
for j in range(best_j+1, len(list2)):
cost = (list1[-1] - list2[j])**2
if cost < best_cost:
best_cost = cost
best_j = j
matching[-1] = best_j
# And now improve.
return improve_matching(list1, list2, matching)
def best_matching (list1, list2):
matching = match_lists(list1, list2)
cost = 0.0
result = []
for i in range(len(matching)):
pair = (list1[i], list2[matching[i]])
result.append(pair)
cost = cost + (pair[0] - pair[1])**2
return (cost, result)
UPDATE
There is a bug in the above. It can be demonstrated with match_lists([1, 3], [0, 0, 0, 0, 0, 1, 3]). However the solution below is also O(n) and I believe has no bugs. The difference is that instead of looking for a band of fixed width, I look for a band of width dynamically determined by the previous matching. Since no more than 5 entries can look to match at any given spot, it again winds up O(n) for this array and a geometrically decreasing recursive call. But long stretches of the same value cannot cause a problem.
def match_lists (list1, list2):
prev_matching = []
if 0 == len(list1):
# Trivial match
return prev_matching
elif 1 < len(list1):
# Solve a smaller problem first.
list1_smaller = [list1[2*i] for i in range((len(list1)+1)//2)]
list2_smaller = [list2[2*i] for i in range((len(list2)+1)//2)]
prev_matching = match_lists(list1_smaller, list2_smaller)
best_j_last = -1
last = {-1: (0.0, None)}
for i in range(len(list1)):
lowest_j = 0
highest_j = len(list2) - 1
if 3 < i:
lowest_j = 2 * prev_matching[i//2 - 2]
if i + 4 < len(list1):
highest_j = 2 * prev_matching[i//2 + 2]
if best_j_last == highest_j:
# Have to push it back.
best_j_last = best_j_last - 1
best_cost = last[best_j_last][0] + (list1[i] - list2[highest_j])**2
best_j = highest_j
this = {best_j: (best_cost, [best_j, last[best_j_last][1]])}
# Now try the others.
for j in range(lowest_j, highest_j):
prev_j = best_j_last
if j <= prev_j:
prev_j = j - 1
if prev_j not in last:
continue
else:
cost = last[prev_j][0] + (list1[i] - list2[j])**2
this[j] = (cost, [j, last[prev_j][1]])
if cost < best_cost:
best_cost = cost
best_j = j
last = this
best_j_last = best_j
(final_cost, linked_list) = last[best_j_last]
matching_rev = []
while linked_list is not None:
matching_rev.append( linked_list[0])
linked_list = linked_list[1]
matching_new = [x for x in reversed(matching_rev)]
return matching_new
def best_matching (list1, list2):
matching = match_lists(list1, list2)
cost = 0.0
result = []
for i in range(len(matching)):
pair = (list1[i], list2[matching[i]])
result.append(pair)
cost = cost + (pair[0] - pair[1])**2
return (cost, result)
Note
I was asked to explain why this works.
Here is my heuristic understanding. In the algorithm we solve the half-problem. Then we have to solve the full problem.
The question is how far can an optimal solution for the full problem be forced to be from the optimal solution to the half problem? We push it to the right by having every element in list2 that wasn't in the half problem be as large as possible, and every element in list1 that wasn't in the half problem be as small as possible. But if we shove the ones from the half problem to the right, and put the duplicate elements where they were, then modulo boundary effects we've got 2 optimal solutions to the half problem and nothing moved by more than to where the next element right was in the half problem. Similar reasoning applies to trying to force the solution left.
Now let's discuss those boundary effects. Those boundary effects are at the end by 1 element. So when we try to shove an element off the end, we can't always. By looking 2 elements instead of 1 over, we add enough wiggle room to account for that as well.
Hence there has to be an optimal solution that is fairly close to the half problem doubled in an obvious way. There may be others, but there is at least one. And the DP step will find it.
I would need to do some work to capture this intuition into a formal proof, but I'm confident that it could be done.
Here's a recursive solution. Pick the middle element of a; map that to each possible element of b (leave enough on each end to accommodate the left and right sections of a). For each such mapping, compute the single-element cost; then recur on each of the left and right fragments of a and b.
Here's the code; I'll leave memoization as an exercise for the student.
test_case = [
    [ [1, 2], [0, 1, 10] ],
    [ [1.1, 2.3, 5.6, 5.7, 10.1], [0, 1.9, 2.4, 2.7, 8.4, 9.1, 10.7, 11.8] ],
]

import math

indent = ""

def best_match(a, b):
    """
    Find the best match for elements in a mapping to b, preserving order
    """
    global indent
    indent += " "
    # print(indent, "ENTER", a, b)
    best_cost = math.inf
    best_map = []
    if len(a) == 0:
        best_cost = 0
        best_map = []
    else:
        # Match the middle element of `a` to each eligible element of `b`
        a_midpt = len(a) // 2
        a_elem = a[a_midpt]
        l_margin = a_midpt
        r_margin = a_midpt + len(b) - len(a)
        for b_pos in range(l_margin, r_margin+1):
            # For each match ...
            b_elem = b[b_pos]
            # print(indent, "TRACE", a_elem, b_elem)
            # ... compute the element cost ...
            mid_cost = (a_elem - b_elem)**2
            # ... and recur for similar alignments on left & right list fragments
            l_cost, l_map = best_match(a[:l_margin], b[:b_pos])
            r_cost, r_map = best_match(a[l_margin+1:], b[b_pos+1:])
            # Check total cost against best found; keep the best
            cand_cost = l_cost + mid_cost + r_cost
            # print(indent, " COST", mid_cost, l_cost, r_cost)
            if cand_cost < best_cost:
                best_cost = cand_cost
                best_map = l_map[:] + [(a_elem, b_elem)]
                best_map.extend(r_map[:])
    # print(indent, "LEAVE", best_cost, best_map)
    return best_cost, best_map

for a, b in test_case:
    print('\n', a, b)
    print(best_match(a, b))
Output:
a = [1, 2]
b = [0, 1, 10]
2 [(1, 0), (2, 1)]
a = [1.1, 2.3, 5.6, 5.7, 10.1]
b = [0, 1.9, 2.4, 2.7, 8.4, 9.1, 10.7, 11.8]
16.709999999999997 [(1.1, 1.9), (2.3, 2.4), (5.6, 2.7), (5.7, 8.4), (10.1, 10.7)]
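One possible take on the memoization exercise is sketched below. It is not the answerer's code: best_match_memo is a hypothetical name, and switching from list slices to index ranges (so that functools.lru_cache can hash the arguments) is my own choice.

```python
from functools import lru_cache

def best_match_memo(a, b):
    """Same recursion as best_match, cached on index ranges instead of slices."""
    a, b = tuple(a), tuple(b)

    @lru_cache(maxsize=None)
    def solve(a_lo, a_hi, b_lo, b_hi):
        # Best (cost, mapping) for matching a[a_lo:a_hi] into b[b_lo:b_hi].
        n = a_hi - a_lo
        if n == 0:
            return 0.0, ()
        mid = a_lo + n // 2
        left_count = mid - a_lo        # elements of a that must fit on the left
        right_count = a_hi - mid - 1   # elements of a that must fit on the right
        best = (float('inf'), ())
        for b_pos in range(b_lo + left_count, b_hi - right_count):
            l_cost, l_map = solve(a_lo, mid, b_lo, b_pos)
            r_cost, r_map = solve(mid + 1, a_hi, b_pos + 1, b_hi)
            cost = l_cost + (a[mid] - b[b_pos]) ** 2 + r_cost
            if cost < best[0]:
                best = (cost, l_map + ((a[mid], b[b_pos]),) + r_map)
        return best

    cost, mapping = solve(0, len(a), 0, len(b))
    return cost, list(mapping)

print(best_match_memo([1, 2], [0, 1, 10]))  # (2.0, [(1, 0), (2, 1)])
```

The recursion depth is only logarithmic in len(a), but the cache can still hold many (range of a, range of b) states, so this is a sketch of the idea rather than a tuned solution.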
For giggles and grins, here is what is hopefully a much faster solution than either of the other working ones. The idea is simple. First we do a greedy match left to right. Then a greedy match right to left. This gives us bounds on where each element can go. Then we can do a DP solution left to right only looking at possible values.
If the greedy approaches agree, this will take linear time. If the greedy approaches are very far apart, this can take quadratic time. But the hope is that the greedy approaches produce reasonably close results, resulting in close to linear performance.
def match_lists(list1, list2):
# First we try a greedy matching from left to right.
# This gives us, for each element, the last place it could
# be forced to match. (It could match later, for instance
# in a run of equal values in list2.)
match_last = []
j = 0
for i in range(len(list1)):
while True:
if len(list2) - j <= len(list1) - i:
# We ran out of room.
break
elif abs(list2[j+1] - list1[i]) <= abs(list2[j] - list1[i]):
# Take the better value
j = j + 1
else:
break
match_last.append(j)
j = j + 1
# Next we try a greedy matching from right to left.
# This gives us, for each element, the first place it could be
# forced to match.
# We build it in reverse order, then reverse.
match_first_rev = []
j = len(list2) - 1
for i in range(len(list1) - 1, -1, -1):
while True:
if j <= i:
# We ran out of room
break
elif abs(list2[j-1] - list1[i]) <= abs(list2[j] - list1[i]):
# Take the better value
j = j - 1
else:
break
match_first_rev.append(j)
j = j - 1
match_first = [x for x in reversed(match_first_rev)]
# And now we do DP forward, building up our answer as a linked list.
best_j_last = -1
last = {-1: (0.0, None)}
for i in range(len(list1)):
# We initialize with the last position we could choose.
best_j = match_last[i]
best_cost = last[best_j_last][0] + (list1[i] - list2[best_j])**2
this = {best_j: (best_cost, [best_j, last[best_j_last][1]])}
# Now try the rest of the range of possibilities
for j in range(match_first[i], match_last[i]):
j_prev = best_j_last
if j <= j_prev:
j_prev = j - 1 # Push back to the last place we could match
cost = last[j_prev][0] + (list1[i] - list2[j])**2
this[j] = (cost, [j, last[j_prev][1]])
if cost < best_cost:
best_cost = cost
best_j = j
last = this
best_j_last = best_j
(final_cost, linked_list) = last[best_j_last]
matching_rev = []
while linked_list is not None:
matching_rev.append(
(list1[len(matching_rev)], list2[linked_list[0]]))
linked_list = linked_list[1]
matching = [x for x in reversed(matching_rev)]
return (final_cost, matching)
print(match_lists([1.1, 2.3, 5.6, 5.7, 10.1], [0, 1.9, 2.4, 2.7, 8.4, 9.1, 10.7, 11.8]))
Python is not very friendly with recursion, so attempting to apply it to a list of thousands of elements might not fare so well. Here is a bottom-up approach that takes advantage of the optimal solution for any a from A as we increase the index for its potential partner from B being non-decreasing. (Works for both sorted and non-sorted input.)
def f(A, B):
m = [[(float('inf'), -1) for b in B] for a in A]
for i in xrange(len(A)):
for j in xrange(i, len(B) - len(A) + i + 1):
d = (A[i] - B[j]) ** 2
if i == 0:
if j == i:
m[i][j] = (d, j)
elif d < m[i][j-1][0]:
m[i][j] = (d, j)
else:
m[i][j] = m[i][j-1]
# i > 0
else:
candidate = d + m[i-1][j-1][0]
if j == i:
m[i][j] = (candidate, j)
else:
if candidate < m[i][j-1][0]:
m[i][j] = (candidate, j)
else:
m[i][j] = m[i][j-1]
result = m[len(A)-1][len(B)-1][0]
# Backtrack
lst = [None for a in A]
j = len(B) - 1
for i in xrange(len(A)-1, -1, -1):
j = m[i][j][1]
lst[i] = j
j = j - 1
return (result, [(A[i], B[j]) for i, j in enumerate(lst)])
A = [1, 2]
B = [0, 1, 10000]
print f(A, B)
print ""
A = [1.1, 2.3, 5.6, 5.7, 10.1]
B = [0, 1.9, 2.4, 2.7, 8.4, 9.1, 10.7, 11.8]
print f(A, B)
Output:
(2, [(1, 0), (2, 1)])
(16.709999999999997, [(1.1, 1.9), (2.3, 2.4), (5.6, 2.7), (5.7, 8.4), (10.1, 10.7)])
Update
Here's an O(|B|) space implementation. I'm not sure if this still offers a way to backtrack to get the mapping but I'm working on it.
def f(A, B):
    m = [(float('inf'), -1) for b in B]
    m1 = [(float('inf'), -1) for b in B] # m[i-1]
    for i in range(len(A)):
        for j in range(i, len(B) - len(A) + i + 1):
            d = (A[i] - B[j]) ** 2
            if i == 0:
                if j == i:
                    m[j] = (d, j)
                elif d < m[j-1][0]:
                    m[j] = (d, j)
                else:
                    m[j] = m[j-1]
            # i > 0
            else:
                candidate = d + m1[j-1][0]
                if j == i:
                    m[j] = (candidate, j)
                else:
                    if candidate < m[j-1][0]:
                        m[j] = (candidate, j)
                    else:
                        m[j] = m[j-1]
        # Keep the finished row as m[i-1] and start a fresh row for the next i
        m1 = m
        m = m[:len(B) - len(A) + i + 1] + [(float('inf'), -1)] * (len(A) - i - 1)
    result = m1[len(B)-1][0]
    # Backtrack
    # This doesn't work as is
    # to get the mapping
    lst = [None for a in A]
    j = len(B) - 1
    for i in range(len(A)-1, -1, -1):
        j = m1[j][1]
        lst[i] = j
        j = j - 1
    return (result, [(A[i], B[j]) for i, j in enumerate(lst)])

A = [1, 2]
B = [0, 1, 10000]
print(f(A, B))
print("")
A = [1.1, 2.3, 5.6, 5.7, 10.1]
B = [0, 1.9, 2.4, 2.7, 8.4, 9.1, 10.7, 11.8]
print(f(A, B))

import random
import time
A = [random.uniform(0, 10000.5) for i in range(10000)]
B = [random.uniform(0, 10000.5) for i in range(15000)]
start = time.time()
print(f(A, B)[0])
end = time.time()
print(end - start)

Rank and unrank combinations to distribute k balls into n bins of different capacities

I want to distribute k balls into n bins of different capacities. How can I rank and unrank the distributions given n, k, and the bin capacities?
Example:
n := 3
k := 4
bin capacities := 3,2,1
Balls in bins:
(1,2,1), (2,1,1), (2,2,0), (3,0,1), (3,1,0) := 5 distributions
Is there a formula?
I do not know if there is a standard name for this technique, but this is a kind of problem that I have successfully solved many times with a twist on dynamic programming.
What I do is use dynamic programming to build a data structure from which ranking and unranking can happen, and then build the logic that does the rank/unrank.
The dynamic programming piece is the hardest.
import collections

BallSolutions = collections.namedtuple('BallSolutions', 'bin count balls next_bin_solutions next_balls_solutions')

def find_ball_solutions (balls, bin_capacities):
    # How many balls can fit in the remaining bins?
    capacity_sum = [0 for _ in bin_capacities]
    capacity_sum[-1] = bin_capacities[-1]
    for i in range(len(bin_capacities) - 2, -1, -1):
        capacity_sum[i] = capacity_sum[i+1] + bin_capacities[i]

    cache = {}
    def _search (bin_index, remaining_balls):
        if len(bin_capacities) <= bin_index:
            return None
        elif capacity_sum[bin_index] < remaining_balls:
            return None
        elif (bin_index, remaining_balls) not in cache:
            if bin_index + 1 == len(bin_capacities):
                cache[(bin_index, remaining_balls)] = BallSolutions(
                    bin=bin_index, count=1, balls=remaining_balls, next_bin_solutions=None, next_balls_solutions=None)
            else:
                this_solution = None
                for this_balls in range(min([remaining_balls, bin_capacities[bin_index]]), -1, -1):
                    next_bin_solutions = _search(bin_index+1, remaining_balls - this_balls)
                    if next_bin_solutions is None:
                        break # We already found the fewest balls that can go in this bin.
                    else:
                        this_count = next_bin_solutions.count
                        if this_solution is not None:
                            this_count = this_count + this_solution.count
                        next_solution = BallSolutions(
                            bin=bin_index, count=this_count,
                            balls=this_balls, next_bin_solutions=next_bin_solutions,
                            next_balls_solutions=this_solution)
                        this_solution = next_solution
                cache[(bin_index, remaining_balls)] = this_solution
        return cache[(bin_index, remaining_balls)]

    return _search(0, balls)
Here is code to produce a ranked solution:
def find_ranked_solution (solutions, n):
    if solutions is None:
        return None
    elif n < 0:
        return None
    elif solutions.next_bin_solutions is None:
        if n == 0:
            return [solutions.balls]
        else:
            return None
    elif n < solutions.next_bin_solutions.count:
        return [solutions.balls] + find_ranked_solution(solutions.next_bin_solutions, n)
    else:
        return find_ranked_solution(solutions.next_balls_solutions, n - solutions.next_bin_solutions.count)
Here is code to produce the rank for a solution. Note that it will blow up if provided with an invalid answer.
def find_solution_rank (solutions, solution):
    n = 0
    while solutions.balls < solution[0]:
        n = n + solutions.next_bin_solutions.count
        solutions = solutions.next_balls_solutions
    if 1 < len(solution):
        n = n + find_solution_rank(solutions.next_bin_solutions, solution[1:])
    return n
And here is some test code:
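s = find_ball_solutions(4, [3, 2, 1])
# s.count is the number of valid distributions; a rank past it would make
# find_ranked_solution return None, which find_solution_rank cannot handle.
for i in range(s.count):
    r = find_ranked_solution(s, i)
    print((i, r, find_solution_rank(s, r)))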
You can define the number of such combinations recursively. Given k balls and bin capacities q_1, ..., q_n, for each j between 0 and q_1, place j balls in the first bin and allocate the remaining k-j balls among the other bins.
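Written out as a recurrence (the symbol N is notation introduced here for the count, not something from the original answer), with the empty list of bins as the base case:

```latex
N(k;\, q_1, \dots, q_n) \;=\; \sum_{j=0}^{q_1} N(k - j;\, q_2, \dots, q_n),
\qquad
N(k;\,) \;=\;
\begin{cases}
1 & \text{if } k = 0,\\
0 & \text{otherwise.}
\end{cases}
```

Terms with k - j < 0 contribute nothing, because a negative ball count can never reach 0 by the time the bins run out.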
Here is a quick Python implementation:
from functools import lru_cache

@lru_cache(None)
def f(n, *qs):
    # n is the number of balls left; qs are the capacities of the remaining bins
    if not qs:
        return 1 if n == 0 else 0
    q = qs[0]
    return sum(f(n-j, *qs[1:]) for j in range(q+1))

f(4, 3, 2, 1)
# 5
Here's a way (in pseudocode), though it doesn't look very efficient. It would probably be smart to add some short-circuiting in places where the number of balls won't fit in the total remaining capacity. Perhaps some clever caching could help, if a given list of capacities will be used many times.
All numbers are non-negative integers. Function ArrayTail(array a) is the subarray whose elements are all elements of the input array after the first. Function ArrayCon(number head, array a) is the array whose elements are head followed by the elements of a.
function Count(array capacities, number balls) -> number
    If balls == 0:
        return 1
    Else if capacities is empty:
        return 0
    Else:
        Let sum: number
        sum <- 0
        // b is the number of balls placed in the first bin
        For b from 0 to min(balls, capacities[0]):
            sum <- sum + Count(ArrayTail(capacities), balls - b)
        End For
        return sum
    End If/Else
End function
function Rank(array capacities, array counts) -> number
    Precondition: length(capacities) == length(counts)
    Precondition: counts[i] <= capacities[i] for all i < length(counts)
    If counts is empty:
        return 0
    Else:
        Let total: number
        total <- 0
        For c in counts:
            total <- total + c
        End For
        Let r: number
        r <- Rank(ArrayTail(capacities), ArrayTail(counts))
        For b from 0 to (counts[0]-1):
            r <- r + Count(ArrayTail(capacities), total - b)
        End For
        return r
    End If/Else
End function

function Unrank(array capacities, number balls, number rank) -> array
    Precondition: rank < Count(capacities, balls)
    If capacities is empty:
        return empty array
    Else:
        Let c0: number
        c0 <- 0
        Loop until "return":
            Let subcount: number
            subcount <- Count(ArrayTail(capacities), balls-c0)
            If subcount <= rank:
                c0 <- c0 + 1
                rank <- rank - subcount
            Else:
                return ArrayCon(c0, Unrank(ArrayTail(capacities), balls-c0, rank))
            End If/Else
        End Loop
    End If/Else
End function
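The pseudocode above can be sketched in Python roughly as follows. This is one possible translation, not a definitive implementation: the names `count`, `rank` and `unrank` are my own, the recursion in Rank and Unrank is unrolled into loops over the bins, capacities are passed around as tuples so that `functools.lru_cache` can supply the caching mentioned earlier, and `count` short-circuits when the balls cannot fit in the remaining capacity.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count(capacities, balls):
    """Ways to put `balls` balls into bins with the given capacities (a tuple)."""
    if balls < 0 or balls > sum(capacities):
        return 0                      # short-circuit: cannot fit
    if not capacities:
        return 1                      # no bins left and balls == 0
    return sum(count(capacities[1:], balls - b)
               for b in range(min(balls, capacities[0]) + 1))

def rank(capacities, counts):
    """Position of `counts` among all distributions, ordered by first bin ascending."""
    capacities = tuple(capacities)
    remaining = sum(counts)
    r = 0
    for i, c in enumerate(counts):
        tail = capacities[i + 1:]
        # Skip every distribution that puts fewer than c balls in bin i.
        for b in range(c):
            r += count(tail, remaining - b)
        remaining -= c
    return r

def unrank(capacities, balls, r):
    """Inverse of rank: the distribution of `balls` balls with rank r."""
    capacities = tuple(capacities)
    if not 0 <= r < count(capacities, balls):
        raise ValueError("rank out of range")
    result = []
    for i in range(len(capacities)):
        tail = capacities[i + 1:]
        c0 = 0
        while True:
            sub = count(tail, balls - c0)
            if sub <= r:              # the answer is not among these; move on
                r -= sub
                c0 += 1
            else:
                break
        result.append(c0)
        balls -= c0
    return result

for r in range(count((3, 2, 1), 4)):
    print(r, unrank((3, 2, 1), 4, r))
```

For the example from the question (capacities 3, 2, 1 and 4 balls) this enumerates the five distributions in the same order the question lists them, and `rank` maps each one back to its index.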

Check matches with 3's in a set of 6 numbers across 49 number draws

I am using select within Sidekiq:
require 'set'
require 'benchmark'

all_numbers = (1..49).to_a.combination(6)
needle = [1,2,3,4,5,6].to_set

Benchmark.bm do |x|
  x.report { all_numbers.select{|z| (needle & z).count == 3} }
end
#       user     system      total        real
#  74.200000   3.040000  77.240000 ( 78.901259)
I want to check thousands of such needles quickly. Is there a different way to find out this information? Is converting to C an option?
Note:
all_numbers is a variable that does not change, and is as above always.
Goal is to display all the sets which have 3 matches.
Examples of needles can be got from:
(1..49).to_a.shuffle.first(6).sort
Assuming all elements of needle are drawn from the same 1..49 range as all_numbers:
For three to be correct, that's a 3-combination of 6:
(6*5*4) / (1*2*3) = 20
For the remaining three to be incorrect, that's a 3-combination of the remaining (49-6):
(43*42*41) / (1*2*3) = 12341
Thus, the total number of combinations is
12341 * 20 = 246820
In code:
require 'benchmark'

size_all = 49
size_needle = 6
required = 3

def binomial(n, k)
  ((n - k + 1)..n).inject(&:*) / (1..k).inject(&:*)
end

Benchmark.bm do |x|
  x.report {
    binomial(size_needle, required) * binomial(size_all - size_needle, required)
  }
end
#       user     system      total        real
#   0.000026   0.000006   0.000032 (  0.000030)
Slightly faster.
EDIT: After the requirements were changed:
class Array
  def semimatching_combination(needle, num_total, num_needle)
    unless block_given?
      return to_enum(__method__, needle, num_total, num_needle) do
        binomial(needle.size, num_needle) *
          binomial(self.size - needle.size, num_total - num_needle)
      end
    end

    needle.combination(num_needle) do |needle_comb|
      (self - needle).combination(num_total - num_needle) do |other_comb|
        yield (needle_comb + other_comb).sort
      end
    end
  end
end

(1..49).to_a.semimatching_combination((1..6).to_a, 6, 3).size
# => 246820
(1..49).to_a.semimatching_combination((1..6).to_a, 6, 3).to_a
# => [[1, 2, 3, 7, 8, 9], ...]
You can replace sort with to_set (with require 'set') if you want, depending on what you want to generate.

Divide an array into subarrays as equally as possible for core-mapping

What algorithms are used to map an image array to multiple cores for processing? I've been trying to come up with something that will return a list of (disjoint) ranges over which to iterate in an array, and so far I have the following.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import numpy as np

def divider(arr_dims, coreNum=1):
    """ Get a bunch of iterable ranges;
    Example input: [[[0, 24], [15, 25]]]
    """
    if (coreNum == 1):
        return arr_dims
    elif (coreNum < 1):
        raise ValueError(\
            'partitioner expected a positive number of cores, got %d'\
            % coreNum
        )
    elif (coreNum % 2):
        raise ValueError(\
            'partitioner expected an even number of cores, got %d'\
            % coreNum
        )
    total = []
    # Split each coordinate in arr_dims in _half_
    for arr_dim in arr_dims:
        dY = arr_dim[0][1] - arr_dim[0][0]
        dX = arr_dim[1][1] - arr_dim[1][0]
        if ((coreNum,)*2 > (dY, dX)):
            coreNum = max(dY, dX)
            coreNum -= 1 if (coreNum % 2 and coreNum > 1) else 0
        new_c1, new_c2, = [], []
        if (dY >= dX):
            # Subimage height is greater than its width
            half = dY // 2
            new_c1.append([arr_dim[0][0], arr_dim[0][0] + half])
            new_c1.append(arr_dim[1])
            new_c2.append([arr_dim[0][0] + half, arr_dim[0][1]])
            new_c2.append(arr_dim[1])
        else:
            # Subimage width is greater than its height
            half = dX // 2
            new_c1.append(arr_dim[0])
            # Offset by the subimage's own start, as in the height branch
            new_c1.append([arr_dim[1][0], arr_dim[1][0] + half])
            new_c2.append(arr_dim[0])
            new_c2.append([arr_dim[1][0] + half, arr_dim[1][1]])
        total.append(new_c1), total.append(new_c2)
    # If the number of cores is 1, we get back the total; Else,
    # we split each in total, etc.; it's turtles all the way down
    return divider(total, coreNum // 2)

if __name__ == '__main__':
    X = np.random.randn(25 - 1, 36 - 1)
    dims = [zip([0, 0], list(X.shape))]
    dims = [list(j) for i in dims for j in dims[0] if type(j) != list]
    print(divider([dims], 2))
It's incredibly limited, however, because it only accepts a number of cores that's some power of 2, and I'm certain there are edge cases I'm overlooking. Running it returns [[[0, 24], [0, 17]], [[0, 24], [17, 35]]], and then using pathos I've mapped the first set to one core in my laptop and the second to another.
I guess I just don't know how to geometrically walk my way through partitioning an image into segments that are as similar in size as possible, so that each core on a given machine has the same amount of work to do.
I'm not too sure what you're trying to achieve, but if you want to split an array (of whatever dimensions) into multiple parts, you can look into numpy.array_split.
It partitions an array into a given number of parts of almost equal size, so it works even when the number of partitions cannot cleanly divide the array.
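If only the index ranges are needed (for example, to hand each core a slice of rows), the same sizing rule can be sketched without NumPy. `split_ranges` below is a hypothetical helper, not a NumPy function; it follows the rule numpy.array_split documents, where the first `length % parts` chunks get one extra element, and it accepts any positive number of parts rather than only powers of 2.

```python
def split_ranges(length, parts):
    """Return `parts` half-open (start, stop) ranges that together cover range(length)."""
    if parts < 1:
        raise ValueError("parts must be positive")
    base, extra = divmod(length, parts)
    ranges, start = [], 0
    for k in range(parts):
        # The first `extra` chunks are one element longer than the rest.
        stop = start + base + (1 if k < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges

print(split_ranges(24, 5))
# [(0, 5), (5, 10), (10, 15), (15, 20), (20, 24)]
```

Each (start, stop) pair can then be used as a row slice of the image, e.g. X[start:stop], and mapped to one worker.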

Resources