Related
Given an array A consisting of N elements. Our task is to find the maximal subarray sum after applying the following operation exactly once:
. Select any subarray and set all the elements in it to zero.
Eg:- array is -1 4 -1 2 then answer is 6 because we can choose -1 at index 2 as a subarray and make it 0. So the resultatnt array will be after applying the operation is : -1 4 0 2. Max sum subarray is 4+0+2 = 6.
My approach was to find start and end indexes of minimum sum subarray and make all elements as 0 of that subarray and after that find maximum sum subarray. But this approach is wrong.
Starting simple:
First, let us start with the part of the question: Finding the maximal subarray sum.
This can be done via dynamic programming:
a = [1, 2, 3, -2, 1, -6, 3, 2, -4, 1, 2, 3]
a = [-1, -1, 1, 2, 3, 4, -6, 1, 2, 3, 4]
def compute_max_sums(a):
res = []
currentSum = 0
for x in a:
if currentSum > 0:
res.append(x + currentSum)
currentSum += x
else:
res.append(x)
currentSum = x
return res
res = compute_max_sums(a)
print(res)
print(max(res))
Quick explanation: we iterate through the array. As long as the sum is non-negative, it is worth appending the whole block to the next number. If we dip below zero at any point, we discard whole "tail" sequence since it will not be profitable to keep it anymore and we start anew. At the end, we have an array, where j-th element is the maximal sum of a subarray i:j where 0 <= i <= j.
Rest is just the question of finding the maximal value in the array.
Back to the original question
Now that we solved the simplified version, it is time to look further. We can now select a subarray to be deleted to increase the maximal sum. The naive solution would be to try every possible subarray and to repeat the steps above. This would unfortunately take too long1. Fortunately, there is a way around this: we can think of the zeroes as a bridge between two maxima.
There is one more thing to address though - currently, when we have the j-th element, we only know that the tail is somewhere behind it so if we were to take maximum and 2nd biggest element from the array, it could happen that they would overlap which would be a problem since we would be counting some of the elements more than once.
Overlapping tails
How to mitigate this "overlapping tails" issue?
The solution is to compute everything once more, this time from the end to start. This gives us two arrays - one where j-th element has its tail i pointing towards the left end of the array(e.g. i <=j) and the other where the reverse is true. Now, if we take x from first array and y from second array we know that if index(x) < index(y) then their respective subarrays are non-overlapping.
We can now proceed to try every suitable x, y pair - there is O(n2) of them. However since we don't need any further computation as we already precomputed the values, this is the final complexity of the algorithm since the preparation cost us only O(n) and thus it doesn't impose any additional penalty.
Here be dragons
So far the stuff we did was rather straightforward. This following section is not that complex but there are going to be some moving parts. Time to brush up the max heaps:
Accessing the max is in constant time
Deleting any element is O(log(n)) if we have a reference to that element. (We can't find the element in O(log(n)). However if we know where it is, we can swap it with the last element of the heap, delete it, and bubble down the swapped element in O(log(n)).
Adding any element into the heap is O(log(n)) as well.
Building a heap can be done in O(n)
That being said, since we need to go from start to the end, we can build two heaps, one for each of our pre-computed arrays.
We will also need a helper array that will give us quick index -> element-in-heap access to get the delete in log(n).
The first heap will start empty - we are at the start of the array, the second one will start full - we have the whole array ready.
Now we can iterate over whole array. In each step i we:
Compare the max(heap1) + max(heap2) with our current best result to get the current maximum. O(1)
Add the i-th element from the first array into the first heap - O(log(n))
Remove the i-th indexed element from the second heap(this is why we have to keep the references in a helper array) - O(log(n))
The resulting complexity is O(n * log(n)).
Update:
Just a quick illustration of the O(n2) solution since OP nicely and politely asked. Man oh man, I'm not your bro.
Note 1: Getting the solution won't help you as much as figuring out the solution on your own.
Note 2: The fact that the following code gives the correct answer is not a proof of its correctness. While I'm fairly certain that my solution should work it is definitely worth looking into why it works(if it works) than looking at one example of it working.
input = [100, -50, -500, 2, 8, 13, -160, 5, -7, 100]
reverse_input = [x for x in reversed(input)]
max_sums = compute_max_sums(input)
rev_max_sums = [x for x in reversed(compute_max_sums(reverse_input))]
print(max_sums)
print(rev_max_sums)
current_max = 0
for i in range(len(max_sums)):
if i < len(max_sums) - 1:
for j in range(i + 1, len(rev_max_sums)):
if max_sums[i] + rev_max_sums[j] > current_max:
current_max = max_sums[i] + rev_max_sums[j]
print(current_max)
1 There are n possible beginnings, n possible ends and the complexity of the code we have is O(n) resulting in a complexity of O(n3). Not the end of the world, however it's not nice either.
I have a 2d array, matrix of a sort (m x n). I need to generate '1' in k cells, but the probability of it should be equal for each cell.
for example, if k=3, we pick randomly where to place the 3 '1's :
[0, 0, 0, 0]
[0, 1, 1, 0]
[1, 0, 0, 0]
At first, I tackled this by generating a Random of modulu m * n (rows * columns).
But, that means that we could theoretically get to the end of the matrix without generating a single '1'.
Then, I read about Yates Shuffle, but wasn't sure whether that's wise and even feasible to implement it with that.
What is an efficient way to implement this?
This is essentially a problem of "sampling without replacement": Out of mn cells, choose k of them without replacement. There are many approaches to this problem, depending on how big mn is in relation to k. If mn is relatively small, then a Fisher–Yates shuffle will work well; make a list of cells, shuffle them, then take the first k cells to set to 1.
For more details, see my sections on sampling without replacement and shuffling. (Part of my comment moved here:) Different sampling algorithms have different tradeoffs in terms of time and space. For example, a Fisher–Yates shuffle has time and space complexity of O(mn), while a partial shuffle can have time complexity O(k).
My mind is blown since I began, last week, trying to sort an array of N elements by condition: the difference between 2 elements being always less or equal to the next 2 elements. For example:
Α[4] = { 10, 2, 7, 4}
It is possible to rearrange that array this way:
{2, 7, 10, 4} because (2 - 7 = -5) < (7 - 10 = -3) < (10 - 4 = 6)
{4, 10, 7, 2} because (4 - 10 = -6) < (10 - 7 = 3) < (7 - 2 = 5)
One solution I considered was just shuffling the array and checking each time if it agreed with the conditions, an efficient method for a small number of elements, but time consuming or even impossible for a larger number of elements.
Another was trying to move elements around the array with loops, hoping again to meet the requirements, but again this method is very time consuming and also sometimes not possible.
Trying to find an algorithm doesn't seem to have any result but there must be something.
Thank you very much in advance.
I normally don't just provide code, but this question intrigued me, so here's a brute-force solution, that might get you started.
The concept will always be slow because the individual elements in the list to be sorted are not independent of each other, so they cannot be sorted using traditional O(N log N) algorithms. However, the differences can be sorted that way, which simplifies checking for a solution, and permutations could be checked in parallel to speed up the processing.
import os,sys
import itertools
def is_diff_sorted(qa):
diffs = [qa[i] - qa[i+1] for i in range(len(qa)-1)]
for i in range(len(diffs)-1):
if diffs[i] > diffs[i+1]:
return False
return True
a = [2,4,7,10]
#a = [1,4,6,7,20]
a.sort()
for perm in itertools.permutations(a):
if is_diff_sorted(perm):
print "Solution:",str(a)
break
This condition is related to differentiation. The (negative) difference between neighbouring elements has to be steady or increasing with increasing index. Multiply the condition by -1 and you get
a[i+1]-a[i] => a[i+2]-a[i+1]
or
0 => (a[i+2]-a[i+1])- (a[i+1]-a[i])
So the 2nd derivative has to be 0 or negative, which is the same as having the first derivative stay the same or changing downwards, like e.g. portions of the upper half of a circle. That does not means that the first derivative itself has to start out positive or negative, just that it never change upward.
The problem algorithmically is that it can't be a simple sort, since you never compare just 2 elements of the list, you'll have to compare three at a time (i,i+1,i+2).
So the only thing you know apart from random permutations is given in Klas` answer (values first rising if at all, then falling if at all), but his is not a sufficient condition since you can have a positive 2nd derivative in his two sets (rising/falling).
So is there a solution much faster than the random shuffle? I can only think of the following argument (similar to Klas' answer). For a given vector the solution is more likely if you separate the data into a 1st segment that is rising or steady (not falling) and a 2nd that is falling or steady (not rising) and neither is empty. Likely an argument could be made that the two segments should have approximately equal size. The rising segment should have the data that are closer together and the falling segment should contain data that are further apart. So one could start with the mean, and look for data that are close to it, move them to the first set,then look for more widely spaced data and move them to the 2nd set. So a histogram might help.
[4 7 10 2] --> diff [ 3 3 -8] --> 2diff [ 0 -11]
Here is a solution based on backtracking algorithm.
Sort input array in non-increasing order.
Start dividing the array's values into two subsets: put the largest element to both subsets (this would be the "middle" element), then place second largest one into arbitrary subset.
Sequentially put the remaining elements to either subset. If this cannot be done without violating the "difference" condition, use other subset. If both subsets are not acceptable, rollback and change preceding decisions.
Reverse one of the arrays produced on step 3 and concatenate it with other array.
Below is Python implementation (it is not perfect, the worst defect is recursive implementation: while recursion is quite common for backtracking algorithms, this particular algorithm seems to work in linear time, and recursion is not good for very large input arrays).
def is_concave_end(a, x):
return a[-2] - a[-1] <= a[-1] - x
def append_element(sa, halves, labels, which, x):
labels.append(which)
halves[which].append(x)
if len(labels) == len(sa) or split_to_halves(sa, halves, labels):
return True
if which == 1 or not is_concave_end(halves[1], halves[0][-1]):
halves[which].pop()
labels.pop()
return False
labels[-1] = 1
halves[1].append(halves[0][-1])
halves[0].pop()
if split_to_halves(sa, halves, labels):
return True
halves[1].pop()
labels.pop()
def split_to_halves(sa, halves, labels):
x = sa[len(labels)]
if len(halves[0]) < 2 or is_concave_end(halves[0], x):
return append_element(sa, halves, labels, 0, x)
if is_concave_end(halves[1], x):
return append_element(sa, halves, labels, 1, x)
def make_concave(a):
sa = sorted(a, reverse = True)
halves = [[sa[0]], [sa[0], sa[1]]]
labels = [0, 1]
if split_to_halves(sa, halves, labels):
return list(reversed(halves[1][1:])) + halves[0]
print make_concave([10, 2, 7, 4])
It is not easy to produce a good data set to test this algorithm: plain set of random numbers either is too simple for this algorithm or does not have any solutions. Here I tried to generate a set that is "difficult enough" by mixing together two sorted lists, each satisfying the "difference" condition. Still this data set is processed in linear time. And I have no idea how to prepare any data set that would demonstrate more-than-linear time complexity of this algorithm...
Not that since the diffence should be ever-rising, any solution will have element first in rising order and then in falling order. The length of either of the two "suborders" may be 0, so a solution could consist of a strictly rising or strictly falling sequence.
The following algorithm will find any solutions:
Divide the set into two sets, A and B. Empty sets are allowed.
Sort A in rising order and B in falling order.
Concatenate the two sorted sets: AB
Check if you have a solution.
Do this for all possible divisions into A and B.
Expanding on the #roadrunner66 analysis, the solution is to take two smallest elements of the original array, and make them first and last in the target array; take two next smallest elements and make them second and next-to-last; keep going until all the elements are placed into the target. Notice that which one goes to the left, and which one to the right doesn't matter.
Sorting the original array facilitates the process (finding smallest elements becomes trivial), so the time complexity is O(n log n). The space complexity is O(n), because it requires a target array. I don't know off-hand if it is possible to do it in-place.
An exercise in Coderbyte is supposed to determine if some subset of integers in an array sum to the largest number in the array.
The following code seems to work on my computer, but when I submit it online, it seems to cause an endless loop. (There's never any output, regardless of the argument passed).
def arr_add?(arr)
a = arr.sort
lgst = a.pop
size = a.size
result = false
while size > 1
a.combination(size) {|c| result |= (c.inject {|r, a| r + a} == lgst)}
size -= 1
end
result.to_s
end
arr_add?([1, 2, 3, 4, 10, 14])
Any ideas why this might be the case?
I suspect that you are actually not running into an endless loop, but rather just take a really long time, because of the inefficiency of your algorithm.
def ArrayAdditionI(arr)
arr_size = arr.size
ary = arr.sort
largest = ary.pop
ary_size = arr_size - 1
combination_size = ary_size
result = false
while combination_size > 1
ary.combination(combination_size) {|combination|
result |= (combination.inject(
:+
) == largest)
}
combination_size -= 1
end
result.to_s
end
I introduced a new variable and renamed some others, so that it becomes easier to talk about the algorithm. I also reformatted it, to make the three nested "loops" more obvious.
Let's take a look at the algorithm.
The outer while loop is executed ary_size - 1 == arr_size - 2 times, with combination_size ranging from 2 to ary_size == arr_size - 1.
The combination "loop" is executed ary_size choose combination_size times, that's … well, a very quickly growing number.
The innermost "loop" (the operation performed by combination.inject) is executed combination_size - 1 times.
This gives a total execution count for the innermost operation of:
The Sum from 2 to arr_size - 1 of
arr_size - 1 choose combination_size times
combination_size - 1
In Wolfram Language, that's Sum[Binomial[a-1, c]*(c-1), c, 2, a-1], which Wolfram Alpha tells us is 2^(a-2) (a-3)+1, which is in O(2^n).
Playing around with the numbers a bit:
for 10 items, we have 1793 executions of the inject operation
for 15 items, we already have 98 305
for 20 items, we have 4 456 449
at 28 items, we cross the threshold to a billion operations: 1 677 721 601
for 1000 items, which I suspect is a somewhat reasonable input size CoderBytes might use, we have 2 670 735 203 411 771 297 463 949 434 782 054 512 824 301 493 176 042 516 553 547 843 013 099 994 928 903 285 314 296 959 198 121 926 383 029 722 247 001 218 461 778 959 624 588 092 753 669 155 960 493 619 769 880 691 017 874 939 573 116 202 845 311 796 007 113 080 079 901 646 833 889 657 798 860 899 142 814 122 011 828 559 707 931 456 870 722 063 370 635 289 362 135 539 416 628 419 173 512 766 291 969 operations. Oops.
Try your algorithm with arrays of length 5, 10, 15 (all instantaneous), 20 (a noticeable pause), and then 23, 24, 25 to get a feel for just how quickly the runtime grows.
Assuming that you could build a CPU which can execute the inner loop in a single instruction. Further assuming that a single instruction takes only a Planck time unit (i.e. the CPU has a frequency of roughly 20 000 000 000 000 000 000 000 000 000 000 THz). Further assuming that every single particle in the observable universe was such a CPU. It will still take more than the current age of the universe to execute your algorithm for an array of not even 500 items.
Note that with most of these programming puzzles, they are not actually programming puzzles, they are mathematics puzzles. They usually require a mathematical insight, in order to be able to solve them efficiently. Or, in this case, recognizing that it is the Subset sum problem, which is known to be NP-complete.
By the way, as a matter of style, here is (a slight variation of) your algorithm written in idiomatic Ruby style. As you can see, in idiomatic Ruby, it almost becomes a 1:1 translation of the English problem statement into code.
While it is asymptotically just as inefficient as your algorithm, it breaks early, as soon as the answer is true (unlike yours, will just keep running even if it already found a solution). (any? will do that for you automatically.)
def ArrayAdditionI(arr)
largest = arr.delete_at(arr.index(arr.max))
1.upto(arr.size).any? {|combination_size|
arr.combination(combination_size).any? {|combination|
combination.inject(:+) == largest
}
}.to_s
end
This is an alternative interpretation of the (unclear) problem statement:
def ArrayAdditionI(arr)
2.upto(arr.size).any? {|combination_size|
arr.combination(combination_size).any? {|combination|
combination.inject(:+) == arr.max
}
}.to_s
end
The code above is valid ruby code.
The result is "true".
It is a bit unusual perhaps in that while loops are somewhat rare to see/have, but since it is valid ruby code, it should work on that remote site too.
Contact whoever runs the online ruby interpreter at Coderbyte - their version appears to be incompatible with the MRI ruby.
Your code seems to count down; perhaps have a look at 10.downto(1) - replace with variables as appropriate.
I have for example 5 arrays with some inserted elements (numbers):
1,4,8,10
1,2,3,4,11,15
2,4,20,21
2,30
I need to find most common elements in those arrays and every element should go all the way till the end (see example below). In this example that would be the bold combination (or the same one but with "30" on the end, it's the "same") because it contains the smallest number of different elements (only two, 4 and 2/30).
This combination (see below) isn't good because if I have for ex. "4" it must "go" till it ends (next array mustn't contain "4" at all). So combination must go all the way till the end.
1,4,8,10
1,2,3,4,11,15
2,4,20,21
2,30
EDIT2: OR
1,4,8,10
1,2,3,4,11,15
2,4,20,21
2,30
OR anything else is NOT good.
Is there some algorithm to speed this thing up (if I have thousands of arrays with hundreds of elements in each one)?
To make it clear - solution must contain lowest number of different elements and the groups (of the same numbers) must be grouped from first - larger ones to the last - smallest ones. So in upper example 4,4,4,2 is better then 4,2,2,2 because in first example group of 4's is larger than group of 2's.
EDIT: To be more specific. Solution must contain the smallest number of different elements and those elements must be grouped from first to last. So if I have three arrrays like
1,2,3
1,4,5
4,5,6
Solution is 1,1,4 or 1,1,5 or 1,1,6 NOT 2,5,5 because 1's have larger group (two of them) than 2's (only one).
Thanks.
EDIT3: I can't be more specific :(
EDIT4: #spintheblack 1,1,1,2,4 is the correct solution because number used first time (let's say at position 1) can't be used later (except it's in the SAME group of 1's). I would say that grouping has the "priority"? Also, I didn't mention it (sorry about that) but the numbers in arrays are NOT sorted in any way, I typed it that way in this post because it was easier for me to follow.
Here is the approach you want to take, if arrays is an array that contains each individual array.
Starting at i = 0
current = arrays[i]
Loop i from i+1 to len(arrays)-1
new = current & arrays[i] (set intersection, finds common elements)
If there are any elements in new, do step 6, otherwise skip to 7
current = new, return to step 3 (continue loop)
print or yield an element from current, current = arrays[i], return to step 3 (continue loop)
Here is a Python implementation:
def mce(arrays):
count = 1
current = set(arrays[0])
for i in range(1, len(arrays)):
new = current & set(arrays[i])
if new:
count += 1
current = new
else:
print " ".join([str(current.pop())] * count),
count = 1
current = set(arrays[i])
print " ".join([str(current.pop())] * count)
>>> mce([[1, 4, 8, 10], [1, 2, 3, 4, 11, 15], [2, 4, 20, 21], [2, 30]])
4 4 4 2
If all are number lists, and are all sorted, then,
Convert to array of bitmaps.
Keep 'AND'ing the bitmaps till you hit zero. The position of the 1 in the previous value indicates the first element.
Restart step 2 from the next element
This has now turned into a graphing problem with a twist.
The problem is a directed acyclic graph of connections between stops, and the goal is to minimize the number of lines switches when riding on a train/tram.
ie. this list of sets:
1,4,8,10 <-- stop A
1,2,3,4,11,15 <-- stop B
2,4,20,21 <-- stop C
2,30 <-- stop D, destination
He needs to pick lines that are available at his exit stop, and his arrival stop, so for instance, he can't pick 10 from stop A, because 10 does not go to stop B.
So, this is the set of available lines and the stops they stop on:
A B C D
line 1 -----X-----X-----------------
line 2 -----------X-----X-----X-----
line 3 -----------X-----------------
line 4 -----X-----X-----X-----------
line 8 -----X-----------------------
line 10 -----X-----------------------
line 11 -----------X-----------------
line 15 -----------X-----------------
line 20 -----------------X-----------
line 21 -----------------X-----------
line 30 -----------------------X-----
If we consider that a line under consideration must go between at least 2 consecutive stops, let me highlight the possible choices of lines with equal signs:
A B C D
line 1 -----X=====X-----------------
line 2 -----------X=====X=====X-----
line 3 -----------X-----------------
line 4 -----X=====X=====X-----------
line 8 -----X-----------------------
line 10 -----X-----------------------
line 11 -----------X-----------------
line 15 -----------X-----------------
line 20 -----------------X-----------
line 21 -----------------X-----------
line 30 -----------------------X-----
He then needs to pick a way that transports him from A to D, with the minimal number of line switches.
Since he explained that he wants the longest rides first, the following sequence seems the best solution:
take line 4 from stop A to stop C, then switch to line 2 from C to D
Code example:
stops = [
[1, 4, 8, 10],
[1,2,3,4,11,15],
[2,4,20,21],
[2,30],
]
def calculate_possible_exit_lines(stops):
"""
only return lines that are available at both exit
and arrival stops, discard the rest.
"""
result = []
for index in range(0, len(stops) - 1):
lines = []
for value in stops[index]:
if value in stops[index + 1]:
lines.append(value)
result.append(lines)
return result
def all_combinations(lines):
"""
produce all combinations which travel from one end
of the journey to the other, across available lines.
"""
if not lines:
yield []
else:
for line in lines[0]:
for rest_combination in all_combinations(lines[1:]):
yield [line] + rest_combination
def reduce(combination):
"""
reduce a combination by returning the number of
times each value appear consecutively, ie.
[1,1,4,4,3] would return [2,2,1] since
the 1's appear twice, the 4's appear twice, and
the 3 only appear once.
"""
result = []
while combination:
count = 1
value = combination[0]
combination = combination[1:]
while combination and combination[0] == value:
combination = combination[1:]
count += 1
result.append(count)
return tuple(result)
def calculate_best_choice(lines):
"""
find the best choice by reducing each available
combination down to the number of stops you can
sit on a single line before having to switch,
and then picking the one that has the most stops
first, and then so on.
"""
available = []
for combination in all_combinations(lines):
count_stops = reduce(combination)
available.append((count_stops, combination))
available = [k for k in reversed(sorted(available))]
return available[0][1]
possible_lines = calculate_possible_exit_lines(stops)
print("possible lines: %s" % (str(possible_lines), ))
best_choice = calculate_best_choice(possible_lines)
print("best choice: %s" % (str(best_choice), ))
This code prints:
possible lines: [[1, 4], [2, 4], [2]]
best choice: [4, 4, 2]
Since, as I said, I list lines between stops, and the above solution can either count as lines you have to exit from each stop or lines you have to arrive on into the next stop.
So the route is:
Hop onto line 4 at stop A and ride on that to stop B, then to stop C
Hop onto line 2 at stop C and ride on that to stop D
There are probably edge-cases here that the above code doesn't work for.
However, I'm not bothering more with this question. The OP has demonstrated a complete incapability in communicating his question in a clear and concise manner, and I fear that any corrections to the above text and/or code to accommodate the latest comments will only provoke more comments, which leads to yet another version of the question, and so on ad infinitum. The OP has gone to extraordinary lengths to avoid answering direct questions or to explain the problem.
I am assuming that "distinct elements" do not have to actually be distinct, they can repeat in the final solution. That is if presented with [1], [2], [1] that the obvious answer [1, 2, 1] is allowed. But we'd count this as having 3 distinct elements.
If so, then here is a Python solution:
def find_best_run (first_array, *argv):
# initialize data structures.
this_array_best_run = {}
for x in first_array:
this_array_best_run[x] = (1, (1,), (x,))
for this_array in argv:
# find the best runs ending at each value in this_array
last_array_best_run = this_array_best_run
this_array_best_run = {}
for x in this_array:
for (y, pattern) in last_array_best_run.iteritems():
(distinct_count, lengths, elements) = pattern
if x == y:
lengths = tuple(lengths[:-1] + (lengths[-1] + 1,))
else :
distinct_count += 1
lengths = tuple(lengths + (1,))
elements = tuple(elements + (x,))
if x not in this_array_best_run:
this_array_best_run[x] = (distinct_count, lengths, elements)
else:
(prev_count, prev_lengths, prev_elements) = this_array_best_run[x]
if distinct_count < prev_count or prev_lengths < lengths:
this_array_best_run[x] = (distinct_count, lengths, elements)
# find the best overall run
best_count = len(argv) + 10 # Needs to be bigger than any possible answer.
for (distinct_count, lengths, elements) in this_array_best_run.itervalues():
if distinct_count < best_count:
best_count = distinct_count
best_lengths = lengths
best_elements = elements
elif distinct_count == best_count and best_lengths < lengths:
best_count = distinct_count
best_lengths = lengths
best_elements = elements
# convert it into a more normal representation.
answer = []
for (length, element) in zip(best_lengths, elements):
answer.extend([element] * length)
return answer
# example
print find_best_run(
[1,4,8,10],
[1,2,3,4,11,15],
[2,4,20,21],
[2,30]) # prints [4, 4, 4, 30]
Here is an explanation. The ...this_run dictionaries have keys which are elements in the current array, and they have values which are tuples (distinct_count, lengths, elements). We are trying to minimize distinct_count, then maximize lengths (lengths is a tuple, so this will prefer the element with the largest value in the first spot) and are tracking elements for the end. At each step I construct all possible runs which are a combination of a run up to the previous array with this element next in sequence, and find which ones are best to the current. When I get to the end I pick the best possible overall run, then turn it into a conventional representation and return it.
If you have N arrays of length M, this should take O(N*M*M) time to run.
I'm going to take a crack here based on the comments, please feel free to comment further to clarify.
We have N arrays and we are trying to find the 'most common' value over all arrays when one value is picked from each array. There are several constraints 1) We want the smallest number of distinct values 2) The most common is the maximal grouping of similar letters (changing from above for clarity). Thus, 4 t's and 1 p beats 3 x's 2 y's
I don't think either problem can be solved greedily - here's a counterexample [[1,4],[1,2],[1,2],[2],[3,4]] - a greedy algorithm would pick [1,1,1,2,4] (3 distinct numbers) [4,2,2,2,4] (two distinct numbers)
This looks like a bipartite matching problem, but I'm still coming up with the formulation..
EDIT : ignore; This is a different problem, but if anyone can figure it out, I'd be really interested
EDIT 2 : For anyone that's interested, the problem that I misinterpreted can be formulated as an instance of the Hitting Set problem, see http://en.wikipedia.org/wiki/Vertex_cover#Hitting_set_and_set_cover. Basically the left hand side of the bipartite graph would be the arrays and the right hand side would be the numbers, edges would be drawn between arrays that contain each number. Unfortunately, this is NP complete, but the greedy solutions described above are essentially the best approximation.