How to divide an array into groups with the smallest sum difference? - arrays

I have an array of integers that I need to split into a number of groups (not known in advance) so that the difference between the group sums is minimal.
example:
Array: 2, 1, 4, 7, 1, 2, 6, 8
Number of groups = 3
Result:
Group 1 – 8, 2 = 10
Group 2 – 7, 2, 1 = 10
Group 3 – 6, 4, 1 = 11
Is there any algorithm to solve this problem?
I'm stuck.

Firstly, if the number of groups is 2, this is exactly the partition problem, a variant of the subset sum problem. Since the partition problem is NP-hard, your problem is NP-hard too, so you shouldn't expect to find an efficient exact algorithm.
Given that it will be at least exponential, you might as well generate every assignment of elements to groups and pick the best. I know some people don't like recursion, but it really is useful here for enumerating the group possibilities:
recfunc(array, groups):
    if array is empty
        return an array containing the element groups
    else
        groupsList = empty array
        element = array[0]
        foreach aGroup in groups
            groupsList += recfunc(array - element, groups where aGroup adds element)
        return groupsList
This algorithm will create a list of all possible groupings. It is fairly inefficient, but shouldn't be too hard for you to implement. From there, just go through the list, compute the group sums of each grouping, and keep the grouping whose sums differ the least.
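As a rough sketch of the brute-force idea in Python: instead of recursing, it enumerates every assignment of elements to groups with itertools.product. The function name best_grouping is made up for illustration, and it assumes "difference" means the largest group sum minus the smallest one.

```python
from itertools import product

def best_grouping(values, k):
    """Try every assignment of values to k groups (k**len(values) of them)
    and return (groups, spread) with the smallest max-sum minus min-sum."""
    best, best_spread = None, None
    for assignment in product(range(k), repeat=len(values)):
        groups = [[] for _ in range(k)]
        for value, group_index in zip(values, assignment):
            groups[group_index].append(value)
        sums = [sum(g) for g in groups]
        spread = max(sums) - min(sums)  # assumed definition of "difference"
        if best_spread is None or spread < best_spread:
            best, best_spread = groups, spread
    return best, best_spread

groups, spread = best_grouping([2, 1, 4, 7, 1, 2, 6, 8], 3)
print(groups, spread)
```

This is only practical for small inputs, since the number of assignments grows exponentially with the array length.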

Related

Return Array for Maximum Sum

I'm having trouble finding a solution to the following problem.
Given an integer array A, partition the array into (contiguous) subarrays of length at most k. After partitioning, every value in each subarray is changed to the maximum value of that subarray.
The subarrays are then concatenated in their original order to form a new array, and the sum of the new array should be as large as possible.
Example:
Input: A = [1, 15, 7, 9, 2, 5, 10], k = 3
Output: newArray = [15, 15, 15, 9, 10, 10, 10]
One possible solution is to try all possible partitions and find the max sum. But I am looking for a better solution.
A possible implementation is to create a dictionary that stores the first value in the partition; if the next value is greater than the stored one, it replaces the stored one, and so on until the end of the partition. This is repeated for all partitions.
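One way to avoid trying every partition is dynamic programming over prefixes: the best sum for the first i elements is the best sum for a shorter prefix plus the last block (of length at most k) filled with its maximum. The following is a hedged sketch of that idea, not something from the original post; max_sum_partition and its internals are names chosen here.

```python
def max_sum_partition(a, k):
    """Return (best_sum, new_array) for blocks of length at most k."""
    n = len(a)
    best = [0] + [float("-inf")] * n  # best[i]: best sum for a[:i]
    cut = [0] * (n + 1)               # cut[i]: start of the last block for a[:i]
    for i in range(1, n + 1):
        block_max = float("-inf")
        for length in range(1, min(k, i) + 1):
            # The last block is a[i-length:i]; track its maximum incrementally.
            block_max = max(block_max, a[i - length])
            candidate = best[i - length] + block_max * length
            if candidate > best[i]:
                best[i] = candidate
                cut[i] = i - length
    # Walk the cut points backwards to rebuild the new array.
    blocks = []
    i = n
    while i > 0:
        j = cut[i]
        blocks.append([max(a[j:i])] * (i - j))
        i = j
    new_array = [x for block in reversed(blocks) for x in block]
    return best[n], new_array

print(max_sum_partition([1, 15, 7, 9, 2, 5, 10], 3))
```

This runs in O(n * k) time, compared with the exponential number of partitions.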

permutation ranking with DI sequence

I want to rank and unrank through a subset of permutations of a given length. The subset is defined as follows:
Example for permutation length 4:
The input is a bitstring of length 3 (always permutation length - 1):
010
0 means 2 consecutive elements are Increasing.
1 means 2 consecutive elements are Decreasing.
For this bitstring, the subset consists of the following permutations: 1324, 1423, 2314, 2413, 3412
I want to rank and unrank within the subset of permutations defined by the bitstring. Is there an algorithmic way to do this for a given bitstring?
Let me restate the problem that I think you mean.
You have a bit string of length n-1. If its digits are a pattern of increase/decrease, that describes a set of permutations that fit the pattern. That set can be put into ascending order.
You want to be able to solve two problems.
Given a permutation that fits the pattern, say where it is in that order (ie "rank" it)
Given a number, produce the permutation that is at that place in the order (ie "unrank" it)
And ideally you'd like to be able to solve these without having to generate all of the permutations that fit the pattern.
The key to both is the following function:
def count_matching(bitstring, start):
    ''' Returns how many permutations of 1..(len(bitstring) + 1)
    match bitstring with starting value start
    '''
    # some implementation here.
This can be calculated recursively fairly easily. However doing it the naive way generates all permutations. But if we add a caching layer to memoize it, then we store a polynomial amount of data and make a polynomial number of calls to fill it in.
Here is the data you get once it is cached for your example:
{
    ('010', 1): 2,
    ('010', 2): 2,
    ('010', 3): 1,
    ('010', 4): 0,
    ('10', 1): 0,
    ('10', 2): 1,
    ('10', 3): 1,
    ('0', 1): 1,
    ('0', 2): 0,
    ('', 1): 1
}
Now this seems like a lot of data for a small number of patterns. But for a permutation of length n the number of entries grows like O(n^2) and the number of calls to populate it grows like O(n^3). (Any eagle eyed readers may figure out how to populate it in time O(n^2). I'm going with the simple version.)
With this in hand, we can take a rank and figure out which permutation it must be with the following idea.
Suppose that we want to find the rank 4 permutation. Our starting list of numbers is (1 2 3 4). We can skip over the 2 permutations which start with ('010', 1), and the answer will be the second of the 2 with ('010', 2).
Take the second number 2 and our partial permutation is [2, and we have the numbers (1 3 4). We are looking for the 2nd for bitstring '10'. We skip over the 0 permutations which start ('10', 1), the 1 with ('10', 2) and want the first of the 1 with ('10', 3).
Take the third number 4 and our partial permutation is [2, 4, and we have the numbers (1 3). As before we find that we want the first of the 1 with ('0', 1).
Take the first number 1 and our partial permutation is [2, 4, 1 and we have the numbers (3). There aren't a lot of choices.
So we finish and get [2, 4, 1, 3], which you can verify is the 4th.
We can also go the other way. Taking the same permutation, we start with [2, 4, 1, 3] and want its rank.
How many are before it that differ in the first digit? It used the 2nd possible first number. From the entry for ('010', 1) we know there are 2. And the numbers left are 1 3 4.
How many are before it that differ in the second digit? It uses the 3rd possible second number. From the entries for ('10', 1) and ('10', 2) we know there is 1 more in front of it.
We now have the numbers 1 3 left. None came before it in the third digit. And again, none in the last.
With 3 before it, it must have rank 4.
And there you have it. For memoizing one recursive function, you now make finding permutations by rank, or ranking a given permutation straightforward.
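Here is a hedged sketch of how the three pieces could fit together in Python. count_matching follows the description above, with "start" taken relative to the numbers still unused; the unrank and rank helpers, the 1-based ranks and the use of functools.lru_cache for the caching layer are my own choices. rank assumes the permutation it is given really fits the pattern.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_matching(bitstring, start):
    """Permutations of 1..len(bitstring)+1 matching bitstring that begin with start."""
    if not bitstring:
        return 1 if start == 1 else 0
    n = len(bitstring) + 1
    # After removing start, the remaining values are relabelled 1..n-1.
    if bitstring[0] == '0':          # increase: next value is above start
        nexts = range(start, n)
    else:                            # decrease: next value is below start
        nexts = range(1, start)
    return sum(count_matching(bitstring[1:], r) for r in nexts)

def unrank(bitstring, rank):
    """Return the permutation with the given 1-based rank."""
    remaining = list(range(1, len(bitstring) + 2))
    perm, lo, hi = [], 1, len(remaining)
    while remaining:
        for r in range(lo, hi + 1):
            count = count_matching(bitstring, r)
            if rank <= count:
                break
            rank -= count            # skip every permutation starting with r
        else:
            raise ValueError("rank out of range")
        perm.append(remaining.pop(r - 1))
        if bitstring:
            lo, hi = (r, len(remaining)) if bitstring[0] == '0' else (1, r - 1)
            bitstring = bitstring[1:]
    return perm

def rank(bitstring, perm):
    """Return the 1-based rank of perm (assumed to fit the pattern)."""
    remaining, before, lo = sorted(perm), 0, 1
    for value in perm:
        r = remaining.index(value) + 1
        before += sum(count_matching(bitstring, s) for s in range(lo, r))
        remaining.pop(r - 1)
        if bitstring:
            lo = r if bitstring[0] == '0' else 1
            bitstring = bitstring[1:]
    return before + 1

print(unrank('010', 4), rank('010', [2, 4, 1, 3]))
```

The list operations keep the sketch short at the cost of some extra polynomial work per step.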

Next permutation/ranking with specific strength

I am looking for an algorithm that gives me the next permutation with a specific strength.
A permutation of length n is defined with the elements (1,2,3,...n)
What is the strength of a permutation?
The strength of a permutation with length 10 is defined as |a1-a2|+|a2-a3|+...+|a9-a10|+|a10-a1|.
For example:
(1,2,3,4,5,6) has the strength 10
(1,2,6,3,4,5) has the strength 14
Is there a formula to compute the next permutation of a given strength and length, or is it necessary to compute all elements?
Is ranking/unranking of the subsets possible?
The next-permutation function should return the next lexicographical permutation within the subset defined by the given strength and length, without computing the intermediate permutations of different strengths.
This is a nicely masked problem in combinatorics. First, note that this is a ring of integers; the linear "array" is an implementation choice, rather than part of the strength analysis. Let's look at the second case, given as (1,2,6,3,4,5):
   1
 5   2
 4   6
   3
Every element appears in exactly two terms. Thus, we have a simple linear combination of the elements, with coefficients of -2, 0, or 2. If the element is larger than both neighbors (e.g. 5), the coefficient is 2; if smaller than both neighbors (e.g. 1), it's -2; if between, the two abs operations cancel, and it's 0 (e.g. 4).
Lemma: the strength must be an even number.
Thus, the summation and some transformations can be examined easily enough with simple analysis. The largest number always has a coefficient of +2; the smallest always has a coefficient of -2.
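To make the coefficient view concrete, here is a small Python sketch (the function names are mine) that computes the strength directly and again as a linear combination with coefficients -2, 0 or 2. Since every coefficient is even, the lemma about even strength follows.

```python
def strength(perm):
    """Sum of absolute differences around the ring."""
    n = len(perm)
    return sum(abs(perm[i] - perm[(i + 1) % n]) for i in range(n))

def coefficients(perm):
    """+1 for each neighbor an element exceeds, -1 for each neighbor exceeding it."""
    n = len(perm)
    sign = lambda x: (x > 0) - (x < 0)
    return [sign(perm[i] - perm[i - 1]) + sign(perm[i] - perm[(i + 1) % n])
            for i in range(n)]

def strength_from_coefficients(perm):
    return sum(c * x for c, x in zip(coefficients(perm), perm))

p = (1, 2, 6, 3, 4, 5)
print(strength(p), coefficients(p), strength_from_coefficients(p))
```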
You can find "close relative" permutations by finding interchangeable elements. For instance, you can always interchange the largest two elements (6 and 5) and/or the smallest two elements (1 and 2), without affecting the strength. For instance, 6 and 5 can be interchanged because they're strictly larger than their neighbors:
(6-2) + (6-3) + (5-1) + (5-4) =
(5-2) + (5-3) + (6-1) + (6-4) =
2*6 + 2*5 - 2 - 3 - 1 - 4
1 and 2 can be interchanged, even though they're adjacent, for a similar reason ... except that there are only three terms, one of which involves the pair:
(5-1) + (2-1) + (6-2) =
(5-2) + (2-1) + (6-1) =
5 + 6 - 2*1
Depending on the distribution of the set of numbers, there will likely be more direct ways to construct a ring with a given strength. Since we do not yet have an ordering defined on the permutations, we have no way to determine a "next" one. However, the simple one is to note that rotations and reflections of a given permutation will all have the same strength:
(1,2,6,3,4,5)
(2,6,3,4,5,1)
(6,3,4,5,1,2)
...
(5,4,3,6,2,1)
(4,3,6,2,1,5)
...
Does that get you moving?
Addition w.r.t. OP updates:
There are several trivially strength-invariant swaps available. I've already mentioned the two extreme pairs (6-5) and (1-2). You can also swap adjacent, consecutive numbers: that adds (4-5) and (3-4) in the above example. From simple algebraic properties, you can often identify a 2-element swap or 3-element rotation (respecting an increase in lexicographic position) that generates the next desired permutation. For instance:
(5, 6, 1, 3, 4, 2)
(5, 6, 1, 4, 2, 3) rotate 3, 4, 2
(5, 6, 1, 4, 3, 2) swap 2, 3
However, there are jumps in the sequence that you'd be hard-pressed to find in this fashion. For instance, making the leap to change the first or second element is not so clean:
(5, 6, 3, 1, 4, 2)
(5, 6, 3, 2, 4, 1) swap 1, 2 -- easy
(6, 1, 2, 4, 5, 3) wholesale rearrangement -- hard to see that this is the next strength=14
I feel that finding these would require a set of algebraic rules that would find the simple moves and eliminate invalid moves (such as generating 563421 before the "wholesale rearrangement" just above). However, following these rules would often take more time than working through all permutations.
I'd love to find that I'm wrong on this last point. :-)
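For reference, the "work through all permutations" baseline can be sketched in a few lines of Python. It does visit the intermediate permutations of other strengths, so it is not the shortcut the question asks for; next_lex and next_with_strength are hypothetical helper names.

```python
def strength(perm):
    n = len(perm)
    return sum(abs(perm[i] - perm[(i + 1) % n]) for i in range(n))

def next_lex(perm):
    """Next permutation in lexicographic order, or None if perm is the last."""
    p = list(perm)
    i = len(p) - 2
    while i >= 0 and p[i] >= p[i + 1]:
        i -= 1
    if i < 0:
        return None
    j = len(p) - 1
    while p[j] <= p[i]:
        j -= 1
    p[i], p[j] = p[j], p[i]
    p[i + 1:] = reversed(p[i + 1:])
    return tuple(p)

def next_with_strength(perm):
    """Step through lexicographic successors until the strength matches again."""
    target = strength(perm)
    candidate = next_lex(perm)
    while candidate is not None and strength(candidate) != target:
        candidate = next_lex(candidate)
    return candidate

print(next_with_strength((5, 6, 3, 2, 4, 1)))
```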

Calling Groups of Elements of Matlab Arrays

I'm dealing with long daily time series in Matlab, running over periods of 30-100+ years. I've been meaning to start looking at it by seasons, roughly approximating that by taking 91-day segments of each year over the time period (with some tbd method of correcting for odd number of days in the year)
Basically, what I want is an array indexing method that allows me to make a new array that takes 91 elements every 365 elements, starting at element 1. I've been looking for some normal array methods (some (:) or other), but I haven't been able to find one. I guess an alternative would be to kind of iterate over 365-day segments 91 times, but that seems needlessly complicated.
Is there a simpler way that I've missed?
Thanks in advance for the help!
So if I understand correctly, you want to extract elements 1-91, 366-456, 731-821, and so on? I'm not sure that there is a way to do this with basic matrix indexing, but you can do the following:
% Note: padarray is part of the Image Processing Toolbox
days = 1:365; %Create array ranging from 1 - 365
difference = length(data) - 365; %how much bigger is time series data?
padded = padarray(days, [0, difference], 'circular', 'post'); %extend to the right only, to fit time series
extracted = data(padded <= 91); %get every element in the range 1-91
Basically what I am doing is creating an array that is the same size as your time series data and repeats 1-365 over and over. I then use logical indexing on data, keeping the positions where the padded array is less than or equal to 91.
As a more approachable example, consider:
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
days = 1:5;
difference = length(x) - 5;
padded = padarray(days, [0, difference], 'circular', 'post');
extracted = x(padded <= 2);
padded then is equal to [1, 2, 3, 4, 5, 1, 2, 3, 4, 5] and extracted is going to be [1, 2, 6, 7]
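For comparison, the same selection can be written with a modulo test on the position instead of a padded index array. This is a sketch in Python rather than Matlab, and seasonal_slice with its parameters is a made-up name; it uses 0-based positions, so "day of year at most 91" becomes "position modulo 365 below 91".

```python
def seasonal_slice(data, take=91, period=365):
    """Keep the first `take` elements out of every `period` elements."""
    return [x for i, x in enumerate(data) if i % period < take]

print(seasonal_slice([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], take=2, period=5))
```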

Partition an array of numbers into sets by proximity

Let's say we have an array like
[37, 20, 16, 8, 5, 5, 3, 0]
What algorithm can I use so that I can specify the number of partitions and have the array broken into that many parts?
For 2 partitions, it should be
[37] and [20, 16, 8, 5, 5, 3, 0]
For 3, it should be
[37],[20, 16] and [8, 5, 5, 3, 0]
I am able to break them down by proximity by simply subtracting the element with right and left numbers but that doesn't ensure the correct number of partitions.
Any ideas?
My code is in ruby but any language/algo/pseudo-code will suffice.
Here's the Ruby code for Vikram's algorithm
def partition(arr, clusters)
  # Return the same array if clusters is negative or not smaller than the array size
  return arr if (clusters >= arr.size) || (clusters < 0)
  edges = {}
  # Get weights of edges (differences between adjacent elements)
  arr.each_with_index do |a, i|
    break if i == (arr.length - 1)
    edges[i] = a - arr[i + 1]
  end
  # Sort edge indices by weight in ascending order
  sorted_edges = edges.sort_by { |_k, v| v }.collect { |k| k.first }
  # Remember the original indices of the edges joined so far
  joined = []
  sorted_edges.each do |edge|
    # Every earlier join to the left of this edge shifted it one position down
    pos = edge - joined.count { |e| e < edge }
    joined << edge
    # Join the elements on both sides of the edge
    arr[pos] = arr[pos, 2].flatten
    arr.delete_at(pos + 1)
    # Get out when the right number of clusters is reached
    break if arr.size == clusters
  end
  # Return the clustered array (it is also modified in place)
  arr
end
(assuming the array is sorted in descending order)
37, 20, 16, 8, 5, 5, 3, 0
Calculate the differences between adjacent numbers:
17, 4, 8, 3, 0, 2, 3
Then sort them in descending order:
17, 8, 4, 3, 3, 2, 0
Then take the first few numbers. For example, for 4 partitions, take 3 numbers:
17, 8, 4
Now look at the original array and find the elements with these differences (attaching the original index to each element of the difference array makes this easiest).
17 - difference between 37 and 20
8 - difference between 16 and 8
4 - difference between 20 and 16
Now print the stuff:
37 | 20 | 16 | 8, 5, 5, 3, 0
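The steps above can be sketched in Python as follows. partition_by_gaps is a name I picked, and the array is assumed to be sorted already, as stated at the top of this answer.

```python
def partition_by_gaps(values, k):
    """Split a sorted list into k parts by cutting at the k-1 largest gaps."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    # Index i stands for the gap between values[i] and values[i + 1].
    gaps = sorted(range(len(values) - 1),
                  key=lambda i: abs(values[i] - values[i + 1]),
                  reverse=True)
    cuts = sorted(gaps[:k - 1])
    parts, start = [], 0
    for cut in cuts:
        parts.append(values[start:cut + 1])
        start = cut + 1
    parts.append(values[start:])
    return parts

print(partition_by_gaps([37, 20, 16, 8, 5, 5, 3, 0], 4))
```

Keeping the gap indices rather than the gap values is what makes the lookup back into the original array trivial.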
I think your problem can be solved with k-clustering using Kruskal's algorithm. Kruskal's algorithm can be used to find clusters such that the spacing between them is maximal.
Algorithm:
Construct a path graph from your data set like this:
[37, 20, 16, 8, 5, 5, 3, 0]
path graph: 0 -> 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7
The weight of each edge is the difference between the values at its endpoints:
edge(0,1) = abs(37-20) = 17
edge(1,2) = abs(20-16) = 4
edge(2,3) = abs(16-8) = 8
edge(3,4) = abs(8-5) = 3
edge(4,5) = abs(5-5) = 0
edge(5,6) = abs(5-3) = 2
edge(6,7) = abs(3-0) = 3
Run Kruskal's algorithm on this graph until only k clusters remain:
Sort the edges by weight in ascending order:
(4,5),(5,6),(6,7),(3,4),(1,2),(2,3),(0,1)
Join edges in that order until exactly k = 3 clusters remain:
iteration 1: join (4,5), clusters = 7: [37,20,16,8,(5,5),3,0]
iteration 2: join (5,6), clusters = 6: [37,20,16,8,(5,5,3),0]
iteration 3: join (6,7), clusters = 5: [37,20,16,8,(5,5,3,0)]
iteration 4: join (3,4), clusters = 4: [37,20,16,(8,5,5,3,0)]
iteration 5: join (1,2), clusters = 3: [37,(20,16),(8,5,5,3,0)]
Stop, as clusters = 3.
Reconstructed solution: [(37), (20, 16), (8, 5, 5, 3, 0)], which is what you wanted.
While @anatolyg's solution may be fine, you should also look at k-means clustering. It's usually done in higher dimensions, but ought to work fine in 1d.
You pick k; your examples are k=2 and k=3. The algorithm seeks to put the inputs into k sets that minimize the sum of distances squared from the set's elements to the centroid (mean position) of the set. This adds a bit of rigor to your rather fuzzy definition of the right result.
While getting an optimal result is NP-hard in general (in higher dimensions), there is a simple iterative heuristic.
It's an iteration. Take a guess to get started. Either pick k elements at random to be the initial means or put all the elements randomly into k sets and compute their means. Some care is needed here because each of the k sets must have at least one element.
Additionally, because your integer sets can have repeats, you'll have to ensure the initial k means are distinct. This is easy enough. Just pick from a set that has been "uniquified" (duplicates removed).
Now iterate. For each element find its closest mean. If it's already in the set corresponding to that mean, leave it there. Else move it. After all elements have been considered, recompute the means. Repeat until no elements need to move.
The Wikipedia page on this is pretty good.
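A minimal sketch of that iteration in Python, under a few assumptions of my own: the function name kmeans_1d, and a deterministic start that spreads the initial means over the distinct values instead of picking them at random (so that runs are repeatable).

```python
def kmeans_1d(values, k, max_iter=100):
    """Lloyd-style iteration in one dimension; returns k lists of values."""
    distinct = sorted(set(values))
    if not 1 <= k <= len(distinct):
        raise ValueError("k must be between 1 and the number of distinct values")
    m = len(distinct)
    # Assumed initialization: k distinct values spread over the sorted distinct values.
    means = [float(distinct[i * (m - 1) // max(k - 1, 1)]) for i in range(k)]
    assignment = None
    for _ in range(max_iter):
        # Assign every element to its closest mean (ties go to the lower index).
        new_assignment = [min(range(k), key=lambda c: abs(v - means[c]))
                          for v in values]
        if new_assignment == assignment:
            break  # no element moved, so we are done
        assignment = new_assignment
        for c in range(k):
            members = [v for v, a in zip(values, assignment) if a == c]
            if members:  # an empty cluster keeps its old mean
                means[c] = sum(members) / len(members)
    return [[v for v, a in zip(values, assignment) if a == c] for c in range(k)]

print(kmeans_1d([37, 20, 16, 8, 5, 5, 3, 0], 3))
```

Because k-means minimizes squared distance to the cluster means rather than cutting at the largest gaps, its groups can differ from the gap-based answers, and the result depends on the starting means.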
