I'm doing some algorithm practice and came across this.
I have a list/array, something like this: [1, 5, 4, 9, 10, 2, ...]
How would I go about finding unique sets of the list that have the same sum? For example, (5, 4) would equal (9) and (1, 5) would equal 6 and so on.
I am familiar with finding all the sets of a list, but the added trick to this is that the sets have to be unique as in if one index was used for one set, the same index cannot be used for the other set.
Any thoughts? Thanks.
Edit:
After thinking about this some more, here is what I have. I make a list of all possible sets, not worrying about uniqueness. Then I get the min and max of the original superset array. I loop through the values from min to max, incrementing by 1, and check each set against the current value. I create a hashmap keyed by index. If the sum of the set is equal to the value we are checking and none of its indices already has a key in the hashmap, we add that set to a list and set the hashmap entry for each of its indices to True. We keep checking each set against the hashmap this way. Then we return the list of lists, which should contain only the unique sets.
Make sense?
"Make a list of all possible sets": this is of course exponential in time and space. Here is a polynomial-time solution (if I understood the problem):
Iterate over the list, keeping two pointers. Every time the first one is incremented, the second one runs from that index+1 to the end of the list.
For every combination of two elements, if their sum is lower than the required sum, their combined value is pushed onto the end of the list (keeping the original indexes that comprised it).
Once a combination of elements that equals exactly the sum is found, all the elements at the corresponding indexes are removed from the list, and that set is inserted into the solution vector of disjoint sets.
At the end of this iterative process you will have a complete collection of disjoint sets: not necessarily the one with the most elements or the most groups, but one such that no further set could be created from the remaining elements in the list.
Given an array of sets find the one that does not belong:
example: [[a,b,c,d], [a,b,f,g], [a,b,h,i], [j,k,l,m]]
output: [j,k,l,m]
We can see above that the first three sets have a common subset [a,b] and the last one does not. Note: There may be a case where the outlier set does have elements contained in the input group. In this case we have to find the set that has the least in common with the other sets.
I have tried iterating over the input list and keeping a count for each character (in a hash).
In a second pass, find which set has the smallest total.
In the example above, the last set would have a sum of counts of 4:
j*1 + k*1 + l*1 + m*1.
I'd like to know if there are better ways to do this.
Your description:
find the set that has the least in common with the other sets
Doing this as a general application would require computing similarity with each individual pair of sets; this does not seem to be what you describe algorithmically. Also, it's an annoying O(n^2) algorithm.
I suggest the following amendment and clarification:
find the set that least conforms to the mean of the entire list of sets.
This matches your description much better, and can be done in two simple passes, O(n*m) where you have n sets of size m.
The approach you outlined does the job quite nicely: count the occurrences of each element in all of the sets, O(nm). Then score each set according to the elements it contains, also O(nm). Keep track of which set has the lowest score.
For additional "accuracy", you could sort the scores and look for gaps in the scoring -- this would point out multiple outliers.
If you do this in Python, use the Counter class for your tally.
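The two passes described above could be sketched roughly as follows. This is only an illustration: the function name least_common_set is made up for this example, and ties are resolved by taking the first lowest-scoring set.

```python
from collections import Counter

def least_common_set(sets):
    """Return (index of the lowest-scoring set, list of all scores).

    Hypothetical helper: a set's score is the sum, over its elements,
    of how many input sets contain that element.
    """
    # Pass 1: tally in how many sets each element occurs, O(n*m).
    counts = Counter(element for s in sets for element in set(s))
    # Pass 2: score each set by the tallies of its elements, O(n*m).
    scores = [sum(counts[element] for element in set(s)) for s in sets]
    # The set that least conforms to the rest has the smallest score.
    outlier = min(range(len(sets)), key=scores.__getitem__)
    return outlier, scores

example = [list("abcd"), list("abfg"), list("abhi"), list("jklm")]
outlier_index, all_scores = least_common_set(example)
print(example[outlier_index], all_scores)
```

Sorting all_scores and looking for gaps would give the multiple-outlier variant mentioned above.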
You shouldn't be looking for the smallest sum of element counts, because it depends on the size of the set. But if you subtract the size of the set from the sum, the result is 0 only if the set is disjoint from all the others. Another option is to look at the maximum count among its elements: if the maximum is 1, then every element of that set belongs only to that set.
There are many functions you can use. As the note states:
Note: There may be a case where the outlier set does have elements contained in the input group. In this case we have to find the set that has the least in common with the other sets.
The previous functions are not optimal. A better function would count the number of shared elements. Set the value of an element to 1 if it's in multiple sets and 0 if it appears only once.
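As a rough sketch of that last scoring function (the name shared_element_scores is invented for this example), each element contributes 1 to a set's score if it also occurs in some other set and 0 otherwise; the set with the lowest score then has the least in common with the rest.

```python
from collections import Counter

def shared_element_scores(sets):
    """Hypothetical helper: for each set, count its elements that
    also appear in at least one other set."""
    # Count in how many sets each element occurs (duplicates within
    # one set are ignored by converting to set first).
    counts = Counter(element for s in sets for element in set(s))
    # An element is worth 1 if it occurs in multiple sets, else 0.
    return [sum(1 for element in set(s) if counts[element] > 1) for s in sets]

example = [list("abcd"), list("abfg"), list("abhi"), list("jklm")]
print(shared_element_scores(example))
```

Unlike the raw sum of counts, this score does not grow just because a set is large.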
Just like any other median selection problem for an unsorted array, but with an extra restriction: we are required to use a provided subroutine/helper function, Quart(A,p,r), that finds the 1/4th ordered item in a given subarray in linear time. How can we use this helper function to find the median of an array?
Further restriction:
1. Your solution must be performed in-place (no new array can be created). In particular, one alternative solution would be to extend the array to size m so that all the entries in A[n+1, ... ,m] = 1 and m > 2n. After this, you would be able to solve the median problem in the original array with just one call to the quartile problem in the extended array. With further restriction, this is not possible.
2. While running the algorithm you may temporarily change elements in the array, e.g., a SWAP changes elements. But, after the conclusion of your algorithm, all elements in the array must be the same as they were in the beginning (but just as in the randomized selection algorithm taught in class, they may be in a different order than they were originally).
Since you are not allowed to create new arrays, this means that you are only allowed to modify a small (constant) number of items.
Do a pass through the array and find the min and max.
Call Quart to find the quartile value
Iterate through the array and add (max - min) + 1 to all values below the quartile. This will move the bottom quarter of the values to the top
Call Quart again to find the quartile of the new values (which will be the median of the original values)
Iterate through the array and subtract (max - min) + 1 from all values greater than the max to return the array to its original state
You might need some additional rules to handle special cases e.g. if there are multiple values equal to the quartile.
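The steps above might look roughly like this in Python. Several things here are assumptions made for the sketch: quart is a stand-in for the provided Quart(A,p,r) and simply sorts (so it is not linear time), the values are distinct integers, the length is a multiple of 4, and values less than or equal to the quartile are shifted so that the second call lands on the lower median.

```python
def quart(a, p, r):
    """Stand-in for the provided Quart(A, p, r): returns the element of
    rank (r - p + 1) // 4 in a[p..r]. Sorting is used only to keep the
    sketch short; the real helper is assumed to run in linear time."""
    k = (r - p + 1) // 4
    return sorted(a[p:r + 1])[k - 1]

def median_via_quart(a):
    """Lower median of a, using two calls to quart and O(1) extra space.
    Assumes distinct integers and len(a) divisible by 4."""
    n = len(a)
    lo, hi = min(a), max(a)
    shift = (hi - lo) + 1
    q = quart(a, 0, n - 1)
    # Move the bottom quarter above the current maximum.
    for i in range(n):
        if a[i] <= q:
            a[i] += shift
    # The quartile of the modified array is the median of the original.
    median = quart(a, 0, n - 1)
    # Undo the shift so the array holds its original values again.
    for i in range(n):
        if a[i] > hi:
            a[i] -= shift
    return median

data = [7, 1, 5, 3, 8, 2, 6, 4]
print(median_via_quart(data), data)
```

Duplicates of the quartile value and lengths that are not a multiple of 4 would need the extra rules mentioned above.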
I am trying to find the second lowest cost in this list. Clearly, it is $547, but when I put the formula in: =SMALL(F2:F31, 2) I get $488.00, and I am guessing this is because $488 repeats itself a number of times and so it is the second number in the list of numbers to be the smallest?
What formula should I put in to get the second smallest number, despite repeats?
What is the purpose of this? The end result? Do you seek automation or this is adhoc?
If this is adhoc, you can do:
1. copy column with numbers
2. Paste copied column into new sheet
3. Use Remove Duplicates functionality (Data tab) on this column to remove repetitions
4. Use your formula
Also, you can do this with one formula :
=SMALL(F2:F31, COUNTIF(F2:F31, MIN(F2:F31)) + 1)
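To see why this works, here is the same logic written out in Python purely as an illustration (the function name and the sample prices are made up): skip every copy of the minimum, then take the next value in sorted order.

```python
def second_smallest_distinct(values):
    """Mirror of SMALL(range, COUNTIF(range, MIN(range)) + 1)."""
    smallest = min(values)                 # MIN(range)
    k = values.count(smallest) + 1         # COUNTIF(range, MIN(range)) + 1
    return sorted(values)[k - 1]           # SMALL(range, k), k is 1-based

print(second_smallest_distinct([488, 547, 488, 600, 488]))
```

If every value equals the minimum there is no second value; the Python sketch then raises an IndexError.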
As long as you only want the second smallest number, and your values have at most two decimal places, you can do this fairly effectively without creating additional arrays of data, or using Array Formulas, as long as you can re-order from largest to smallest, instead of smallest to largest.
First, find the smallest number, which is simply:
=MIN(F:F)
Then, add 1 penny to that amount. We can now use price-is-right-rules searching to find the closest number, utilizing the next-best feature of the MATCH function, as follows:
=INDEX(F:F,MATCH(MIN(F:F)+0.01,F:F,-1))
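This will take the smallest number in column F, and add .01 to it. Using -1 as the 3rd argument in the MATCH function forces MATCH to accept the next best alternative (the smallest value that is greater than or equal to the lookup value) if this amount is not matched exactly. This match type requires the column to be sorted in descending order.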
Only because I figure based on your last post we are headed this way. I would as I said in that post make a unique list of all your states and counties.
Then, building on @Andrew's formula, which should be the one marked as correct, use COUNTIFS() as the k value in SMALL():
=SMALL(IF(A2=DATA!A:A,IF(B2=DATA!B:B,DATA!F:F)),COUNTIFS(DATA!A:A,A2,DATA!B:B,B2,DATA!F:F,MIN(IF(A2=DATA!A:A,IF(B2=DATA!B:B,DATA!F:F))))+1)
This will give you a clean list of the second value.
Then to find the Insurance company that goes with the quote use:
=INDEX(DATA!E:E,MATCH(SMALL(IF(A2=DATA!A:A,IF(B2=DATA!B:B,DATA!F:F)),COUNTIFS(DATA!A:A,A2,DATA!B:B,B2,DATA!F:F,MIN(IF(A2=DATA!A:A,IF(B2=DATA!B:B,DATA!F:F))))+1),IF(A2=DATA!A:A,IF(B2=DATA!B:B,DATA!F:F)),0))
Put them in the second row. These are both Array formulas and will need to be confirmed with ctrl-shift-enter. Then they can be copied down as far as needed.
We are given an array that contains all the numbers from 1 to 100, and any number from that range (1 to 100) may be given as the target.
We need to form a subset of the given array with the minimum number of elements such that the given number can be represented.
We may only add numbers from the subset to form the given number, if required.
We cannot use the element at the same index of the subset twice.
Few Examples:
Ex: given an array of elements 1, 2, 3.
Ans: 1, 2.
If the given number is 1, we can represent 1 from our subset directly.
If the given number is 2, we can represent 2 from our subset directly.
If the given number is 3, we can represent 3 as 1+2 from our subset.
We can actually duplicate numbers while forming the subset, like:
Ex: given an array of elements 1, 2.
Ans: 1, 2 or 1, 1.
The answer cannot be just 1, as we cannot add the same 1 twice to form 2.
A friend of mine asked me this question and I am stuck on how to proceed.
Any suggestions would greatly help.
Also, I could not come up with a decent title for this question. Is this a classical problem?
You can solve this in a greedy manner: Repeatedly choose the first missing value.
It is not clear whether your array contains all the numbers from 1 to 100, or just a subset of them.
It is also not clear whether you have to make all numbers from 1 to the largest number in the array, or just a few target values. I will assume you have to be able to construct all the intermediate values as well.
Assuming all numbers are present
First we must include the number 1.
Next we include the number 2. We can now make everything up to 3.
So next include the number 4. We can now make everything up to 1+2+4=7.
So next include the number 8. We can now make everything up to 1+2+4+8=15.
...
So next include the number 2^k. We can now make everything up to 1+2+...+2^k = 2^(k+1) - 1.
So for 100 numbers you will need 1,2,4,8,16,32,64 and will be able to make every number up to 127.
Assuming a subset of numbers are present
Suppose the array might be a subset such as [1,2,3,5,7,9,15]. The same basic approach works, but this time we need to choose the largest number in the array that is less than or equal to the first missing value.
First include 1.
Next include 2, we can now make numbers up to 3.
Next our first missing value is 4, but we don't have a 4 to pick, so instead pick the 3. We can now reach up to 1+2+3=6.
Next our first missing value is 7, so we can pick the 7. Our reach is now 1+2+3+7=13.
Next our missing value is 14, but we don't have a 14, so instead pick the 9, etc.
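A minimal sketch of this greedy choice, covering both cases, might look like the following. The function name greedy_cover and the target parameter (the largest value that must be constructible) are assumptions made for the example; it also assumes each array entry can be picked at most once.

```python
def greedy_cover(available, target):
    """Pick numbers so that every value from 1 to target is a sum of
    distinct picks. Returns (picks, reach), or raises ValueError if
    the first missing value cannot be covered."""
    unused = sorted(available)
    picks = []
    reach = 0  # every value in 1..reach can currently be made
    while reach < target:
        missing = reach + 1
        # Largest unused number that is <= the first missing value.
        candidates = [x for x in unused if x <= missing]
        if not candidates:
            raise ValueError(f"cannot make {missing}")
        choice = candidates[-1]
        unused.remove(choice)
        picks.append(choice)
        reach += choice
    return picks, reach

print(greedy_cover(range(1, 101), 100))
print(greedy_cover([1, 2, 3, 5, 7, 9, 15], 37))
```

With all numbers present the picks are the powers of two; with the subset above the picks follow the walk-through step by step.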
I have a relatively complex issue, I need an algorithm to find all possible sub arrays from an array that sum up to X, so for the given array:
{2,8,12,45,32,7,6,5}
lets say we need subarrays that sum to 20, some would be:
{8,12} {2,7,6,5} {12,6,2}
however there will be combinations like:
{7,7,6} {5,5,5,5} {8,8,2,2}
I will need all possible sums.
I have done a solution doing brute force checking of all possibilities however it takes way too long (in some cases in excess of 30 minutes) to complete, so I do need a smarter solution that I've been bumping my head over for a couple of days now.
Your question seems to indicate that answers which repeat numbers are acceptable, and you don't want to generate all possible ways the summands can be ordered. I'll base my answer on that.
I'd implement this in C++. As data structure, I'd probably use something like this:
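#include <map>
#include <utility>
#include <vector>

struct partial_sum {
    int min_last_summand;  // smallest last summand among all entries in prefixes
    std::vector< std::pair<partial_sum*, int> > prefixes;  // (sum without its last summand, last summand)
};
std::map<int, partial_sum*> m;  // value of a sum -> how to obtain it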
The central piece here is the map m. It maps the value of a sum to some information about how to obtain it. You'd initialize it with 0 mapped to NULL. The prefixes member would store data about all possible ways to obtain a given sum. The first part of each pair gives a pointer to information about all summands except the last, while the second part gives that last member. This gives you a form of directed acyclic graph, as sums can be prefixes of many sums, and sums can have many different prefixes, but the value of every prefix sum is smaller than that of the current sum.
The central iteration step would remove the minimal element from m, and generate all possible ways you can add an element from your input set to the value you just removed. So you'd check the map whether you need to insert a new entry for the new sum. And for existing and new entries alike, you create a new item in the prefixes list, with the pointer you just removed from the map as first part, and the last summand you added as the second.
I'd only generate sums in ascending (or rather non-descending) order of summands, to avoid generating all permutations. To make things easier, I'd maintain this min_last_summand information. It should always contain the minimum of all the second elements from the pairs in the prefixes list. When generating new sums, you can skip those where the last summand would be less than the minimal last summand of the prefix, as that would imply a summand being smaller than its predecessor. You could also avoid generating sums where the total value is greater than your target sum.
When printing the results, you'll have to recurse over the part of the DAG reachable from your target sum, and list all paths from there to the root NULL. So in each recursion step, you'd have a pointer to the current partial sum. If that pointer is NULL, you emit a sum consisting of zero summands. Otherwise, you iterate over all prefixes. For each prefix, you recurse to generate all possible ways to write that prefix, but only if the min_last_summand of the first element is no greater than the current last summand, and also only if the second element is no greater than the summand that will follow it. Which means that you'll have to pass that following summand as an argument to your recursive calls. Taken together, this avoids generating sums with descending steps in them.
The approach above assumes that your program will terminate after one run, so you don't have to worry about freeing memory. If you do, you'll probably have to store pointers to all the objects you created, so you can free them all.
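For comparison, here is a much simpler sketch in Python. It is not the DAG construction described above but plain backtracking that borrows two of its ideas: summands are only generated in non-descending order, and any branch whose total would exceed the target is cut off. The function name is invented for the example, and it assumes all inputs are positive integers that may be repeated.

```python
def combinations_summing_to(numbers, target):
    """All multisets of the given positive integers (repetition allowed)
    that sum to target, each reported once in non-descending order."""
    candidates = sorted(set(numbers))
    results = []

    def extend(start, remaining, chosen):
        if remaining == 0:
            results.append(tuple(chosen))
            return
        # Only try summands >= the previous one (index >= start),
        # so no permutation of the same sum is generated twice.
        for i in range(start, len(candidates)):
            value = candidates[i]
            if value > remaining:
                break  # sorted input: larger values overshoot too
            chosen.append(value)
            extend(i, remaining - value, chosen)
            chosen.pop()

    extend(0, target, [])
    return results

for combo in combinations_summing_to([2, 8, 12, 45, 32, 7, 6, 5], 20):
    print(combo)
```

The number of results can still grow quickly with the target, so this only avoids redundant work; it does not make the output itself small.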