Assume we have a set of N items, say S = {t1, t2, t3}. I would like to produce all the possible subsets of S, given the restriction that t1 must appear in every subset. The possible subsets of S are therefore {t1}, {t1,t2}, {t1,t3}, and {t1,t2,t3}. How can I write a recursive function that takes the two sets {t1} and {t2,t3} and returns the subsets listed above?
Also, if I had hundreds of sets such as S, storing all of their subsets becomes a problem. My program runs in iterations, and at every iteration I only need to manipulate one subset from each set. Is there a way I can produce the subsets of a set in steps rather than all at once? I.e. every time I call next(S), I get a new subset.
Note I'm coding in C.
Your "restriction" amounts to the following:
Remove from S all the elements that must appear in the final sets. Call the remaining set S' and the set of removed elements B.
Produce the powerset of S', taking the union of each of its subsets with B to add back the required elements.
The powerset is a standard recursive construction and outside the scope of your question; indeed, there are many examples of how to compute it on Stack Overflow.
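For the incremental next(S)-style usage, here is a minimal sketch in C (the struct and function names are invented for illustration, not from any library): a bitmask counter over the optional elements yields one subset per call, so nothing but the current mask has to be stored, and emitting the required elements first satisfies the restriction automatically.

#include <stdio.h>

#define MAX_ITEMS 32   /* assumes at most 32 items per emitted subset */

struct subset_iter {
    const char **required;  int n_required;   /* items that must appear */
    const char **optional;  int n_optional;   /* items that may appear  */
    unsigned long mask;                       /* current bitmask        */
};

/* Copies the next subset into out[] and returns its size,
 * or -1 when all 2^n_optional subsets have been produced. */
int next_subset(struct subset_iter *it, const char **out)
{
    if (it->mask >= (1UL << it->n_optional))
        return -1;

    int k = 0;
    for (int i = 0; i < it->n_required; i++)   /* required items first */
        out[k++] = it->required[i];
    for (int i = 0; i < it->n_optional; i++)   /* optional items chosen by the mask */
        if (it->mask & (1UL << i))
            out[k++] = it->optional[i];

    it->mask++;                                /* advance to the next subset */
    return k;
}

int main(void)
{
    const char *req[] = { "t1" };
    const char *opt[] = { "t2", "t3" };
    struct subset_iter it = { req, 1, opt, 2, 0 };

    const char *subset[MAX_ITEMS];
    int size;
    while ((size = next_subset(&it, subset)) != -1) {
        for (int i = 0; i < size; i++)
            printf("%s ", subset[i]);
        printf("\n");
    }
    return 0;
}

This prints {t1}, {t1,t2}, {t1,t3}, {t1,t2,t3}, one per call, without ever materialising the full powerset.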
OK, let's say that I have an array such as 1 4 2 3 1 and I want to split it into 2 sub-arrays such that the absolute difference of their sums is minimal.
For the above example, the 2 sub-arrays would be 4 2 and 3 1 1, whose sums differ by |6 - 5| = 1.
It seems to be a dynamic programming question but I'd like to solve it the conventional way.
I am not looking for exact answers but rather the ideology about how should I approach this problem.
Any hints would be appreciated. But just the hints as I'd like to solve it on my own thereafter.
We are not concerned about the order of the elements, and there can be duplicate elements as well.
If you just want to find the minimum by any means, the brute-force way would be to enumerate all the ways to split the array in two and compute the difference of the sums of the numbers in each pair. Keep track of the minimum (and the two sub-arrays that produce it), or a list of the minimums, replacing it whenever the program finds a smaller difference. Iterate over all the possible pairs and there you have your answer.
Depending on how you code this, there are ways to improve on this method. For example, if you have already considered the pair [1,2,3] and [4,5], you don't need to also consider [4,5] and [1,2,3], as this is the same pair of sub-arrays.
I won't write out an actual implementation, since you did specify that you want to try it yourself.
I've seen plenty of questions asking whether quicksort or mergesort is 'better', and when to use each of them, but what I'd like to see is some input on when to use them with regard to the size of the data being sorted.
Let's say I have a number of items, whether they be ints or custom objects. I sort these items.
I see mergesort, in a way, as being the optimal case of quicksort (picking the median as the pivot) at every step, but with some overhead. So at a certain size, when the overhead is negligible in comparison to the consistent optimal nature of mergesort, it would make sense to use it in favor of quicksort.
Radix sort has a 'linear' runtime, given that the number of digits of the keys being sorted does not approach the number of separate items being sorted. However, to my knowledge, radix sort also has a relatively large constant factor in its runtime.
If I recall from some testing in the past, it made sense to use mergesort when the number of items being sorted began to number in the millions, and radix in the high millions/billion range.
Am I reasonably accurate in these assessments? Can someone confirm, deny, or correct them to some extent?
(I'm talking about rather 'simple' implementations of each sort. Also, in the case of radix sort, let's say that the largest single key is no larger than twice the number of items being sorted, e.g. when sorting 4,000,000 items, the largest possible key is 8,000,000.)
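To make the 'large constant' point concrete, here is a minimal LSD radix sort sketch in C for unsigned 32-bit keys in base 256 (the function name is illustrative, not from any particular library). Even when every key fits in one or two bytes, as under the bound above, this simple version still makes four histogram passes and four scatter passes over the whole array, which is where much of the constant factor comes from.

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Minimal LSD radix sort sketch for unsigned 32-bit keys, base 256. */
void radix_sort_u32(uint32_t *a, size_t n)
{
    uint32_t *buf = malloc(n * sizeof *a);
    if (!buf) return;                      /* allocation failure: give up */

    for (int shift = 0; shift < 32; shift += 8) {
        size_t count[257] = {0};

        for (size_t i = 0; i < n; i++)     /* histogram of this byte */
            count[((a[i] >> shift) & 0xFF) + 1]++;
        for (int b = 0; b < 256; b++)      /* prefix sums -> bucket start offsets */
            count[b + 1] += count[b];
        for (size_t i = 0; i < n; i++)     /* stable scatter into buf */
            buf[count[(a[i] >> shift) & 0xFF]++] = a[i];

        memcpy(a, buf, n * sizeof *a);     /* copy back for the next pass */
    }
    free(buf);
}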
edit - I would like some input on the general number ranges in which each of the given sorts is fastest. I provided some ranges in the question, and that may have been a mistake; what I'd like to see in an answer is an opinion on those ranges. I know quicksort tends to be the default since it's usually 'good enough', doesn't have the space overhead of mergesort, and doesn't come with the worry of malicious data purposely crafted with obscenely large keys (as radix sort does).
Suppose you have two strings, each consisting of lines separated by a newline character. Now you want to compare the two strings and find the best method (the smallest number of steps), using only additions or deletions of lines, to transform the second string into the first string.
For example, string #2:
abc
def
efg
hello
123
and string #1:
abc
def
efg
adc
123
The best (fewest-steps) solution to transform string #2 into string #1 would be:
remove the line at position 3 ('hello')
insert 'adc' at position 3
How would one write a generic algorithm to find the quickest (fewest-steps) solution for transforming one string into another, given that you can only add or remove lines?
This is a classic problem.
For a given set of allowed operations the edit distance between two strings is the minimal number of operations required to transform one into the other.
When the set of allowed operations consists of insertion and deletion only, it is known as the longest common subsequence edit distance.
You'll find everything you need to compute this distance in Longest common subsequence problem.
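As a concrete illustration (a minimal sketch; the function name and the fixed MAX_LINES bound are just for this example, and na/nb are assumed to be at most MAX_LINES), the insert/delete-only distance between the two strings in the question can be computed with the standard LCS-style dynamic program over lines:

#include <stdio.h>
#include <string.h>

#define MAX_LINES 128

/* dp[i][j] = fewest line insertions + deletions needed to turn the
 * first i lines of a into the first j lines of b. */
int line_edit_distance(const char *a[], int na, const char *b[], int nb)
{
    static int dp[MAX_LINES + 1][MAX_LINES + 1];

    for (int i = 0; i <= na; i++) dp[i][0] = i;   /* delete all of a */
    for (int j = 0; j <= nb; j++) dp[0][j] = j;   /* insert all of b */

    for (int i = 1; i <= na; i++) {
        for (int j = 1; j <= nb; j++) {
            if (strcmp(a[i - 1], b[j - 1]) == 0)
                dp[i][j] = dp[i - 1][j - 1];       /* lines match, keep them */
            else {
                int del = dp[i - 1][j] + 1;        /* delete a[i-1] */
                int ins = dp[i][j - 1] + 1;        /* insert b[j-1] */
                dp[i][j] = del < ins ? del : ins;
            }
        }
    }
    return dp[na][nb];
}

int main(void)
{
    const char *from[] = { "abc", "def", "efg", "hello", "123" };
    const char *to[]   = { "abc", "def", "efg", "adc",   "123" };

    /* prints 2: delete "hello", insert "adc" */
    printf("%d\n", line_edit_distance(from, 5, to, 5));
    return 0;
}

Tracing back through the dp table (rather than only returning dp[na][nb]) recovers the actual sequence of insertions and deletions.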
Note that to answer this question fully, one would have to thoroughly cover the huge subject of graph similarity search / graph edit distance, which I will not do here. I will, however, point you in directions where you can study the problem more thoroughly on your own.
... to find the quickest (fewest-steps) solution for transforming one string into another ...
This is a quite common problem, known as the (minimum) edit distance problem (originally the specific 'String-to-String Correction Problem' by R. Wagner and M. Fischer). Computing the optimal (minimum, i.e. fewest-steps) edit distance, which is what you ask for in your question, is non-trivial.
See e.g.:
https://en.wikipedia.org/wiki/Edit_distance
https://web.stanford.edu/class/cs124/lec/med.pdf
The minimum edit distance problem for string similarity is in itself a subclass of the more general minimum graph edit distance problem, or graph similarity search (since any string or even sequenced object, as you have noted yourself, can be represented as a graph), see e.g. A survey on graph edit distance.
For details regarding this problem here on SO, refer to e.g. Edit Distance Algorithm and Faster edit distance algorithm.
This should get you started.
I'd tag this as a math/algorithm problem rather than a language-specific one, unless someone can point you to an existing C library for computing edit distances.
The fastest way would be to remove all sub-strings, then append (not insert) all new sub-strings; and to do "all sub-strings at once" if you can (possibly leading to a destPointer = sourcePointer approach).
The overhead of minimising the number of sub-strings removed and inserted will be higher than simply removing and inserting/appending without checking whether it's necessary. It's like paying a consultant $100 to determine whether you should spend $5.
I want to make an algorithm that enables A/B testing over a variable number of subjects with a variable number of properties per subject.
For example, I have 1000 people with the following properties: they come from two departments, some are managers, some are women, etc. The set of properties may grow or shrink depending on the situation.
I want an algorithm that will split the population in two with the best possible representation of all the properties in both A and B. So I want two groups of 500 people, with an equal number from each department in both, an equal number of managers, and an equal number of women. More specifically, I would like to maintain the ratio of each property in both A and B: if 10% of the population are managers, I want 10% of sample A and 10% of sample B to be managers.
Any pointers on where to begin? I am pretty sure that such an algorithm exists. I have a gut feeling that this may be unsolvable exactly in some cases, as there may be an odd number of people who are managers AND women AND in Dept. 1.
Make a list of all the combinations of the A/B test variables:
Dept1,Manager,Male
Dept1,Manager,Female
Dept1,Junior,Male
...
Dept2,Junior,Female
Go through all the people and assign each of them to their respective combination. You might want to randomise the order of the people first, just to be sure there is no bias in the order they are added to each combination:
Dept1,Manager,Male-> Person1, Person16, Person143...
Dept1,Manager,Female-> Person7, Person10, Person83...
Then have a second pass that goes through each combination and assigns half of its people to one test group and half to the other. You will need to account for odd numbers of people in a group, but that should be fairly easy to factor in; obviously a larger sample size will reduce the impact of these odd leftovers on the final results.
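A rough sketch of that idea in C (the struct, the property encoding, and the function names are all invented for illustration): bucket each person by their combination of properties, shuffle to remove ordering bias, then alternate assignments within each bucket so both groups preserve the ratios of every property.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

struct person {
    int dept;        /* 0 or 1 */
    int is_manager;  /* 0 or 1 */
    int is_female;   /* 0 or 1 */
    int group;       /* 0 = A, 1 = B (filled in by split_ab) */
};

/* Encode the combination of properties as a bucket index 0..7. */
static int bucket_of(const struct person *p)
{
    return (p->dept << 2) | (p->is_manager << 1) | p->is_female;
}

void split_ab(struct person *people, int n)
{
    int next_group[8] = {0};            /* 8 = 2*2*2 property combinations */
    int *order = malloc(n * sizeof *order);
    if (!order) return;

    for (int i = 0; i < n; i++) order[i] = i;
    for (int i = n - 1; i > 0; i--) {   /* shuffle to avoid ordering bias */
        int j = rand() % (i + 1);
        int t = order[i]; order[i] = order[j]; order[j] = t;
    }

    for (int i = 0; i < n; i++) {       /* alternate A/B within each bucket */
        struct person *p = &people[order[i]];
        int b = bucket_of(p);
        p->group = next_group[b];
        next_group[b] ^= 1;
    }
    free(order);
}

int main(void)
{
    srand((unsigned)time(NULL));

    struct person people[6] = {
        {0,1,0,0}, {0,1,0,0}, {1,0,1,0}, {1,0,1,0}, {0,0,1,0}, {0,0,1,0}
    };
    split_ab(people, 6);

    for (int i = 0; i < 6; i++)
        printf("person %d -> group %c\n", i, people[i].group ? 'B' : 'A');
    return 0;
}

Because assignments alternate within each bucket, the two groups differ by at most one person per property combination, which keeps the overall ratios as close as the data allows.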
The algorithm for splitting the groups is simple: take each group of people who have all dimensions in common and assign half to the treatment and half to the control. You don't need to worry about odd numbers of people; whatever statistical test you are using will account for that. If some dimension is heavily skewed (e.g., there are only 2 women in your entire sample), it may be wise to throw that dimension out.
Simple A/B tests usually use a t-test or g-test, but in your case you'd be better off using an ANOVA to determine the significance of the treatment on each of the individual dimensions.
I have a map (a grid of numbers) in which I want to count occurrences of particular number patterns.
Without VB, I want to be able to create a dynamic counter that counts occurrences of a pattern of numbers.
For example:
I want to count how many times this pattern occurs in the map, even if occurrences overlap:
2 2
2 2
Counting by hand, I can see the pattern occurs six times, but I'm struggling to create a simple array formula that will do this.
I've been told of success with an IF function with nested AND functions, so I know it can be done without VB.
Use the formula
=COUNTIFS(A1:E15,2,B1:F15,2)
Notice how the two ranges are the same size but offset from each other by one column.
You can extend this to find two-by-two regions:
=COUNTIFS(A1:E14,2,B1:F14,2,A2:E15,2,B2:F15,2)
just be very careful about how the different ranges are offset.
An alternative way to write this which, I suspect, will be more efficient for large ranges is:
=SUMPRODUCT((A1:E14=2)*(B1:F14=2)*(A2:E15=2)*(B2:F15=2))