Find all "and" combinations from multiple sets - arrays

Let's say I have x sets of objects, and each set has a certain number of objects. I want to create an array that stores all the unique "and" combinations of these objects.
For example, if I have 5 objects in set A, 10 objects in set B, and 8 objects in set C, then I know that there are 5*10*8 = 400 unique ways of picking one object from each set. But I want to actually store these combinations in an array.
So the array would be multidimensional, something like:
{
{ a, a, a }
{ a, a, b }
{ a, a, c }
...
{ a, b, a }
{ a, b, b }
and so on...
}
I need the solution to be as efficient as possible, because I am dealing with situations where there are potentially tens of millions of combinations. I am not exactly sure how to begin approaching this problem.
Sorry if it's not clear, but I don't really know what to call what I am trying to achieve, so I am just describing it as best I can. Thank you for any help you can provide.
Edit: Here is some more information about the problem:
The purpose of this problem is that I am going to compute a "score" value from each resulting array. Then, I want to find the top n scores and return them to the user. So actually, I believe that I wouldn't need to have the entire array in memory. I can just iterate through the array, calculate the score, and add it to the returned array if its score is high enough. That way, I only need the top n objects in memory continuously.
I hope this makes things more clear.

Quick Python. It probably can't get much more efficient than this, since you need to iterate at some point anyway:
def getItems(A, B, C):
    for a in A:
        for b in B:
            for c in C:
                items = (a, b, c)  # or [a, b, c], as desired
                yield items
Or, if you're familiar with generator expressions:
gen = ((a, b, c) for a in A for b in B for c in C)
Then to use:
for combo in getItems(A, B, C):  # or: for combo in gen:
    pass  # do stuff here
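Edit: a recursive version that handles any number of sets:
def getItems(*allSets):
    # Base case: no sets left, so there is exactly one (empty) combination
    if len(allSets) == 0:
        yield []
        return
    thisSet, theRest = allSets[0], allSets[1:]
    # Pair each value of the first set with every combination of the remaining sets
    for value in thisSet:
        for values in getItems(*theRest):
            yield [value] + values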
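The same lazy Cartesian product is also available in the standard library as `itertools.product`, and pairing it with `heapq.nlargest` gives the "keep only the top n scores" approach described in the question's edit without ever materialising all combinations. A minimal sketch, assuming `score` is a placeholder for whatever scoring function you have in mind; `top_n_combinations` is an illustrative name, not an existing API:

```python
import heapq
import itertools


def top_n_combinations(sets, score, n):
    """Return the n highest-scoring combinations, taking one item from each set."""
    # itertools.product yields one tuple at a time, so the full product is never stored
    combos = itertools.product(*sets)
    # heapq.nlargest keeps only n candidates in memory while it scans the product
    return heapq.nlargest(n, combos, key=score)
```

For example, with `A = [1, 2, 3, 4, 5]`, `B = range(10)`, `C = range(8)` and `sum` as the score, `top_n_combinations([A, B, C], sum, 3)` walks all 400 combinations but only ever holds three of them, and the best one is `(5, 9, 7)`.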

Do you know the number of sets at design time? If so, I would use nested for loops. If you don't know the number of sets, you'll probably need some form of recursion to handle the looping.
With that said, I think what you're doing is, by definition, NOT efficient. Is there a reason you need to store all the possible combinations in memory, rather than generating them as needed on the fly?

Related

Intersection of two unsorted integer arrays

I am given two arrays, A and B, both of size m, and the numbers in the arrays are in the range [-n, n]. I need to find an algorithm that returns the intersection of A and B in O(m).
For example:
Assume that
A={1,2,14,14,5}
and
B={2,2,14,3}
the algorithm needs to return 2 and 14.
I tried defining two arrays of size n, one for the positive numbers and the other for the negative numbers, where each index of an array represents a number.
I thought I could scan one of A and B, mark each of its elements with 1 in those arrays, and then check the elements of the other array directly.
But it turns out that I can only use the arrays after initializing them, which takes O(n).
What can I do to improve the algorithm?
This can be done with a set:
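A_set = set(A)
print([b for b in B if b in A_set])
Building the set takes O(m) and checking every element of B takes O(m) in total, so the overall running time is O(m) (expected, since set lookups are hash-based). Note that values repeated in B are printed once per occurrence; wrap the result in set() if you want each common value only once.
You will also need O(m) space to store the set.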
If you can create an array or bit vector of 2n+1 Booleans in O(m) time or better (that is, allocate it without initializing its contents), then you don't have to initialize it before you use it:
Create an array S of 2n+1 elements.
For each element a in A, set S[a+n] = false
For each element b in B, set S[b+n] = true
Return the elements a of A for which S[a+n] == true
If you can't create that array in that time, then maybe you can use @fafl's set-based answer, although that is only expected O(m) time unless the set implementation is very special.
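A sketch of the four marking steps in Python; `intersect_by_marking` and its `n` parameter are illustrative names. One caveat: Python cannot allocate an uninitialized list, so `[None] * (2 * n + 1)` costs O(n) here, and the sketch only demonstrates the marking logic, not the O(m) allocation trick itself. It also clears a mark once that value has been reported, a small addition to the listed steps so that a value repeated in A (like 14) is returned only once.

```python
def intersect_by_marking(A, B, n):
    """Values of A that also occur in B, assuming every value lies in [-n, n]."""
    # Python initialises every slot, so this line is O(n), unlike the trick above
    S = [None] * (2 * n + 1)
    for a in A:          # step 2: every value of A starts as "not in B"
        S[a + n] = False
    for b in B:          # step 3: every value of B is marked True
        S[b + n] = True
    result = []
    for a in A:          # step 4: a value of A still marked True also occurs in B
        if S[a + n]:
            result.append(a)
            S[a + n] = False  # report each common value only once
    return result
```

Only slots written in steps 2 and 3 are ever read in step 4, which is why the contents of the array before step 2 do not matter.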
Note that both arrays contain duplicates (strictly speaking they aren't sets, since sets have unique entries, but the example in the question suggests a multiset). Assuming n <= 10^6, build an array covering -n to n that works as a hash map: index 0 stands for -n, and index i stands for i - n. Iterate over array A and, on encountering a value c, do hash[c+n]++. Then take an empty result set and iterate over array B: for each value c in B, if hash[c+n] > 0, put c in the result set and decrement the count. This finds the intersection in O(m), not counting the cost of allocating and zeroing the hash array.
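A sketch of this counting approach in Python; `intersect_by_counting` is an illustrative name. It collects matches into a list rather than a set, so that the decrement gives multiset semantics: a value appears in the result as many times as it occurs in both arrays.

```python
def intersect_by_counting(A, B, n):
    """Multiset intersection of A and B, assuming every value lies in [-n, n]."""
    # count[i] holds how many times the value i - n occurs in A (index 0 <-> -n)
    count = [0] * (2 * n + 1)
    for c in A:
        count[c + n] += 1
    result = []
    for c in B:
        if count[c + n] > 0:   # c still has an unmatched occurrence in A
            result.append(c)
            count[c + n] -= 1  # use up that occurrence
    return result
```

With the question's example, `intersect_by_counting([1, 2, 14, 14, 5], [2, 2, 14, 3], 14)` gives `[2, 14]`: the second 2 in B finds no remaining 2 in A.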
One option is to implement a bit-set manually.
For instance, the minimum over both arrays is 1 and the maximum is 14, so the bit-set covers the values 1 to 14 (the leftmost bit stands for 1).
Array A becomes 11001000000001 and Array B becomes 01100000000001.
Computing A &= B gives 01000000000001, which corresponds to 2 and 14.
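A sketch of the bit-set idea that uses Python's arbitrary-precision integers as the bit-set instead of a hand-written one (a substitution, not the answer's own code); `intersect_by_bitset` is an illustrative name and both arrays are assumed non-empty. Bit k stands for the value lo + k, so the bit order is the reverse of the left-to-right strings in the example.

```python
def intersect_by_bitset(A, B):
    """Distinct values present in both A and B, in ascending order."""
    lo = min(min(A), min(B))  # offset so the smallest value maps to bit 0

    def to_bits(values):
        bits = 0
        for v in values:
            bits |= 1 << (v - lo)  # set the bit for value v
        return bits

    common = to_bits(A) & to_bits(B)  # bitwise AND keeps values present in both
    # decode: each set bit k corresponds to the value lo + k
    return [lo + k for k in range(common.bit_length()) if (common >> k) & 1]
```

For the example arrays this returns `[2, 14]`. The cost grows with max - min rather than with m, so it pays off when the value range is small.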

How to find all options with repetition?

I'm looking at a problem I still can't really figure out.
I need all (now what's the right math term) permutations? tuples? combinations? with repetition made out of 4 given elements.
I've got the elements A, B, C, D. There are four of them, and the number is fixed. For a given n, I need to be able to get all possible options from these four elements. For example:
n = 1;
Possible options:
A
B
C
D
n = 2;
Possible options:
AA
AB
AC
AD
BA
BB
BC
BD
...
DC
DD
n = 4;
Possible options:
AAAA
AAAB
AAAC
...
DDDC
DDDD
Would anyone be able to point me somewhere? There are some further conditions, but I should be able to filter on them as I go. Of course I tried searching for the answer, but none of the topics I found seem to address the same issue.
Big thanks to anyone who at least tries to nudge me in the right direction a bit.
You can't do it with a fixed set of nested loops, because you would need n for loops and n is not known in advance (and writing n nested for loops would be ridiculous anyway). One approach is to use a recursive function. For n = 1 it returns the base list of the four letters; for n > 1 it calls itself with n - 1 and then appends each of the four letters to every string returned by that call. I suggest you try to implement it yourself before reading the following code!
def func(n):
    # Base case: the four one-letter strings
    if n == 1:
        return ['A', 'B', 'C', 'D']
    result = []
    # Extend every string of length n - 1 by each of the four letters
    for item in func(n - 1):
        result.append(item + 'A')
        result.append(item + 'B')
        result.append(item + 'C')
        result.append(item + 'D')
    # Return only after the loop has extended every item
    return result
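In Python the standard library already covers this case: `itertools.product(letters, repeat=n)` is exactly the "all length-n strings with repetition" product. A minimal sketch; `options` is an illustrative name:

```python
import itertools


def options(n, letters="ABCD"):
    """Yield every length-n string over `letters`, repetition allowed."""
    # product(letters, repeat=n) is the Cartesian product of `letters` with itself
    # n times; it is generated lazily, in the same order as the example listing
    for combo in itertools.product(letters, repeat=n):
        yield "".join(combo)
```

Because it is a generator, the extra conditions can be applied on the fly, e.g. `(s for s in options(4) if keep(s))`, where `keep` stands for a hypothetical filter of your own. There are 4**n results in total (256 for n = 4).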

Add an object to an Array based on dictionary Array

I believe this is an algorithm question, so the language shouldn't matter, although in my case it will probably be implemented in jQuery.
So the idea is that I have two arrays of type Object. The objects have many variables, but the only one relevant here is a String. Let's say, for argument's sake, that I have the array [A, A, A, A, B, B, E, E, F] and the 'dictionary' array [A, B, C, D, E, F]. Whenever I add something to the first array, I want it inserted according to the position of that object in the second array. So, for instance, if I wanted to add D, the first array would then look like [A, A, A, A, B, B, D, E, E, F].
To add D, there is not necessarily a C or D to search for to put the D after. In addition, there can be repeats, as in the example. The array that the program is adding 'D' to could also be empty. Also, if there are already Ds in the first array, the new D should go after the last D.
Important: The letters I used are only for clarification's sake; the actual code uses words that are NOT in alphabetical order, so solutions relying on alphabetical order do NOT work here.
Is there a better solution to this than brute force?
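One possible approach, sketched in Python under two stated assumptions: the first array is always kept in dictionary order (true if every insertion goes through this function), and the items are the strings themselves (for objects you would look up the rank of the relevant String field). Map each dictionary word to its position, then binary-search for the first element that ranks after the new item; inserting there also places a new D after any existing Ds. `insert_by_dictionary` is a hypothetical name.

```python
def insert_by_dictionary(items, new_item, dictionary):
    """Insert new_item into items, which is assumed to be in dictionary order."""
    # Position of each word in the dictionary array (build once if you insert often)
    rank = {word: i for i, word in enumerate(dictionary)}
    target = rank[new_item]
    # Binary search for the first element ranked strictly after new_item
    lo, hi = 0, len(items)
    while lo < hi:
        mid = (lo + hi) // 2
        if rank[items[mid]] <= target:
            lo = mid + 1
        else:
            hi = mid
    items.insert(lo, new_item)  # works for an empty array too (lo == 0)
    return items
```

Finding the position takes O(log k) comparisons instead of a linear scan, although the insert itself still shifts the later elements. Since only the dictionary positions are compared, the words do not need to be in alphabetical order.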

Smart and Fast Indexing of multi-dimensional array with R

This is another step in my battle with multi-dimensional arrays in R; the previous question is here :)
I have a big R array with the following dimensions:
> data = array(..., dim = c(x, y, N, value))
I'd like to perform a sort of bootstrap comparing the mean (see here for a discussion about it) obtained with:
> vmean = apply(data, c(1,2,3), mean)
I want to compare it with the mean obtained by sampling the N values randomly with replacement. To explain better: if data[1,1,,1] is equal to [v1 v2 v3 ... vN], I'd like to replace it with something like [v_k1 v_k2 v_k3 ... v_kN], with the k values sampled with sample(N, N, replace = T).
Of course I want to AVOID a for loop. I've read this but I don't know how to perform an efficient indexing of this array avoiding a loop through x and y.
Any ideas?
UPDATE: the important thing here is that I want a different sample for each sample in the fourth (value) dimension, otherwise it would be simple to do something like:
> dataSample = data[,,sample(N, N, replace = T), ]
Also there's the compiler package which speeds up for loops by using a Just In Time compiler.
Adding these lines at the top of your code enables the compiler for all code:
require("compiler")
compilePKGS(enable=T)
enableJIT(3)
setCompilerOptions(suppressAll=T)

Most efficient order to reduce like elements in multiple arrays so the array has 0 elements

Can someone please help me figure out how to solve this problem? I need a way to eliminate like elements across multiple arrays in the best/quickest order, so as to drive my arrays to 0 elements. I.e., if I had the following arrays:
'a {1,12,10,31}'
'b {12,21}'
'c {12,18,5,21}'
'd {12,18,21}'
I'd want to remove 12, then 21 (b is done), then 18 (d is done).
This problem is really related to software incompatibilities... Any ideas would be helpful.
Thanks,
Pat
Well, it depends on how many arrays "multiple" means. If you only have two, you can sort each of them, iterate over both at the same time in order, and remove the common elements.
However, this gets complicated quickly when you have an arbitrary number of arrays.
In this case, it is easiest to:
1. Put everything (merge) into a single array (named ARRAY).
2. Sort the array (ARRAY).
3. Iterate over the array (ARRAY), removing elements that occur just once and leaving a single copy of elements that occur multiple times.
4. Then, for each original array (e.g. A, B, C, D), iterate over that original array (e.g. A) together with ARRAY, and remove elements of A that are also found in ARRAY.
For step 4., you probably want something like (written in pseudo-C code):
foreach (A in arrays [A, B, C, D]) {  // for each original array, sorted ascending
    int j = 0;
    for (int i = 0; i < A.size; i++) {  // iterating over array A
        // advance j through ARRAY to the closest value >= A[i]
        while (j < ARRAY.size - 1 && A[i] > ARRAY[j]) j++;
        if (ARRAY.size > 0 && ARRAY[j] == A[i]) /* remove it */;
        else /* keep it */;
    }
}
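A runnable sketch of all four steps in Python, assuming each original array has no internal duplicates (otherwise step 3 would treat a value repeated within one array as shared). Each original array is sorted first, because step 4 walks it and ARRAY together in order. `shared_values` and `remove_shared` are illustrative names.

```python
def shared_values(arrays):
    """Steps 1-3: merge, sort, keep one copy of every value seen more than once."""
    merged = sorted(x for arr in arrays for x in arr)
    shared = []
    i = 0
    while i < len(merged):
        j = i
        while j < len(merged) and merged[j] == merged[i]:
            j += 1               # j ends just past the run of equal values
        if j - i > 1:            # the value occurs more than once overall
            shared.append(merged[i])
        i = j
    return shared


def remove_shared(arr, shared):
    """Step 4: walk sorted arr and sorted shared together, dropping common values."""
    result = []
    j = 0
    for x in sorted(arr):
        while j < len(shared) and shared[j] < x:
            j += 1               # advance to the first shared value >= x
        if j < len(shared) and shared[j] == x:
            continue             # x is shared: remove it
        result.append(x)         # x is unique to this array: keep it
    return result
```

With the question's arrays, `shared_values` returns `[12, 18, 21]`, after which b and d become empty, a keeps `[1, 10, 31]` and c keeps `[5]`. This identifies which elements are shared; it does not by itself choose a removal order.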
