Number of pairs that can be made with X elements without repeating a pair?

I would like to make pairs out of X number of database objects.
Order of pairs does not matter.
Pairs will be made over multiple rounds.
I do not want pairs to be repeated in a subsequent round.
If I have:
A
B
C
D
The first round might be:
AB
CD
Second round might be:
AD
CB
Third round might be:
AC
DB
And there would be no other possibilities.
So for 4 elements, I could do 3 rounds before I have to repeat a pair.
What is the formula to help me with this for any number of elements?
Related: How do I get the total number of unique pairs of a set in the database?

You can use the round-robin tournament algorithm to generate all possible pairs. Your example shows the round-robin algorithm in action: arrange the elements in two rows, fix the first one (A), and rotate the other elements in a cyclic manner.
Note that N elements form N*(N-1)/2 pairs, and (N-1) rounds are needed to generate all of them.
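As an illustration, here is a minimal Python sketch of that rotation (the function name and layout are mine, not from the answer); for an odd N it adds a dummy "bye" slot:

    def round_robin_rounds(elements):
        # Circle method: fix the first element and rotate the rest one
        # step per round.  For N elements this yields N-1 rounds that
        # together cover all N*(N-1)/2 pairs.
        items = list(elements)
        if len(items) % 2:
            items.append(None)             # "bye" slot for odd N
        n = len(items)
        rounds, rest = [], items[1:]
        for _ in range(n - 1):
            lineup = [items[0]] + rest
            top, bottom = lineup[:n // 2], lineup[n // 2:][::-1]
            rounds.append([(x, y) for x, y in zip(top, bottom)
                           if x is not None and y is not None])
            rest = rest[-1:] + rest[:-1]   # rotate the non-fixed elements
        return rounds

    # round_robin_rounds("ABCD") -> [[('A','D'),('B','C')],
    #                                [('A','C'),('D','B')],
    #                                [('A','B'),('C','D')]]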

Swapping some operations to make all array values zero

Assume we have an array arr with all values initially 0. We are given n operations, where an operation consists of two numbers a and b and means that we add +1 to the value of arr[a] and -1 to the value of arr[b].
Moreover, we can swap the numbers in some operations, which means that we would instead add -1 to arr[a] and +1 to arr[b].
We want to reach a situation in which all values of arr are equal to 0 after all these operations. We are wondering whether that is possible and, if so, which operations we should swap to achieve it.
Any thoughts on that?
Some example input:
3
1 2
3 2
3 1
should result in
YES
R
N
R
where R means to reverse that operation, and N not to do it.
input:
3
1 2
2 3
3 2
results in answer NO.
Let each array element be a vertex in a graph and let the operation (a,b) be an edge from vertex a to vertex b (there may be multiple edges between the same vertices). Traveling from vertex a means decreasing array element a, and traveling to vertex a means increasing array element a. Now if each vertex has an even number of edges and you find a cyclic path that visits each edge exactly once, you get a zero total sum in the array.
Such a path is called an Eulerian cycle (Wikipedia). From Wikipedia: an undirected graph has an Eulerian cycle if and only if every vertex has even degree, and all of its vertices with nonzero degree belong to a single connected component. Since in your case each connected subgraph only needs its own Eulerian cycle, it is enough to count how many times each array index appears: if every count is even, there is always a way to obtain a zero total in the array.
If you want to find out which operations to reverse, you need to find one such path and check in which direction you travel each edge.
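A sketch of that idea in Python (all names are mine): check that every index has even degree, then orient each edge by walking maximal trails, as in the forward phase of Hierholzer's algorithm. Since a kept operation (a, b) corresponds to traveling from b to a, traversing an edge from its a-end means the operation must be reversed:

    from collections import defaultdict

    def orient_operations(ops):
        # Returns a list of 'N'/'R' flags per operation, or None if
        # impossible.  Feasible iff every index occurs an even number
        # of times.
        degree = defaultdict(int)
        adj = defaultdict(list)            # vertex -> incident edge ids
        for i, (a, b) in enumerate(ops):
            degree[a] += 1; degree[b] += 1
            adj[a].append(i); adj[b].append(i)
        if any(d % 2 for d in degree.values()):
            return None                    # answer is NO
        used = [False] * len(ops)
        flags = [None] * len(ops)
        ptr = defaultdict(int)             # next unexplored edge per vertex
        for start in list(adj):
            stack = [start]
            while stack:
                u = stack[-1]
                while ptr[u] < len(adj[u]) and used[adj[u][ptr[u]]]:
                    ptr[u] += 1            # skip already-oriented edges
                if ptr[u] == len(adj[u]):
                    stack.pop()
                    continue
                i = adj[u][ptr[u]]
                used[i] = True
                a, b = ops[i]
                # A kept operation (a, b) is a step from b to a, so
                # traversing the edge from its a-end means "reverse".
                flags[i] = 'R' if u == a else 'N'
                stack.append(b if u == a else a)
        return flags

    # orient_operations([(1, 2), (3, 2), (3, 1)]) -> ['R', 'N', 'R']
    # orient_operations([(1, 2), (2, 3), (3, 2)]) -> None  (answer NO)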
Just count the number of times each index appears. If every index appears an even number of times, then the answer is YES.
You can prove it by construction. You will need to build a new pair list from the original pair list, oriented so that every index's appearances on the left can be matched with its appearances on the right.
Go from the first pair to the last. For each pair, try to match an index that has so far appeared an odd number of times.
For example, in your first example each index appears twice, so the answer is YES. To build the list, you start with (1,2). Then you look at the pair (3,2): you know that 2 has appeared once on the right, so you swap the pair to put 2 on the left: (2,3). For the last pair you have (3,1), which matches 1 and 3, each of which has appeared only once so far.
Note that at the end you can always find a match, because each number appears an even number of times, so each appearance has a partner.
In the second example, 2 appears three times. So the answer is NO.

Given a source and a final array, find the number of hops to generate final from the source in less than quadratic time complexity

Suppose you have an array of integers (e.g. [1 5 3 4 6]). The elements are rearranged according to the following rule. Every element can hop forward (towards the left) and slide the elements in the indices over which it hopped. The process starts with the element at the second index (i.e. 5). It can either hop over element 1 or stay in its own position. If it does hop, element 1 slides down to index 2. Assume it hops; the resultant array is then [5 1 3 4 6]. Element 3 can now hop over 1 or 2 positions and the process repeats. If 3 hops over one position the array becomes [5 3 1 4 6], and if it hops over two positions it becomes [3 5 1 4 6].
It is easy to show that every permutation of the elements can be reached this way. Also, any final configuration is reached by a unique set of hop choices.
The question is: given a final array and a source array, find the total number of hops required to arrive at the final array from the source. An O(N^2) implementation is easy to find, but I believe this can be done in O(N) or O(N log N). If it is not possible to do better than O(N^2), it would be good to know that too.
For example if the final array is [3,5,1,4,6] and the source array [1,5,3,4,6], the answer will be 3.
My O(N^2) algorithm is as follows: loop over the positions of the source array from the end, since we know the last element is the last to move. Here that is 6, and we check its position in the final array. We calculate the number of hops needed, then rearrange the final array to put that element back in its original source position. The rearranging step touches every element of the array, and the outer loop runs over all elements, hence O(N^2). A hashmap or map can help with the searching, but the map needs to be updated after every step, which keeps it at O(N^2).
P.S. I am trying to model correlation between two permutations in a Bayesian way and this is a sub-problem of that. Any ideas on modifying the generative process to make the problem simpler is also helpful.
Edit: I have found my answer. This is exactly the Kendall tau distance. There is an easy merge-sort-based algorithm to compute it in O(N log N).
Consider the target array as an ordering. A target array [2 5 4 1 3] can be seen as [1 2 3 4 5], just by relabeling. You only have to know the mapping to be able to compare elements in constant time. In this instance, to compare 4 and 5 you check: index[4]=2 > index[5]=1 (in the target array), and so 4 > 5 (meaning: 4 must be to the right of 5 at the end).
So what you really have is just a vanilla sorting problem. The ordering is simply different from the usual numerical one; the only thing that changes is your comparison function. The rest is basically the same. Sorting can be achieved in O(n lg n), or even O(n) (radix sort). That said, you have some additional constraints: you must sort in place, and you can only swap two adjacent elements.
A strong and simple candidate would be selection sort, which does exactly that in O(n^2) time. On each iteration, you identify the "leftmost" remaining element of the "unplaced" portion and swap it leftward until it lands at the end of the "placed" portion. This can be improved to O(n lg n) with an appropriate data structure (a priority queue for identifying the "leftmost" remaining element in O(lg n) time). Since n lg n is a lower bound for comparison-based sorting, I really don't think you can do better than that.
Edit: So you're not interested in the sequence of swaps at all, only the minimum number of swaps required. This is exactly the number of inversions in the array (adapted to your particular needs: "non natural ordering" comparison function, but it doesn't change the maths). See this answer for a proof of that assertion.
One way to find the number of inversions is to adapt the merge sort algorithm. Since you have to actually sort the array to compute it, it still takes O(n lg n) time. For an implementation, see this answer or this one (again, remember that you'll have to adapt the comparison).
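A compact sketch of that approach in Python (function names are mine, assuming both arrays hold the same distinct elements): relabel each source element by its index in the final array, then count inversions during a merge sort:

    def count_inversions(seq):
        # Merge sort that also counts inversions; O(n log n).
        def sort(a):
            if len(a) <= 1:
                return a, 0
            mid = len(a) // 2
            left, inv_l = sort(a[:mid])
            right, inv_r = sort(a[mid:])
            merged, inv = [], inv_l + inv_r
            i = j = 0
            while i < len(left) and j < len(right):
                if left[i] <= right[j]:
                    merged.append(left[i]); i += 1
                else:
                    merged.append(right[j]); j += 1
                    inv += len(left) - i   # all of left[i:] exceed right[j]
            merged += left[i:] + right[j:]
            return merged, inv
        return sort(list(seq))[1]

    def hops(source, final):
        # Relabel each source element by its position in the final array,
        # then count inversions under the ordinary numeric order.
        index = {v: i for i, v in enumerate(final)}
        return count_inversions([index[v] for v in source])

    # hops([1, 5, 3, 4, 6], [3, 5, 1, 4, 6]) -> 3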
From your answer I assume the number of hops is the total number of swaps of adjacent elements needed to transform the original array into the final array.
I suggest using something like insertion sort, but without the insertion part; the data in the arrays will not be altered.
You can implement the queue t of stalled hoppers as a balanced binary search tree with counters (the number of elements in each subtree).
You can add an element to t, remove an element from t, rebalance t, and find an element's position in t in O(log C) time, where C is the number of elements in t.
A few words on finding the position of an element: it is a binary search that accumulates the counts of the skipped left subtrees (plus 1 for each middle element, if you keep elements on branches).
A few words on balancing/addition/removal: you have to traverse upward from the removed/added element or subtree and update the counters. The overall number of operations still holds at O(log C) for insert+balance and remove+balance.
Let t be that balanced-search-tree queue, let p be the current index into the original array a, and let q be the index into the final array f.
Now we have one loop starting from the left side (say, p=0, q=0):
If a[p] == f[q], then the original array element hops over the whole queue. Add t.count to the answer, increment p, increment q.
If a[p] != f[q] and f[q] is not in t, then insert a[p] into t and increment p.
If a[p] != f[q] and f[q] is in t, then add f[q]'s position in queue to answer, remove f[q] from t and increment q.
Conveniently, this process moves p and q to the ends of the arrays at the same time if the arrays really are permutations of one another. Nevertheless, you should check p and q for overruns to detect incorrect data, as we have no faster way to prove the data is correct.
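Here is a minimal Python sketch of that loop, with the simplifying assumption that t is a plain list, so membership and position lookups cost O(C) instead of the O(log C) a balanced tree with subtree counters would give; it also assumes the arrays are permutations of distinct elements:

    def count_hops(a, f):
        # a: original array, f: final array.  t holds stalled hoppers in
        # insertion order; an element's "position" is the number of
        # earlier-stalled elements still in front of it.
        t, answer, p, q = [], 0, 0, 0
        while q < len(f):
            if p < len(a) and a[p] == f[q]:
                answer += len(t)          # hops over the whole queue
                p += 1; q += 1
            elif f[q] in t:
                pos = t.index(f[q])       # earlier stalled elements left
                answer += pos
                t.pop(pos)
                q += 1
            elif p < len(a):
                t.append(a[p])            # stall a[p] in the queue
                p += 1
            else:
                raise ValueError("arrays are not permutations of each other")
        return answer

    # count_hops([1, 5, 3, 4, 6], [3, 5, 1, 4, 6]) -> 3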

Get product of each pair in an array

I have an array of integers and I have to find the product of each pair in the array.
Say the array is {1,2,3,4}; then the output should be {1*2, 1*3, 1*4, 2*3, 2*4, 3*4}.
Is there any way other than the brute-force one to get the above result? By brute force I mean: take one number from the array, loop through the array, and store the product of each pair. Can this be done in better than O(n^2) time?
You can't.
Since there can be O(n^2) unique results, you need O(n^2) operations just to produce each of these numbers.
For example, take the case where all numbers in the array are distinct primes.
If there is no redundancy, there is no way around executing n*(n-1)/2 multiplications.
Imagine a graph where each item is a node and each edge represents the multiplication of the two nodes it connects.
That results in a complete graph; your output is the result of each edge, and there are n*(n-1)/2 of those.
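For completeness, the straightforward enumeration in Python; since the output itself has n*(n-1)/2 entries, this doubly nested loop is already proportional to the output size:

    def pair_products(arr):
        # All products arr[i] * arr[j] for i < j.
        return [arr[i] * arr[j]
                for i in range(len(arr))
                for j in range(i + 1, len(arr))]

    # pair_products([1, 2, 3, 4]) -> [2, 3, 4, 6, 8, 12]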

Need an algorithm to find all subsets of 5 integers within an array whose sum falls within a given range

I have an array containing 43 terms, and I need to find all possible combinations of 5 terms whose sum is within a particular range.
I have looked at previous posts and found similar problems, but they all have variations that do not suit my issue.
If your target range is large, then you will have to produce most of the 43 choose 5 = 962,598 5-tuples of 43 elements. On a computer, that usually isn't too bad.
If the range is narrow, you can do better. One improvement is to make a list of the triples of elements sorted by sum, and a list of the pairs of elements sorted by sum. For each pair, take the contiguous sublist of triples whose sums give a total in the correct range, and filter it to those triples where all of the indices used in the pair are greater than all of the indices used in the triple (to avoid duplicated 5-tuples and repeated elements). If the range is very narrow, this might take roughly c n^3 log n steps for an n-element array instead of c n^5.
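As a baseline, the brute-force variant from the first paragraph is only a few lines in Python (the function name is mine):

    from itertools import combinations

    def five_sums_in_range(arr, lo, hi):
        # Enumerate all C(len(arr), 5) subsets of size 5 and keep those
        # whose sum lies in [lo, hi]; for 43 elements that is 962,598
        # candidates, which is quick on a modern machine.
        return [c for c in combinations(arr, 5) if lo <= sum(c) <= hi]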

Categorizing data based on the data's signature

Let us say I have some large collection of rows of data, where each element in the row is a (key, value) pair:
1) [(bird, "eagle"), (fish, "cod"), ... , (soda, "coke")]
2) [(bird, "lark"), (fish, "bass"), ..., (soda, "pepsi")]
n) ....
n+1) [(bird, "robin"), (fish, "flounder"), ..., (soda, "fanta")]
I would like the ability to run some computation that would let me determine, for a new row, which existing row is "most similar" to it.
The most direct way I can think of to find the "most similar" row for any particular row is to compare that row directly against all other rows. This is obviously computationally very expensive.
I am looking for a solution of the following form.
A function that can take a row, and generate some derivative integer for that row. This returned integer would be a sort of "signature" of the row. The important property of this signature is that if two rows are very "similar" they would generate very close integers, if rows are very "different", they would generate distant integers. Obviously, if they are identical rows they would generate the same signature.
I could then take these generated signatures, with the index of the row each points to, and sort them all by signature. I would keep this data structure so that I can do fast lookups. Call it database B.
When I have a new row and wish to know which existing row in database B is most similar, I would:
Generate a signature for the new row
Binary search through the sorted list of (signature, index) pairs in database B for the closest match
Return the closest matching (could be a perfect match) row in database B.
I know there is a lot of hand-waving in this question. My problem is that I do not actually know what function would generate this signature. I have looked at Levenshtein distances, but those represent the transformation cost, not so much a signature. I see that I could try lossy compression; two things might be "bucketable" if they compress to the same thing. I am looking for other ideas on how to do this.
Thank you.
EDIT: This is my original answer, which we will call Case 1, where there is no precedence among the keys.
You cannot do it as a sorted integer because that is one dimensional and your data is multi-dimensional. So "nearness" in that sense cannot be established on a line.
Your example shows bird, fish and soda for all 3 lines. Are the keys fixed and known? If they are not, then your first step is to hash the keys of a row to establish rows that have the same keys.
For the values, consider this as a poor man's Saturday Night similarity trick. Hash the values; any two rows that match on that hash are an exact match and represent the same "spot": zero distance.
If N is the number of key/value pairs:
The closest non-exact "nearness" would mean matching N-1 out of N values. So you generate N more hashes, each one dropping out one of the values. Any two rows that match on those hashes have N-1 out of N values in common.
The next closest non-exact "nearness" would mean matching N-2 out of N values. So you generate C(N,2) = N*(N-1)/2 more hashes, this time each hash leaving out a combination of two values. Any two rows that match on those hashes have N-2 out of N values in common.
So you can see where this is going. At the logical extreme you end up with 2^N hashes, which is not very savory, but I'm assuming you would not go that far, because you reach a point where too few matching values would be considered too "far" apart to be worth considering.
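A small Python sketch of that scheme (the helper name and details are mine; it drops whole (key, value) pairs rather than values alone, which is equivalent when all rows share the same key set):

    from itertools import combinations

    def similarity_hashes(row, max_drop=2):
        # row: iterable of (key, value) pairs.  Returns {k: set of hashes
        # of the row with every size-k combination of pairs dropped};
        # k=0 is the exact-match hash, k=1 gives N hashes, k=2 gives
        # C(N, 2) hashes, and so on.  Two rows sharing a drop-k hash
        # agree on at least N-k of their N values.  Note: Python string
        # hashing is salted, so hashes are only comparable in one process.
        pairs = tuple(sorted(row))
        out = {}
        for k in range(max_drop + 1):
            out[k] = {hash(tuple(p for i, p in enumerate(pairs)
                                 if i not in dropped))
                      for dropped in combinations(range(len(pairs)), k)}
        return out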
EDIT: To see how we cannot escape dimensionality, consider just two keys, each with values 1-10. Plot all possible values on a graph. We see that {1,1} is close to {2,2}, but also that {5,6} is close to {6,7}. So we get a brainstorm and say, aha! I'll calculate each point's distance from the origin using the Pythagorean theorem! This makes both {1,1} and {2,2} easy to detect. But then the two points {1,10} and {10,1} get the same number, even though they are as far apart as they can be on the graph. So we say, ok, I need to add the angle for each. Two points at the same distance are distinguished by their angle, and two points at the same angle are distinguished by their distance. But of course now we've plotted them in two dimensions.
EDIT: Case 2 would be when there is precedence to the keys, when key 1 is more significant than key 2, which is more significant than key 3, etc. In this case, if the allowed values were A-Z, you would string the values together as if they were digits to get a sortable value. ABC is very close to ABD, but very far from BBD.
If you had a lot of data, and wanted to do this hardcore, I would suggest a statistical method like PLSA or PSVM, which can extract identifying topics from text and identify documents with similar topic probabilities.
A simpler, but less accurate way of doing it is using Soundex, which is available for many languages. You can store the soundex (which will be a short string, not an integer I'm afraid), and look for exact matches to the soundex, which should point to similar rows.
I think it's unrealistic to expect a function to turn a series of strings into an integer such that integers near each other map to similar strings. The closest you might come is doing a checksum on each individual tuple, and comparing the checksums for the new row to the checksums of existing rows, but I'm guessing you're trying to come up with a single number you can index on.
