Disjoint Set DS - disjoint-union

I have been studying disjoint set data structures.
My question: when union by rank and path compression are used together, what happens if we skip union by rank and assign the parent (precedence) without comparing the ranks of the two roots (representative elements)? Does it affect the running time?
I understand that the weighted-union heuristic is used when merging two sets to append the smaller set to the larger one, so that as few elements as possible have to be updated to point to the new representative.
Union by rank (similar to weighted union) serves the same purpose when merging two trees. But if we skip the rank comparison and assign the precedence arbitrarily, will that really not affect the running time? And if it doesn't, why do we bother with it at all? I can't see through this clearly, so please help me understand where I'm wrong.
// no comparison for ranks
UNION(x, y)
    x.parent = y;
generalized code (with rank comparison)
union(x, y) {
    if (x.rank > y.rank)
        y.parent = x;
    else
        x.parent = y;
    if (x.rank == y.rank)
        y.rank = y.rank + 1;
}

If we take n disjoint singleton elements and always apply the union so that the root of the existing set (i.e. of the tree representing the set) is united with a fresh singleton and made to point to it, then path compression has no effect during any of these unions, and we can end up building a linked list.
In such a worst case a single FIND can cost Θ(n), while the UNION itself (excluding the FINDs on the two roots) is still Θ(1), just as it is with union by rank. So union by rank improves the worst-case cost of an individual query, keeping it at Θ(lg n).
If, after all the queries, we end up with one (or a constant number of) final sets, then union by rank still makes some difference to the overall complexity when path compression is used.
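To make the difference concrete, here is a minimal Python sketch (my own illustration, not code from the post) of both variants with path compression. With the naive union, repeatedly uniting the current root with a fresh singleton builds a chain, so the first find on the deepest element walks Θ(n) parents; union by rank keeps tree heights at O(lg n).

# Illustrative union-find sketch: naive union vs. union by rank, both with path compression.
def find(parent, x):
    # Path compression: point x directly at the root once it is known.
    if parent[x] != x:
        parent[x] = find(parent, parent[x])
    return parent[x]

def union_naive(parent, x, y):
    # No rank comparison: the root of x always becomes a child of the root of y.
    rx, ry = find(parent, x), find(parent, y)
    if rx != ry:
        parent[rx] = ry

def union_by_rank(parent, rank, x, y):
    # Attach the shorter tree under the taller one; bump the rank only on ties.
    rx, ry = find(parent, x), find(parent, y)
    if rx == ry:
        return
    if rank[rx] > rank[ry]:
        rx, ry = ry, rx
    parent[rx] = ry
    if rank[rx] == rank[ry]:
        rank[ry] += 1

n = 10
parent = list(range(n))
# Worst case for the naive union: always unite the current root with a fresh singleton,
# making the old root a child of the singleton, which yields the chain 0 -> 1 -> ... -> 9.
for i in range(n - 1):
    union_naive(parent, i, i + 1)
print(find(parent, 0))  # this first find walks the whole chain before compressing it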

Related

Stumped finding an algorithm for this problem re: finding set that does not belong

Given an array of sets find the one that does not belong:
example: [[a,b,c,d], [a,b,f,g], [a,b,h,i], [j,k,l,m]]
output: [j,k,l,m]
We can see above that the first three sets have a common subset [a,b] and the last one does not. Note: There may be a case where the outlier set does have elements contained in the input group. In this case we have to find the set that has the least in common with the other sets.
I have tried iterating over the input list and keeping a count for each character (in a hash).
In a second pass, find which set has the smallest total.
In the example above, the last set would have a sum of counts of 4:
j*1 + k*1 + l*1 + m*1.
I'd like to know if there are better ways to do this.
Your description:
find the set that has the least in common with the other sets
Doing this as a general application would require computing similarity with each individual pair of sets; this does not seem to be what you describe algorithmically. Also, it's an annoying O(n^2) algorithm.
I suggest the following amendment and clarification:
find the set that least conforms to the mean of the entire list of sets.
This matches your description much better, and can be done in two simple passes, O(n*m) where you have n sets of size m.
The approach you outlined does the job quite nicely: count the occurrences of each element across all of the sets, O(nm). Then score each set according to the elements it contains, also O(nm), and keep track of which set has the lowest score.
For additional "accuracy", you could sort the scores and look for gaps in the scoring -- this would point out multiple outliers.
If you do this in Python, use the Counter class for your tally.
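For instance, a rough sketch of that two-pass tally with Counter (my own illustration, using the example sets from the question):

from collections import Counter

# Pass 1: tally how often each element occurs across all sets (O(n*m)).
# Pass 2: score each set by the summed frequencies of its elements (O(n*m));
# the set with the lowest score has the least in common with the rest.
sets = [{'a', 'b', 'c', 'd'}, {'a', 'b', 'f', 'g'}, {'a', 'b', 'h', 'i'}, {'j', 'k', 'l', 'm'}]

counts = Counter()
for s in sets:
    counts.update(s)

scores = [sum(counts[e] for e in s) for s in sets]
print(scores)                           # [8, 8, 8, 4]
print(sets[scores.index(min(scores))])  # the outlier {'j', 'k', 'l', 'm'} (element order may vary)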
You shouldn't be looking for the smallest sum of element counts, since that depends on the size of the set. But if you subtract the size of the set from its sum, the result is 0 only if the set is disjoint from all the others. Another option is to look at the maximum count among a set's elements: if that maximum is 1, its elements belong to no other set.
There are many functions you can use. As the note states:
Note: There may be a case where the outlier set does have elements contained in the input group. In this case we have to find the set that has the least in common with the other sets.
The previous functions are not optimal. A better function would count the number of shared elements: score an element as 1 if it appears in more than one set and 0 if it appears in only one, then sum those scores per set.
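A compact sketch of that shared-element scoring (again my own illustration, not the answerer's code):

from collections import Counter

# Score each set by how many of its elements are shared with at least one other set;
# the outlier is the set with the fewest shared elements.
sets = [{'a', 'b', 'c', 'd'}, {'a', 'b', 'f', 'g'}, {'a', 'b', 'h', 'i'}, {'j', 'k', 'l', 'm'}]
counts = Counter(e for s in sets for e in s)
shared = [sum(1 for e in s if counts[e] > 1) for s in sets]
print(shared)                           # [2, 2, 2, 0]
print(sets[shared.index(min(shared))])  # the outlier {'j', 'k', 'l', 'm'}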

Postgres array comparison confusion

When I run
select array[19,21,500] <= array[23,5,0];
I get true.
but when I run
select array[24,21,500] <= array[23,5,0];
I get false. This suggests that the comparison is only on the first element.
I am wondering if there is an operator or possibly function that compares all the entries such that if all the entries in the left array are less than those in the right array (at the same index) it would return true, otherwise return false.
I'm hoping to retrieve all the rows that have an entire array "less than" or "greater than" a given array. I don't know if this is possible.
Arrays use ordinality as a basic property. In other words '{1,3,2}' <> '{1,2,3}' and this is important to understand when looking at comparisons. These look at successive elements.
Imagine for a moment that PostgreSQL didn't have an inet type. We could use int[] to specify CIDR blocks. For example, we could use '{10,0,0,1,8}' to represent 10.0.0.1/8, and we could then compare IP addresses in this way. We could also represent it with a bigint, as '{167772161,8}'. In this sort of comparison, if you have two IP addresses with different subnets, we can compare them, and the one with the more specific subnet comes after the one with the less specific subnet.
One of the basic principles of database normalization is that each field should hold one and only one value for its domain. One reason arrays don't necessarily violate this principle is that, since they have ordinality (and thus act as a tuple rather than a set or a bag), you can use them to represent singular values. The comparisons make perfect sense in that case.
In the case where you want an operator which does not respect ordinality, you can create your own. Basically you write a function that takes the two arrays and returns a boolean, and then wrap it in an operator (see CREATE OPERATOR in the docs for more on how to do this). You are by no means limited to what PostgreSQL offers out of the box.
To actually conduct the operation you asked for, use unnest() in parallel and aggregate with bool_and():
SELECT bool_and(a < b)   -- each element < corresponding element in 2nd array
     , bool_and(a <= b)
     , bool_and(a >= b)
     , bool_and(a > b)
     -- etc.
FROM  (SELECT unnest('{1,2,3}'::int[]) AS a, unnest('{2,3,4}'::int[]) AS b) t
Both arrays need to have the same number of base elements to be unnested in parallel. Else you get a CROSS JOIN, i.e. a completely different result.

How can you guarantee quicksort will sort an array of integers?

Got asked this in a lecture...stumped by it a bit.
how can you guarantee that quicksort will always sort an array of integers?
Thanks.
Gratuitously plagiarising Wikipedia:
The correctness of the partition algorithm is based on the following two arguments:

At each iteration, all the elements processed so far are in the desired position: before the pivot if less than the pivot's value, after the pivot if greater than the pivot's value (loop invariant).

Each iteration leaves one fewer element to be processed (loop variant).

The correctness of the overall algorithm can be proven via induction: for zero or one element, the algorithm leaves the data unchanged; for a larger data set it produces the concatenation of two parts, elements less than the pivot and elements greater than it, themselves sorted by the recursive hypothesis.
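For reference, a minimal sketch of that partition-and-recurse scheme in Python (my own illustration, using a Lomuto-style partition; it is not the only way to partition):

# Quicksort sketch illustrating the two invariants above: after partitioning, the pivot
# sits in its final position with smaller elements to its left and larger ones to its
# right, and each recursive call works on a strictly smaller slice.
def quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:                      # zero or one element: already sorted
        return
    pivot = a[hi]
    i = lo                            # a[lo:i] holds elements smaller than the pivot
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]         # pivot lands in its final place
    quicksort(a, lo, i - 1)
    quicksort(a, i + 1, hi)

data = [5, 2, 9, 1, 5, 6]
quicksort(data)
print(data)  # [1, 2, 5, 5, 6, 9]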
Quicksort works by taking a pivot value and partitioning the remaining data into two groups, one higher and one lower. You then do the same to each group in turn until you get groups no larger than one. At that point you can guarantee the data is sorted, because every pivot value has ended up in its correct place relative to the pivots it was directly compared with, which are also in their correct places. In the end you are left with groups of size 1 or 0, which cannot be rearranged and are therefore trivially sorted.
Hope this helps, it was what we were taught for A Level Further Mathematics (16-18, UK).
Your professor may be referring to "stability." Have a look here: http://en.wikipedia.org/wiki/Stable_sort#Stability. Stable sorting algorithms maintain the relative order of records with equal keys. If all keys are different then this distinction is not necessary.
Quicksort (in efficient implementations) is not a stable sort, so one way to guarantee stability would be to ensure that there are no duplicate integers in your array.

How do I find common elements from n arrays

I am thinking of sorting and then doing binary search. Is that the best way?
I advocate for hashes in such cases: you'll have time proportional to the combined size of the arrays.
Since most major languages offer a hash table in their standard libraries, I hardly need to show you how to implement such a solution.
Iterate through each one and use a hash table to store counts. The key is the value of the integer and the value is the count of appearances.
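A quick sketch of that counting approach across n arrays (my own illustration; each array is deduplicated first so a value repeated within one array is only counted once):

from collections import Counter

# Tally how many arrays each value appears in; a value common to all n arrays
# ends up with a count of n.
arrays = [[1, 2, 3, 4], [2, 3, 5], [0, 2, 3, 3]]
counts = Counter()
for arr in arrays:
    counts.update(set(arr))          # set() so duplicates inside one array count once
common = [v for v, c in counts.items() if c == len(arrays)]
print(common)  # [2, 3]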
It depends. If one set is substantially smaller than the other, or for some other reason you expect the intersection to be quite sparse, then a binary search may be justified. Otherwise, it's probably easiest to step through both at once. If the current element in one is smaller than in the other, advance to the next item in that array. When/if you get to equal elements, you emit that element as output and advance to the next item in both arrays. (This assumes, as you proposed, that you've already sorted both, of course.)
This is an O(N+M) operation, where N is the size of one array and M the size of the other. Using a binary search, you get O(N lg M) instead, which can be lower complexity if one array is a lot smaller than the other, but is likely to be a net loss if they're close to the same size.
Depending on what you need/want, the versions that attempt to just count occurrences can cause a pretty substantial problem: if there are multiple occurrences of a single item in one array, they will still count that as two occurrences of that item, indicating an intersection that doesn't really exist. You can prevent this, but doing so renders the job somewhat less trivial -- you insert items from one array into your hash table, but always set the count to 1. When that's finished, you process the second array by setting the count to 2 if and only if the item is already present in the table.
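A sketch of that merge-style step-through for two sorted arrays (my own illustration; skipping over equal runs also avoids the duplicate-counting problem just described):

# Intersect two sorted arrays in O(N + M) by stepping through both at once.
def sorted_intersection(a, b):
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1
        elif a[i] > b[j]:
            j += 1
        else:
            v = a[i]
            out.append(v)
            while i < len(a) and a[i] == v:   # advance past duplicates of v in both arrays
                i += 1
            while j < len(b) and b[j] == v:
                j += 1
    return out

print(sorted_intersection([1, 2, 2, 4, 7], [2, 4, 4, 8]))  # [2, 4]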
Define "best".
If you want to do it fast, you can do it in O(n) by iterating through each array and keeping a count for each unique element. Details of how to count the unique elements depend on the alphabet of things that can be in the array, e.g. is it sparse or dense?
Note that this is O(n) in the number of arrays, but O(nm) for arrays of length m.
The best way is probably to hash all the values and keep a count of occurrences, culling everything that has not occurred i times when you examine array i, for i = 1, 2, ..., n. Unfortunately, no deterministic algorithm can get you below an O(n*m) running time, since it's impossible to do this without examining all the values in all the arrays if they're unsorted.
A faster algorithm would need to either accept some level of probability (Monte Carlo), or rely on some known condition of the lists in order to examine only a subset of elements (i.e. you only care about elements that have occurred in all i-1 previous lists when considering the ith list, but in an unsorted list it's non-trivial to search for elements).
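An equivalent set-intersection form of that culling idea (my own sketch): only values that have survived every array seen so far remain candidates for the common set.

arrays = [[1, 2, 3, 4], [2, 3, 5], [0, 2, 3, 3]]
candidates = set(arrays[0])
for arr in arrays[1:]:
    candidates &= set(arr)        # cull anything missing from the current array
print(sorted(candidates))         # [2, 3]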

How to merge two linked lists in O(1) time in c?

Given two lists l1, l2, show how to merge them in O(1) time. The data structure for the lists depends on how you design it. By merging, I mean the union of the lists.
Eg: List1 = {1,2,3}
List2 = {2,4,5}
Merged list = {1,2,3,4,5}
On "merging" two sorted lists
It is straightforwardly impossible to merge two sorted lists into one sorted list in O(1).
About the closest thing you can do is have a "lazy" merge that can extract on-demand each successive element in O(1), but to perform the full merge, it's still O(N) (where N is the number of elements).
Another close thing you can do is physically join the two lists end to end into one list, performing no merge algorithm whatsoever, so that all elements from one list come before all elements from the other. This can in fact be done in O(1) (if the list maintains head and tail pointers), but it is not a merge in the traditional sense.
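A minimal sketch of that O(1) end-to-end join (my own illustration; the class names are made up), using a singly linked list that keeps both head and tail pointers:

# Concatenating two lists by rewiring one pointer; no elements are visited.
class Node:
    def __init__(self, value, next=None):
        self.value, self.next = value, next

class LinkedList:
    def __init__(self, values=()):
        self.head = self.tail = None
        for v in values:
            self.append(v)

    def append(self, value):
        node = Node(value)
        if self.tail is None:
            self.head = self.tail = node
        else:
            self.tail.next = node
            self.tail = node

    def concat(self, other):
        # O(1): link our tail to the other list's head and adopt its tail.
        if self.tail is None:
            self.head, self.tail = other.head, other.tail
        elif other.head is not None:
            self.tail.next = other.head
            self.tail = other.tail

l1 = LinkedList([1, 2, 3])
l2 = LinkedList([2, 4, 5])
l1.concat(l2)
node, out = l1.head, []
while node:
    out.append(node.value)
    node = node.next
print(out)  # [1, 2, 3, 2, 4, 5]; the duplicate 2 shows this is concatenation, not set union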
On set union in O(1)
If the question is about what kind of set representation allows a union operation in O(1), then yes, this can in fact be done. There are many set representations possible in practice, each with some pluses and minuses.
An essential example of such a specialized representation is the disjoint-set data structure, which permits O(α(n)) amortized time for the elementary Union and Find operations. α is the inverse of the Ackermann function; whereas the Ackermann function grows extremely quickly, its inverse α grows extremely slowly. The disjoint-set data structure essentially offers amortized O(1) operations for any practical size.
Note that "disjoint" is key here: it cannot represent the two sets {1, 2, 3} and {2, 4, 5}, because those sets are not disjoint. The disjoint-set data structure represents many sets (not just two), but no two distinct sets are allowed to have an element in common.
Another highly practical set representation is a bit array, where elements are mapped to bit indices, and a 0/1 indicates absence/presence respectively. With this representation, a union is simply a bitwise OR; intersection a bitwise AND. Asymptotically this isn't the best representation, but it can be a highly performant set representation in practice.
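A tiny sketch of the bit-array idea using a plain Python integer as the bit array (my own illustration, assuming small non-negative integer elements):

# Element i maps to bit i, so set union is a single bitwise OR (and intersection an AND).
def to_bits(elements):
    bits = 0
    for e in elements:
        bits |= 1 << e
    return bits

a = to_bits({1, 2, 3})
b = to_bits({2, 4, 5})
union = a | b
print([i for i in range(8) if (union >> i) & 1])  # [1, 2, 3, 4, 5]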
What you are looking for is not an algorithm to merge two "lists" in O(1) time. If you read "lists" as "linked lists" then this cannot be done faster than O(n).
What you are being asked to find is a data structure to store this data in that supports merging in O(1) time. This data structure will not be a list. It is still in general impossible to merge in "hard" O(1) time, but there are data structures that support merging in amortized O(1) time. Perhaps the most well-known example is the Fibonacci Heap.
I am not that experienced, so please don't hit me hard if I say something stupid.
Will this work? Since you have two linked lists, how about connecting the first element of the second list to the last element of the first list? We are still talking about pointers, right? The pointer of the last element of the first list would then point to the first element of the second list.
Does this work?
Edit: but we are looking for the union, so I guess it won't...
You make use of the fact that a PC is a finite state machine with 2^(bits of memory/storage space) states and thereby declare everything O(1).
