Way to judge if two Arrays are identical? - arrays

By identical I mean that the two arrays contain the same elements; the order of the elements does not matter.
The solution I came up with is this (which turns out to be a wrong approach, as pointed out in the comments):
Check whether the sizes of the two arrays are equal.
If so, look for every element of array A in array B.
If all are found, look for every element of array B in array A.
If all are found, conclude that the two arrays are identical.
However, is there a better algorithm in terms of time complexity?

Let's say you have a User[] array 1 and a User[] array 2. You can loop through array 1 and add each user to a Dictionary<User, int>, where the key is the user and the value is a count. Then loop through array 2 and, for each user, decrement the count in the dictionary (if the count is greater than 1) or remove the entry (if the count is 1). If the user isn't in the dictionary, you can stop: the arrays don't match.
If you get to the end and had previously checked that the lengths of the arrays are the same, then the arrays match. If you hadn't checked the lengths earlier (which of course you still should have), you can just verify that the dictionary is empty after looping through all of array 2.
With a hash-based dictionary this should take expected O(n) time, so for large inputs it will generally be faster than sorting both lists in O(n log n) and comparing them element by element. It takes more memory though, but if the arrays are not super large then memory usage shouldn't be an issue.
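The counting approach described above can be sketched in Python, using a plain dict in place of Dictionary<User, int>. This is only an illustration: the function name same_elements is made up here, and it assumes the elements are hashable and that equal elements hash equally.

```python
def same_elements(a, b):
    """Return True if a and b hold the same elements with the same multiplicities."""
    if len(a) != len(b):
        return False

    # First pass: count how often each element occurs in a.
    counts = {}
    for item in a:
        counts[item] = counts.get(item, 0) + 1

    # Second pass: consume one count per element of b.
    for item in b:
        remaining = counts.get(item, 0)
        if remaining == 0:
            return False  # item is not in a, or b has more copies than a
        if remaining == 1:
            del counts[item]
        else:
            counts[item] = remaining - 1

    # Every count was consumed exactly, so the dictionary must be empty.
    return not counts


print(same_elements([1, 2, 2, 3], [3, 2, 1, 2]))
print(same_elements([0, 1, 1], [0, 0, 1]))
```

The first call should report a match and the second a mismatch, since the duplicate counts differ.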

First, check the size of the two arrays. If they aren't equal then they don't contain the same elements.
After that, sort both the arrays in O(n lg(n)). Now, just check both the arrays element-by-element in O(n). As they are sorted, if they are equal then they will be equal in every position.
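A minimal Python sketch of the sort-and-compare idea, assuming the elements can be ordered with the < operator (the function name is hypothetical):

```python
def same_elements_sorted(a, b):
    """Compare two sequences as multisets by sorting copies of them."""
    if len(a) != len(b):
        return False
    # sorted() returns new lists, so the inputs are left untouched.
    # List equality then compares position by position.
    return sorted(a) == sorted(b)


print(same_elements_sorted([3, 1, 2], [2, 3, 1]))
print(same_elements_sorted([0, 1, 1], [0, 0, 1]))
```

Sorting copies costs O(n) extra memory; sorting in place would avoid that at the price of reordering the caller's arrays.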

Your approach doesn't work, as it would treat [0, 1, 1] as being equal to [0, 0, 1]. Every item in A is in B and vice versa. You'd need to count the number of occurrences of each item in A and B. (You then don't need to do both, of course, if you've already checked the lengths.)
If the contents are sortable, you can sort both and then compare element-by-element, of course. That only works if you can provide a total ordering to elements though.
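To make the counterexample concrete, here is a small Python sketch that puts the mutual-containment check next to a counting check. Both function names are invented for this illustration, and the counting version assumes hashable elements.

```python
from collections import Counter


def flawed_same_elements(a, b):
    """Equal lengths plus mutual containment; ignores how often items occur."""
    return (
        len(a) == len(b)
        and all(x in b for x in a)
        and all(y in a for y in b)
    )


def counted_same_elements(a, b):
    """Compare the number of occurrences of each item instead."""
    return Counter(a) == Counter(b)


print(flawed_same_elements([0, 1, 1], [0, 0, 1]))
print(counted_same_elements([0, 1, 1], [0, 0, 1]))
```

The flawed check accepts the pair because 0 and 1 both appear in each array, while the counting check rejects it.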

Sort both arrays according to a strict ordering and compare them term-by-term.
Update: Summarizing some of the points that have been raised, here is the efficiency you can generally expect:
strict ordering available: O(N log N) for sorting, plus O(N) for comparing term-by-term
equality and hash function available: compare hash counts term-by-term, plus actual object comparisons in the event of hash collisions.
only equality, no hashing available: must count each element or copy one container and remove (efficiency depends on the container).
The complexity of comparison term-by-term is linear in the position of the first mismatch.

My idea is to loop through the first array and look for items in the second array. The only issue of course is that you can't use an item in the second array twice. So, make a third array of booleans. This array indicates which items in array 2 'have been used'.
Loop through the first array. Inside that loop through each element in the second array to see if you can 'find' that element in the second array, but also check the third array to verify that the position in the second array hasn't been used. If you find a match update that position in the third array and move on.
You should only need to do this once. If you finish and you found a match for every item in array 1, then, because the lengths are equal, no unmatched items remain in array 2. You don't need to then loop through array 2 and see if array 1 contains each item.
Of course before you start all that check that the lengths are the same.
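A sketch of this "used" flags idea in Python. It relies only on ==, so it also works for elements that can be neither hashed nor ordered, at the cost of O(n^2) comparisons. The function name is an assumption made for this example.

```python
def same_elements_by_equality(a, b):
    """Match each item of a against a not-yet-used item of b, using only ==."""
    if len(a) != len(b):
        return False

    used = [False] * len(b)  # used[j] is True once b[j] has been matched
    for x in a:
        for j, y in enumerate(b):
            if not used[j] and x == y:
                used[j] = True
                break
        else:
            # The inner loop ended without a break: no free match for x.
            return False
    return True


# Lists are unhashable, so a dictionary of counts would not work here.
print(same_elements_by_equality([[1], [2], [1]], [[1], [1], [2]]))
print(same_elements_by_equality([0, 1, 1], [0, 0, 1]))
```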

If you don't mind the extra space, you can use something like a HashMap to store the (element, count) pairs of the first array and then check whether the second array matches up; this would be linear in N (the size of the bigger array).

If the array sizes are identical and all of the elements in Array A are in Array B, then there is no need to verify that all of the elements in array B are in Array A. So at the very least you can omit that step.
EDIT: Depends on the definition of the problem. This solution would work if and only if his original solution would work, which it wouldn't if the arrays can have duplicate items and you weren't counting or marking them as "used."

Related

Find first non-consecutive array element using Google Sheets Filter

I have to make this work in a (always sorted) array that results from splitting a delimited string of sorted whole numbers. Concretely, if my cell contains "1,2,3,5" I need a formula to evaluate to 4. A beneficial side-effect would be to find an implementation that would give the last-number+1, if the original array had only consecutive numbers, i.e., applying the formula to "1,2,3,4,5" would evaluate to 6.
My approach has been to generate a new array that is a perfect sequence and compare it with my original array, to find the first element where the two arrays are not equal.
Creating a perfect sequence of the array like this:
=TRANSPOSE(SEQUENCE(COUNT(arr),1,MIN(arr),1))
So all that would be left to do is compare arr with the sequence above to find the first element that differed, something like:
=COUNTA(IFERROR(FILTER(arr;MATCH(arr; transpose(sequence(count(arr),1,min(arr),1))
;0))))
Sadly, what I have above is not correctly "short-circuiting" at the first non equal value of the arrays. Is COUNTIF the way to go?
If my previous step gets me the index of the element instead of the value, then what remains is to get the value at that index:
INDEX( arr, 1, counta(iferror(filter(arr;match(arr1; transpose(sequence(count(arr),1,min(arr),1))
;0)))) )
Is there a more straight-forward way get the first non-consecutive element? A way that does not involve actual ranges in the spreadsheet?
After some thought, I realized that set-subtracting (i.e. set difference) the original array from the generated sequence always gives me the first missing number.
So my final formula to handle this and the case where all numbers are in sequence is this:
IFERROR(INDEX(FILTER(TRANSPOSE(SEQUENCE(COUNT(SPLIT(G7," ")),1,LEFT(G7,4),1)), ISERROR(MATCH(TRANSPOSE(SEQUENCE(COUNT(split(G7," ")),1,LEFT(G7,4),1)),SPLIT(G7, " "),False))),1,1),RIGHT(G7,4)+1)
I'm sure there is a better, more concise answer, but this does the job.
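For readers who want to check the logic outside the spreadsheet, the same set-difference idea can be sketched in Python. This is not a Sheets formula; the function name and the comma separator are assumptions, and the input is taken to be a sorted, delimited string of whole numbers as in the question.

```python
def first_missing(text, sep=","):
    """Return the first number missing from a sorted, delimited sequence.

    If the sequence has no gap, return the last number plus one.
    """
    numbers = [int(part) for part in text.split(sep)]
    present = set(numbers)
    # Walk the "perfect" sequence and stop at the first value not present.
    for candidate in range(numbers[0], numbers[-1] + 1):
        if candidate not in present:
            return candidate
    return numbers[-1] + 1


print(first_missing("1,2,3,5"))    # 4
print(first_missing("1,2,3,4,5"))  # 6
```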

Find duplicates in an array in linear time

Problem: You are given an array of n+1 integers from the range 1..n. At least one number has a duplicate. All array values may be the same. Print all duplicates in linear time and constant space. The array can't be modified.
The obvious solution would be to create a bit array with default value false, set bitarray[array[i]] to 1 for each element, and first check whether it is already 1. That requires additional space, so it's no good. My other thought: reorder the array by hash and check whether the current element and the element array[hash % n] are equal. This is also no good, since we can't modify the original array. Now I think it looks like an impossible task. Is there even a solution to this?

array to NSOrderedSet change size of list to 1

I have an array of 2 elements. When I try to convert it to an NSOrderedSet, it gives me a size of only one element.
LOG("\(listTwoElement.count)")
This line gives me 2.
LOG("\(NSOrderedSet.init(array: listTwoElement).count)")
This line gives me only 1 element.
Why does conversion to NSOrderedSet change the size of the array?
Thanks for your help.
A Set is an unordered collection of unique elements. If you try to put the same element in a set twice, you will only get one.
An Array is an ordered list of elements that needn't be unique. You can have the same element many times in an array.
An OrderedSet is a combination of both (but a subclass of neither). The items are ordered, like an array, but each element needs to be unique.
Unique elements does not (just) mean that they are stored in different locations in memory, but that they are not "equal" in some fundamental sense. Equality can be defined differently for every class. Classes that want to define it implement the isEqual: and hash methods of the NSObjectProtocol. (Hashes of equal objects must be equal; hashes of unequal objects do not need to be different.)
Without knowing more about the elements you are dealing with, it is hard to say why they are being evaluated as equal when you don't expect them to be. I would start by looking at the class's implementation of isEqual: and hash.

Efficient method to sort an array (The method of sorting must be picking an element from the array and placing it elsewhere in the array)

What is the most efficient method I can use to sort an array of integers by selecting one integer at a time and inserting it anywhere in the array? Or, inserting it at the end or the beginning of the array?
I'm looking for an algorithm that can do this in the minimum number of steps, unlike selection sort.
You need to figure out the longest sequence of elements that are already in sorted order (they do not have to be adjacent, i.e. the longest sorted subsequence). Then the number of steps is the number of elements that are not in that sequence.
Since the elements that don't belong to the selected sequence are out of order you need to move each of them to the right position. Since each move is a step according to your description, and you already picked the longest sorted sequence, this will give you the least number of steps.
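A hedged Python sketch of the count described above, assuming "longest sequence of sorted elements" means the longest non-decreasing subsequence. It computes that length with the standard patience-style technique in O(n log n); the function name is made up for this example.

```python
from bisect import bisect_right


def min_moves_to_sort(values):
    """Number of elements outside a longest non-decreasing subsequence."""
    # tails[k] is the smallest possible last value of a non-decreasing
    # subsequence of length k + 1 seen so far.
    tails = []
    for v in values:
        i = bisect_right(tails, v)
        if i == len(tails):
            tails.append(v)  # v extends the longest subsequence found so far
        else:
            tails[i] = v     # v gives a smaller tail for length i + 1
    return len(values) - len(tails)


print(min_moves_to_sort([1, 3, 2, 4, 5]))  # 1: only the 3 (or the 2) has to move
print(min_moves_to_sort([5, 4, 3, 2, 1]))  # 4
```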

How do I find common elements from n arrays

I am thinking of sorting and then doing binary search. Is that the best way?
I advocate for hashes in such cases: you'll have time proportional to the combined size of both arrays.
Since most major languages offer a hash table in their standard libraries, I hardly need to show you how to implement such a solution.
Iterate through each one and use a hash table to store counts. The key is the value of the integer and the value is the count of appearances.
It depends. If one set is substantially smaller than the other, or for some other reason you expect the intersection to be quite sparse, then a binary search may be justified. Otherwise, it's probably easiest to step through both at once. If the current element in one is smaller than in the other, advance to the next item in that array. When/if you get to equal elements, you send that as output, and advance to the next item in both arrays. (This assumes that, as you proposed, you've already sorted both, of course.)
This is an O(N+M) operation, where N is the size of one array and M the size of the other. Using a binary search, you get O(N log2 M) instead, which can be lower complexity if one array is a lot smaller than the other, but is likely to be a net loss if they're close to the same size.
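The step-through described above might look like this in Python, assuming both inputs are already sorted in ascending order (the function name is hypothetical):

```python
def intersect_sorted(a, b):
    """Walk two sorted lists in lockstep and collect the equal elements."""
    i = j = 0
    common = []
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1          # a is behind, advance it
        elif a[i] > b[j]:
            j += 1          # b is behind, advance it
        else:
            common.append(a[i])
            i += 1          # equal: emit once and advance both
            j += 1
    return common


print(intersect_sorted([1, 2, 4, 6], [2, 3, 4, 7]))  # [2, 4]
```

Note that repeated values are paired up one by one, so a value that occurs twice in both inputs is reported twice.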
Depending on what you need/want, the versions that attempt to just count occurrences can cause a pretty substantial problem: if there are multiple occurrences of a single item in one array, they will still count that as two occurrences of that item, indicating an intersection that doesn't really exist. You can prevent this, but doing so renders the job somewhat less trivial -- you insert items from one array into your hash table, but always set the count to 1. When that's finished, you process the second array by setting the count to 2 if and only if the item is already present in the table.
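The "set the count to 1, then to 2" variant could be sketched as follows in Python. The function name is invented for this illustration, and the elements are assumed to be hashable.

```python
def intersect_hashed(a, b):
    """Intersection of two unsorted lists that ignores repeats within a list."""
    seen = {}
    for item in a:
        seen[item] = 1       # always 1, however often item occurs in a
    for item in b:
        if item in seen:
            seen[item] = 2   # 2 means "present in both arrays"
    return [item for item, mark in seen.items() if mark == 2]


# 1 occurs twice in the first list only, so it must not be reported.
print(intersect_hashed([1, 1, 2, 3], [3, 3, 4]))  # [3]
```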
Define "best".
If you want to do it fast, you can do it in O(n) by iterating through each array and keeping a count for each unique element. Details of how to count the unique elements depend on the alphabet of things that can be in the array, e.g., is it sparse or dense?
(Note that this is O(n) in the number of arrays, but O(nm) overall for n arrays of length m.)
The best way is probably to hash all the values and keep a count of occurrences, culling all that have not occurred i times when you examine array i where i = {1, 2, ..., n}. Unfortunately, no deterministic algorithm can get you less than an O(n*m) running time, since it's impossible to do this without examining all the values in all the arrays if they're unsorted.
A faster algorithm would need to either accept some probability of error (Monte Carlo), or rely on some known condition of the lists to examine only a subset of elements (i.e. you only care about elements that have occurred in all i-1 previous lists when considering the ith list, but in an unsorted list it's non-trivial to search for elements).
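A sketch of the counting-and-culling idea for n arrays in Python. It assumes hashable elements; the function name is made up here. Each array is deduplicated first so that repeats inside one array are counted only once.

```python
def common_to_all(arrays):
    """Return the set of values that occur in every array."""
    counts = {}
    for i, array in enumerate(arrays, start=1):
        for item in set(array):  # repeats within one array count once
            # Only values seen in all i - 1 previous arrays are still alive.
            if counts.get(item, 0) == i - 1:
                counts[item] = i
        # Cull everything that has not occurred i times after array i.
        counts = {item: c for item, c in counts.items() if c == i}
    return set(counts)


print(common_to_all([[1, 2, 3], [2, 3, 4], [3, 2, 2]]))  # {2, 3}
```

Every value in every array is still examined once, so the running time stays proportional to the total input size, in line with the O(n*m) bound mentioned above.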

Resources