Find duplicates in an array in linear time - arrays

Problem: You are given an array of n+1 integers from range 1..n. At least one number has duplicate. All array values can be same. Print all duplicates in linear time and constant space. Array can't be modified.
The obvious solution would be to create a bit array with default value false, set 1 in bitarray[array[i]] for each element, then check if it's already 1. That requires additional space, so no good. My another thought: reorder the array by hash and check if a current element and the element array[hash % n] are equal. This is also no good since we can't modify the original array. Now I think that it looks like an impossible task. Is there even a solution to this?

Related

Find first non-consecutive array element using Google Sheets Filter

I have to make this work in a (always sorted) array that results from splitting a delimited string of sorted whole numbers. Concretely, if my cell contains "1,2,3,5" I need a formula to evaluate to 4. A beneficial side-effect would be to find an implementation that would give the last-number+1, if the original array had only consecutive numbers, i.e., applying the formula to "1,2,3,4,5" would evaluate to 6.
My approach has been to generate a new array that is a perfect sequence and compare it with my original array, to find the first element where the two arrays are not equal.
Creating a perfect sequence of the array like this:
=TRANSPOSE(SEQUENCE(COUNT(arr),1,MIN(arr),1))
So all that would be left to do is compare arr with the sequence above to find the first element that differed, something like:
=COUNTA(IFERROR(FILTER(arr;MATCH(arr; transpose(sequence(count(arr),1,min(arr),1))
;0))))
Sadly, what I have above is not correctly "short-circuiting" at the first non equal value of the arrays. Is COUNTIF the way to go?
If my previous step gets me the index of the element instead of the value, then what remains is to get the value at that index:
INDEX( arr, 1, counta(iferror(filter(arr;match(arr1; transpose(sequence(count(arr),1,min(arr),1))
;0)))) )
Is there a more straight-forward way get the first non-consecutive element? A way that does not involve actual ranges in the spreadsheet?
After some thought, I realized that set-subtracting (i.e. set difference) the original array from the generated sequence always gives me the first missing number.
So my final formula to handle this and the case where all numbers are in sequence is this:
IFERROR(INDEX(FILTER(TRANSPOSE(SEQUENCE(COUNT(SPLIT(G7," ")),1,LEFT(G7,4),1)), ISERROR(MATCH(TRANSPOSE(SEQUENCE(COUNT(split(G7," ")),1,LEFT(G7,4),1)),SPLIT(G7, " "),False))),1,1),RIGHT(G7,4)+1) )
I'm sure there is a better, more concise answer, but this does the job.

algorithm to find missing values in an array .. in place?

I had a job interview today in which I was given an array of size n and the objective was to find the missing values in the array.
Input:
arr = [9,7,1,3,4,2,5]
Output:
6, 8
The input arr contains elements only from 1 till n. Note that the array is not sorted.
I got the "naive" version of the program done quickly, but it was O(n2) time complexity. I came up with a solution using a hash table (Python dictionary in my case) with time and space complexity both as O(n).
But then, I was asked to do it in place with no extra space, like the extra arrays and dictionary space I used in the solutions mentioned above were not allowed.
Any ideas to do this in time-complexity O(n) and space complexity O(1).
It could have been a trick question, given that "array like this" is not a sufficient specification, allowing for short cuts. And without short cuts (which make assumptions on the values in the array or other restrictions), the problem should be of sorting complexity (O(n log n)).
[2,2,2,2,2,2,2] is also an array "like this"? Or
[2,-2,2,-2,1000] ?
Depending on the "like-this-ishness" of what they imagine, there might or might not be short cuts.
So maybe, this job interview question was about how you query for missing information/requirements.
Assumption: The array consists of only integers in range [1,n] (there could be no other way to solve this question for a random number of missing values in O(N) without extra space).
The algorithm would be as follows:
for i in range 1 to n:
index = abs(array[i])
if array[index]>0:
array[index] *= -1
for i in range 1 to n:
if array[i]>0:
print(i)
We use a simple observation here: for each element in the array, we make the item at index element negative. Then, we traverse the array again and see if we get any positive values, which means that index was never modified. If the index wasn't modified, then it was one of the missing values, so we print it.

What C construct would allow me to 'reverse reference' an array?

Looking for an elegant way (or a construct with which I am unfamiliar) that allows me to do the equivalent of 'reverse referencing' an array. That is, say I have an integer array
handle[number] = nameNumber
Sometimes I know the number and need the nameNumber, but sometimes I only know the nameNumber and need the matching [number] in the array.
The integer nameNumber values are each unique, that is, no two nameNumbers that are the same, so every [number] and nameNumber pair are also unique.
Is there a good way to 'reverse reference' an array value (or some other construct) without having to sweep the entire array looking for the matching value, (or having to update and keep track of two different arrays with reverse value sets)?
If the array is sorted and you know the length of it, you could binary search for the element in the array. This would be an O(n log(n)) search instead of you doing O(n) search through the array. Divide the array in half and check if the element at the center is greater or less than what you're looking for, grab the half of the array your element is in, and divide in half again. Each decision you make will eliminate half of the elements in the array. Keep this process going and you'll eventually land on the element you're looking for.
I don't know whether it's acceptable for you to use C++ and boost libraries. If yes you can use boost::bimap<X, Y>.
Boost.Bimap is a bidirectional maps library for C++. With Boost.Bimap you can create associative containers in which both types can be used as key. A bimap can be thought of as a combination of a std::map and a std::map.

Way to judge if two Arrays are identical?

By identical I mean two Arrays contain same elements, order of elements in Arrays are not matter here.
The solution I came up with is like this(which turns out a wrong approach as pointed out in comments):
if the size of two Arrays are equal
See True, find all elements of Array A in Array B
All Found, find all elements of Array B in Array A
All Found, then I get conclusion two Arrays are identical
However, is there better algorithm in term of time complexity?
Let's say you have an User[] array 1 and User[] array 2. You can lop through array one and add them to Dictionary<User, int> dictionary where the key is the user and the value is a count. Then you loop through the second array and for each user in array 2 decrement the count in the dictionary (if count is greater than 1) or remove the element (if count is 1). If the user isn't in the dictionary, then you can stop, the arrays don't match.
If you get to the end and had previously checked length of the arrays is same, then the arrays match. If you hadn't checked length earlier (which of course you still should have), then you can just verify the dictionary is now empty after completely looping through array 2.
I don't know exactly what the performance of this is, but it will be faster than sorting both lists and looping through them comparing element by element. Takes more memory though, but if the arrays are not super large then memory usage shouldn't be an issue.
First, check the size of the two arrays. If they aren't equal then they don't contain the same elements.
After that, sort both the arrays in O(n lg(n)). Now, just check both the arrays element-by-element in O(n). As they are sorted, if they are equal then they will be equal in every position.
Your approach doesn't work, as it would treat [0, 1, 1] as being equal to [0, 0, 1]. Every item in A is in B and vice versa. You'd need to count the number of occurrences of each item in A and B. (You then don't need to do both, of course, if you've already checked the lengths.)
If the contents are sortable, you can sort both and then compare element-by-element, of course. That only works if you can provide a total ordering to elements though.
Sort both arrays according to a strict ordering and compare them term-by-term.
Update: Summarizing some of the points that have been raised, here the efficiency you can generally expect:
strict ordering available: O(log N) for sorting plus comparing term-by-term
equality and hash function available: compare hash counts term-by-term, plus actual object comparisons in the event of hash collisions.
only equality, no hashing available: must count each element or copy one container and remove (efficiency depends on the container).
The complexity of comparison term-by-term is linear in the position of the first mismatch.
My idea is to loop through the first array and look for items in the second array. The only issue of course is that you can't use an item in the second array twice. So, make a third array of booleans. This array indicates which items in array 2 'have been used'.
Loop through the first array. Inside that loop through each element in the second array to see if you can 'find' that element in the second array, but also check the third array to verify that the position in the second array hasn't been used. If you find a match update that position in the third array and move on.
You should only need to do this once. If you finish and you found a match for all items in array 2 then no unmatched items remain in array 2. You don't need to then loop through array 2 and see if array 1 contains the item.
Of course before you start all that check that the lengths are the same.
If you don't mind extra space you can do some like HashMap to Store the (element,count) pairs of the first array then check if the second array matches up; this would be linear in N (size of biggest array)
If the array sizes are identical and all of the elements in Array A are in Array B, then there is no need to verify that all of the elements in array B are in Array A. So at the very least you can omit that step.
EDIT: Depends on the definition of the problem. This solution would work if and only if his original solution would work, which it wouldn't if the arrays can have duplicate items and you weren't counting or marking them as "used."

How do I find common elements from n arrays

I am thinking of sorting and then doing binary search. Is that the best way?
I advocate for hashes in such cases: you'll have time proportional to common size of both arrays.
Since most major languages offer hashtable in their standard libraries, I hardly need to show your how to implement such solution.
Iterate through each one and use a hash table to store counts. The key is the value of the integer and the value is the count of appearances.
It depends. If one set is substantially smaller than the other, or for some other reason you expect the intersection to be quite sparse, then a binary search may be justified. Otherwise, it's probably easiest to step through both at once. If the current element in one is smaller than in the other, advance to the next item in that array. When/if you get to equal elements, you send that as output, and advance to the next item in both arrays. (This assumes, that as you advocated, you've already sorted both, of course).
This is an O(N+M) operation, where N is the size of one array, and M the size of the other. Using a binary search, you get O(N lg2 M) instead, which can be lower complexity if one array is lot smaller than the other, but is likely to be a net loss if they're close to the same size.
Depending on what you need/want, the versions that attempt to just count occurrences can cause a pretty substantial problem: if there are multiple occurrences of a single item in one array, they will still count that as two occurrences of that item, indicating an intersection that doesn't really exist. You can prevent this, but doing so renders the job somewhat less trivial -- you insert items from one array into your hash table, but always set the count to 1. When that's finished, you process the second array by setting the count to 2 if and only if the item is already present in the table.
Define "best".
If you want to do it fast, you can do it O(n) by iterating through each array and keeping a count for each unique element. Details of how to count the unique elements depend on the alphabet of things that can be in the array, eg, is it sparse or dense?
Note that this is O(n) in the number of arrays, but O(nm) for arrays of length m).
The best way is probably to hash all the values and keep a count of occurrences, culling all that have not occurred i times when you examine array i where i = {1, 2, ..., n}. Unfortunately, no deterministic algorithm can get you less than an O(n*m) running time, since it's impossible to do this without examining all the values in all the arrays if they're unsorted.
A faster algorithm would need to either have an acceptable level of probability (Monte Carlo), or rely on some known condition of the lists to examine only a subset of elements (i.e. you only care about elements that have occurred in all i-1 previous lists when considering the ith list, but in an unsorted list it's non-trivial to search for elements.

Resources