algorithm to find missing values in an array .. in place? - arrays

I had a job interview today in which I was given an array of size n and the objective was to find the missing values in the array.
Input:
arr = [9,7,1,3,4,2,5]
Output:
6, 8
The input arr contains only elements from 1 to n. Note that the array is not sorted.
I got the "naive" version of the program done quickly, but it had O(n^2) time complexity. I then came up with a solution using a hash table (a Python dictionary in my case) with O(n) time and space complexity.
But then I was asked to do it in place with no extra space; the extra arrays and dictionary I used in the solutions above were not allowed.
Any ideas on how to do this in O(n) time and O(1) space?

It could have been a trick question, given that "array like this" is not a sufficient specification, allowing for short cuts. And without short cuts (which make assumptions on the values in the array or other restrictions), the problem should be of sorting complexity (O(n log n)).
[2,2,2,2,2,2,2] is also an array "like this"? Or
[2,-2,2,-2,1000] ?
Depending on the "like-this-ishness" of what they imagine, there might or might not be short cuts.
So maybe, this job interview question was about how you query for missing information/requirements.

Assumption: The array consists of only integers in range [1,n] (there could be no other way to solve this question for a random number of missing values in O(N) without extra space).
The algorithm would be as follows:
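# n = len(array); every value is assumed to lie in 1..n
for i in range(n):
    # value v is recorded at 0-based index v - 1
    index = abs(array[i]) - 1
    if array[index] > 0:
        array[index] *= -1
for i in range(n):
    if array[i] > 0:
        print(i + 1)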
We use a simple observation here: for each element in the array, we make the item at index element negative. Then, we traverse the array again and see if we get any positive values, which means that index was never modified. If the index wasn't modified, then it was one of the missing values, so we print it.
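As a minimal sketch of the sign-marking idea, the function below wraps it up so that it returns the missing values and then restores the array. It relies on the answer's assumption that every value lies in 1..n, where n is the length of the array; the sample input in the question contains 9 in a 7-element array, so it would not satisfy that assumption as written. The name find_missing and the restore step are illustrative additions, not part of the original answer.

```python
def find_missing(arr):
    """Return the values in 1..len(arr) that do not occur in arr.

    Assumes every element is an integer in 1..len(arr).
    Uses the sign of arr[v - 1] as a "value v was seen" flag.
    """
    n = len(arr)
    for i in range(n):
        idx = abs(arr[i]) - 1          # value v is tracked at index v - 1
        if arr[idx] > 0:
            arr[idx] = -arr[idx]       # mark v as seen
    # Indices that stayed positive were never marked
    missing = [i + 1 for i in range(n) if arr[i] > 0]
    # Undo the marking so the caller gets the array back unchanged
    for i in range(n):
        arr[i] = abs(arr[i])
    return missing


data = [2, 3, 3, 5, 2]
print(find_missing(data))  # [1, 4]
```

Apart from the returned list, the extra space is constant, at the cost of temporarily modifying the input.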

Related

Find duplicates in an array in linear time

Problem: You are given an array of n+1 integers from the range 1..n. At least one number has a duplicate. All array values can be the same. Print all duplicates in linear time and constant space. The array can't be modified.
The obvious solution would be to create a bit array initialized to false, then for each element check whether bitarray[array[i]] is already set (a duplicate) and set it otherwise. That requires additional space, so it's no good. Another thought: reorder the array by hash and check whether the current element and the element array[hash % n] are equal. This is also no good, since we can't modify the original array. Now it looks like an impossible task. Is there even a solution to this?
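For reference, the "obvious" flag-array approach that the question rules out could look roughly like this. It is only a sketch: it uses a bytearray with one byte per value rather than a true bit array, and a third state so each duplicate is reported once. The function name and the parameter n are illustrative. It needs O(n) extra space, so it does not meet the stated constraints.

```python
def find_duplicates_with_flags(arr, n):
    """Return each duplicated value once, assuming all values are in 1..n."""
    # 0 = not seen yet, 1 = seen once, 2 = already reported as a duplicate
    state = bytearray(n + 1)
    dups = []
    for v in arr:
        if state[v] == 0:
            state[v] = 1
        elif state[v] == 1:
            state[v] = 2
            dups.append(v)
    return dups


print(find_duplicates_with_flags([1, 3, 3, 2, 1, 3], 5))  # [3, 1]
```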

Algorithm - What is the best algorithm for detecting duplicate numbers in small array?

What is the best algorithm for detecting duplicate numbers in an array: the best in speed and memory, while avoiding overhead?
The array is small, like [5,9,13,3,2,5,6,7,1]. Note that 5 is duplicated.
After searching and reading about sorting algorithms, I realized that I will use one of these algorithms, Quick Sort, Insertion Sort or Merge Sort.
But actually I am really confused about what to use in my case which is a small array.
Thanks in advance.
To be honest, with that size of array, you may as well choose the O(n^2) solution (checking every element against every other element).
You'll generally only need to worry about performance if/when the array gets larger. For small data sets like this, you could well have found the duplicate with an 'inefficient' solution before the sort phase of an efficient solution will have finished :-)
In other words, you can use something like (pseudo-code):
for idx1 = 0 to nums.len - 2 inclusive:
    for idx2 = idx1 + 1 to nums.len - 1 inclusive:
        if nums[idx1] == nums[idx2]:
            return nums[idx1]
return no dups found
This finds the first value in the array which has a duplicate.
If you want an exhaustive list of duplicates, then just add the duplicate value to another (initially empty) array (once only per value) and keep going.
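A Python sketch of the exhaustive variant described above might look like this. The function name is illustrative, and the dups list is the "another array" the answer mentions.

```python
def find_all_duplicates(nums):
    """Return every value that occurs more than once, each listed once.

    Quadratic time; the only extra memory is the result list.
    """
    dups = []
    for i in range(len(nums) - 1):
        for j in range(i + 1, len(nums)):
            if nums[i] == nums[j]:
                if nums[i] not in dups:   # record each duplicate value once
                    dups.append(nums[i])
                break                     # no need to scan further for nums[i]
    return dups


print(find_all_duplicates([5, 9, 13, 3, 2, 5, 6, 7, 1]))  # [5]
```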
You could also sort it using any half-decent algorithm; for a data set of the size you're discussing, even a bubble sort would probably be adequate. Then you just process the sorted items sequentially, looking for runs of values, but it's probably overkill in your case.
Two good approaches are available, depending on whether or not you know the range from which the numbers are drawn.
Case 1: the range is known.
Suppose you know that all numbers are in the range [a, b[, so the length of the range is l=b-a.
You can create an array A of length l, fill it with 0s, then iterate over the original array and for each element e increment the value of A[e-a] (here we are actually mapping the range onto [0,l[).
Once finished, you can iterate over A and find the duplicate numbers. In fact, if there exists i such that A[i] is greater than 1, it implies that i+a is a repeated number.
The same idea is behind counting sort, and it works fine also for your problem.
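A short sketch of Case 1, assuming integer values in the half-open range [a, b[ (the function name is illustrative):

```python
def duplicates_known_range(nums, a, b):
    """Return the values in [a, b) that occur more than once in nums."""
    counts = [0] * (b - a)        # counts[i] tracks occurrences of value i + a
    for e in nums:
        counts[e - a] += 1
    return [i + a for i, c in enumerate(counts) if c > 1]


print(duplicates_known_range([5, 9, 13, 3, 2, 5, 6, 7, 1], 1, 14))  # [5]
```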
Case 2: the range is not known.
Quite simple. Slightly modify the approach mentioned above: instead of an array, use a map where the keys are the numbers from your original array and the values are the number of times each occurs. At the end, iterate over the keys and pick out those that were found more than once.
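In Python, one way to sketch Case 2 is with collections.Counter as the map (the function name is illustrative):

```python
from collections import Counter


def duplicates_unknown_range(nums):
    """Return the values that occur more than once, in first-seen order."""
    counts = Counter(nums)        # maps each value to its number of occurrences
    return [value for value, c in counts.items() if c > 1]


print(duplicates_unknown_range([5, 9, 13, 3, 2, 5, 6, 7, 1]))  # [5]
```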
Note.
In both of the cases mentioned above, the complexity should be O(N), and you cannot do better, since you have to visit every stored value at least once.
Look at the first example: we iterate over two arrays of lengths N and l, so the cost is O(N + l); when l<=N that is at most 2*N steps, that is O(N).
The second example is indeed a bit more complex and dependent on the implementation of the map, but for the sake of simplicity we can safely assume that it is O(N).
As for memory, you are constructing data structures whose sizes are proportional to the number of different values contained in the original array.
As usual, memory occupancy and performance are the keys to your choice: the greater the former, the better the latter, and vice versa. As suggested in another response, if you know that the array is small, you can safely rely on an O(N^2) algorithm that requires no additional memory at all.
Which is the best choice? Well, it depends on your problem, we cannot say.

Count distinct array entries [with no add memory nor array changes]

The task is to count the unique numbers in a given array. I saw numerous similar questions on SO, but here we have additional requirements which weren't stated in the other questions:
Amount of allowed additional memory is O(1)
Changes to the array are prohibited
I was able to write a quadratic algorithm that satisfies the given constraints. But I keep wondering: could one do better on such a problem? Thank you for your time.
Algorithm running in O(n^2):
def count(a):
    unique = len(a)
    ind = 0
    while ind < len(a):
        x = a[ind]
        i = ind+1
        while i < len(a):
            if a[i] == x:
                unique -= 1
                break
            i += 1
        ind += 1
    print("Total uniques: ", unique)
This is a very similar problem to a follow-up question in chapter 1 (Arrays and Strings) from Cracking the Coding Interview:
Implement an algorithm to determine if a string has all unique
characters. What if you cannot use additional data structures?
The answer (to the follow-up question) is that if you can't assume anything about the array (namely, it is not sorted, you don't know its size, etc.), then there is no algorithm better than what you showed.
That being said, you may think about relaxing the constraints a little bit, to make it more interesting. For example, if you have an upper bound on the array size, you could use a bit vector to keep track of which values you've read before while traversing the array, although this is not strictly an O(1) solution when it comes to memory usage (one could argue that by knowing the maximum array size, the memory usage is constant, and thus O(1), but that is a little bit of cheating). Similarly, if the array were sorted, you could also solve it in O(n) by going through the elements one at a time and checking whether each differs from its neighbor.
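For the sorted case mentioned above, a minimal sketch could look like this. It assumes the input is already sorted, which the original question does not guarantee, and the function name is illustrative.

```python
def count_distinct_sorted(a):
    """Count distinct values in an already sorted sequence.

    O(n) time and O(1) extra memory; the input is not modified.
    """
    if not a:
        return 0
    distinct = 1                  # the first element always counts
    for i in range(1, len(a)):
        if a[i] != a[i - 1]:      # a new value starts here
            distinct += 1
    return distinct


print(count_distinct_sorted([1, 2, 2, 3, 5, 5, 5, 9]))  # 5
```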
Because there is no underlying structure in the array given (sorted, etc.) you are forced to brute force every value in the array...
There is a more complicated approach that I believe would work. It entails keeping your array of unique numbers sorted. This means that it would take more time when inserting into the array but would allow you to look up values much more quickly. You should be able to find the insertion position in O(log n) time by looking at the value directly in the middle of the array and checking if it's larger or smaller. You'd then eliminate half the array as a valid insertion location and repeat. You would use a similar approach to look up values in the array. The only issue with this is that it requires more memory than I believe you are allowed (O(1)).
That being said, I think the constraints on the task restrict the algorithm to O(n^2).
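The sorted-array idea from this answer can be sketched with Python's bisect module. As the answer itself notes, it needs O(n) extra memory and so breaks the O(1) constraint. Binary search finds the position in O(log n), but list.insert still has to shift elements, so a single insertion can cost O(n) in the worst case. The function name is illustrative.

```python
import bisect


def count_distinct_with_sorted_copy(a):
    """Count distinct values by maintaining a sorted list of values seen so far."""
    seen = []                                  # sorted, duplicate-free
    for x in a:
        pos = bisect.bisect_left(seen, x)      # binary search for x
        if pos == len(seen) or seen[pos] != x:
            seen.insert(pos, x)                # keep the list sorted
    return len(seen)


print(count_distinct_with_sorted_copy([5, 9, 13, 3, 2, 5, 6, 7, 1]))  # 8
```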

Algorithm to detect duplication of integers in an unsorted array of n integers. (implemented with 2 nested loops) [duplicate]

You are given an unsorted array of n integers, and you would like to find if there are any duplicates in the array (i.e. any integer appearing more than once).
Describe an algorithm (implemented with two nested loops) to do this.
My description of the algorithm:
In step 1, we write a while loop to check whether the array is empty/null; if the array is not null, we proceed with an inner loop.
In step 2, we write a for loop that runs n-1 iterations. In that loop we assign the first index in the array to current (a variable) in the first iteration, and we update current by index + 1 each time through. This means that the first time, current will hold the first index in the array, the second time it will hold the second index, and so on until the loop ends.
In step 3, we write a loop within the for loop from step 2 to compare the current number to all the integers in the array. If the integer equals the next number, we print the number using a printf statement; otherwise we update next to hold the next index in the array and compare that to current, and so on until current has been compared to all the integers in the array. Once this has been done, current is updated to store the next index of the array, and that number is in turn compared to all the integers in the array.
Will the algorithm be correct (according to the question)? Your suggestions would be appreciated. And no, it's not a homework question or such. Thank you for your time.
The complexity is definitely O(N^2): N * ((N + 1)/2) operations, or O(N^2) in simplified form.
Edit:
I have added a description of an algorithm that is more efficient (in the question below). But going back to the question above, would it be suitable as an answer for an exam question? (It has shown up in previous papers, so I would really appreciate your help.)
If we limit the input data in order to achieve some best case scenario, how can you limit the input data to achieve a better Big O complexity? Describe an algorithm for handling this limited data to find if there are any duplicates. What is the Big O complexity?
If we limit the data to, let's say, an array of size 5 (n = 5), we could reduce the complexity to O(N). If the array is sorted, then all we need is a single loop that compares each element to the next element in the array, and this will find whether duplicates exist. In other words, if the array given to us is by default (or luckily) already sorted from lowest to highest value, the complexity drops from O(N^2) to O(N): we no longer need the inner loop, so a single loop can compare each integer to its successor. If a duplicate is encountered we could, for instance, print it with a printf statement, then continue until the loop has run n-1 times (which would be 4), ending the program once that is done. The best case for this algorithm is O(N) because the running time grows linearly, in direct proportion to the size of the input: for a sorted array of 50 integers, the loop iterates n-1 = 49 times, where n is the length of the array. This simply means that for a sorted array, the time the operations take depends entirely on the input size.
P.S. Sure, there are other, faster algorithms, but as I understand it the question asks for a better Big O complexity than in the first question, and I believe this algorithm achieves that (correct me if I'm wrong). Thanks :)
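The single-loop check on a sorted array described in the question could be sketched as follows. This is written in Python rather than the C the question seems to have in mind, it collects the duplicates in a list instead of printing them, and the function name is illustrative.

```python
def duplicates_in_sorted(a):
    """Return each duplicated value once, assuming a is sorted ascending."""
    dups = []
    for i in range(len(a) - 1):               # n - 1 comparisons
        if a[i] == a[i + 1] and (not dups or dups[-1] != a[i]):
            dups.append(a[i])                 # report a run of equal values once
    return dups


print(duplicates_in_sorted([1, 2, 2, 3, 5, 5, 5]))  # [2, 5]
```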
You describe three loops, but the first is actually just a condition (if the array is null or empty, abort).
The rest of the algorithm sounds good, except that instead of "current will hold the first index in the array" (which nitpickers would insist is always 0 in C) I'd say "current will hold the value of the first element in the array" or similar.
As an aside (although I understand it's a practice assignment), it's terribly inefficient (I think n^2 is correct). I'd urge you to use just one loop over the array, copying the checked numbers into a sorted structure of some kind and doing binary searches in it. (As a teacher I'd have my students describe a balanced tree first so that they can use it here, like a virtual library ;-) ).

Returning duplicates within a given window size in an array?

I saw this question and I have no idea how to do the second part given the time and space constraints:
Given an array of values, design and code an algorithm that returns whether there are two duplicates within k indices of each other. What about two values within k indices and within plus or minus l (in value) of each other? Do all, even the latter, in O(n) running time and O(k) space.
It seems impossible to me to know whether there is a duplicate of a given value within a window size of the value without looking at all values with index difference between k and a[i] but since a[i] might be large, I think that would take O(n^2). Can it be done in O(n)?
This sounds suspiciously like homework. Yes, it can be done in O(n) time and O(k) space. Hint: You need two data structures, and one of them is a hash map.
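The answer only gives a hint, so the following is one possible reading of it, not necessarily what the answerer had in mind. The first function keeps a hash set of the last k values. The second assumes integer values and buckets them by width l + 1 in a hash map, so any two values sharing a bucket differ by at most l and only the two neighboring buckets need checking. Both use the input array itself to find the value leaving the window, in place of a separate queue. Function and parameter names are illustrative.

```python
def has_duplicate_within_k(values, k):
    """True if two equal values sit at most k indices apart. O(n) time, O(k) space."""
    window = set()                        # values at the previous (up to) k indices
    for i, v in enumerate(values):
        if v in window:
            return True
        window.add(v)
        if len(window) > k:
            window.discard(values[i - k])  # drop the value leaving the window
    return False


def has_near_value_within_k(values, k, l):
    """True if two integers at most k indices apart differ by at most l."""
    if k <= 0 or l < 0:
        return False
    width = l + 1                         # same bucket implies difference <= l
    buckets = {}                          # bucket id -> the one value in the window
    for i, v in enumerate(values):
        b = v // width
        if b in buckets:
            return True
        if b - 1 in buckets and v - buckets[b - 1] <= l:
            return True
        if b + 1 in buckets and buckets[b + 1] - v <= l:
            return True
        buckets[b] = v
        if i >= k:
            del buckets[values[i - k] // width]   # value leaving the window
    return False


print(has_duplicate_within_k([1, 2, 3, 1], 3))             # True
print(has_near_value_within_k([1, 5, 9, 1, 5, 9], 2, 3))   # False
```

Each bucket can hold at most one value from the current window, because a second value landing in the same bucket would already have returned True; that is what keeps the map at O(k) entries.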
