efficient algorithms with array of increasing integers - arrays

I've been self teaching myself data structures in python and don't know if I'm overthinking (or underthinking!) the following question:
My goal is come up with an efficient algorithm
With the algorithm, my goal is to determine whether an integer i exists such that A[i] = i in an array of increasing integers
I then want to find the the running time in big-O notation as a function of n the length of A?
so wouldn't this just be a slightly modified version of O(log n) with a function equivalent to: f(i) = A[i] - i. Am I reading this problem wrong? Any help would be greatly appreciated!

Note 1: because you say the integers are increasing, you have ruled out that there are duplicates in the array (otherwise you would say monotonically increasing). So a quick check that would rule out whether there is no solution is if the first element is larger than 1. In other words, for there to be any chance of a solution, first element has to be <= 1.
Note 2: similar to Note 1, if last element is < length of array, then there is no solution.
In general, I think the best you can do is binary search. You trap it between low and high indices, and then check the middle index between low and high. If array[middle] equals middle, return yes. If it is less than middle, then set left to middle+1. Otherwise, set right to middle - 1. If left becomes > right, return no.
Running time is O( log n ).
Edit: algorithm does NOT work if you allow monotonically increasing. Exercise: explain why. :-)

You're correct. Finding an element i in your A sized array is O(Log A) indeed.
However, you can do much better: O(Log A) -> O(1) if you trade memory complexity for time complexity, which is what "optimizers" tend to do.
What I mean is: If you insert new Array elements into an "efficient" hash table you can achieve the find function in constant time O(1)
This is depends a lot on the elements you're inserting:
Are they unique? Think of collisions
How often do you insert?

This is an interesting problem :-)
You can use bisection to locate the place where a[i] == i:
0 1 2 3 4 5 6
a = [-10 -5 2 5 12 20 100]
When i = 3, i < a[i], so bisect down
When i = 1 i > a[i], so bisect up
When i = 2 i == a[i], you found the match
The running time is O(log n).

Related

what would be the time complexity of this algorithm?

i was wondering what would be the time complexity of this piece of code?
last = 0
ans = 0
array = [1,2,3,3,3,4,5,6]
for number in array:
if number != last then: ans++;
last = number
return ans
im thinking O(n^2) as we look at all the array elements twice, once in executing the for loop and then another time when comparing the two subsequent values, but I am not sure if my guess is correct.
While processing each array element, you just make one comparison, based on which you update ans and last. The complexity of the algorithm stands at O(n), and not O(n^2).
The answer is actually O(1) for this case, and I will explain why after explaining why a similar algorithm would be O(n) and not O(n^2).
Take a look at the following example:
def do_something(array):
for number in array:
if number != last then: ans++;
last = number
return ans
We go through each item in the array once, and do two operations with it.
The rule for time complexity is you take the largest component and remove a factor.
if we actually wanted to calculate the exact number of operaitons, you might try something like:
for number in array:
if number != last then: ans++; # n times
last = number # n times
return ans # 1 time
# total number of instructions = 2 * n + 1
Now, Python is a high level language so some of these operations are actually multiple operations put together, so that instruction count is not accurate. Instead, when discussing complexity we just take the largest contributing term (2 * n) and remove the coefficient to get (n). big-O is used when discussing worst case, so we call this O(n).
I think your confused because the algorithm you provided looks at two numbers at a time. the distinction you need to understand is that your code only "looks at 2 numbers at a time, once for each item in the array". It does not look at every possible pair of numbers in the array. Even if your code looked at half of the number of possible pairs, this would still be O(n^2) because the 1/2 term would be excluded.
Consider this code that does, here is an example of an O(n^2) algorithm.
for n1 in array:
for n2 in array:
print(n1 + n2)
In this example, we are looking at each pair of numbers. How many pairs are there? There are n^2 pairs of numbers. Contrast this with your question, we look at each number individually, and compare with last. How many pairs of number and last are there? At worst, 2 * n, which we call O(n).
I hope this clears up why this would be O(n) and not O(n^2). However, as I said at the beginning of my answer this is actually O(1). This is because the length of the array is specifically 8, and not some arbitrary length n. Every time you execute this code it will take the same amount of time, it doesn't vary with anything and so there is no n. n in my example was the length of the array, but there is no such length term provided in your example.

Analysis of Algorithms - Find missing Integer in Sorted Array better than O(n)

I am working through analysis of algorithms class for the first time, and was wondering if anyone could assist with the below example. I believe I have solved it for an O(n) complexity, but was wondering if there is a better version that I am not thinking of O(logn)?
Let A= A[1] <= ... <= A[n+1] be a sorted array of n distinct integers, in which each integer is in the range [1...n+1]. That is, exactly one integer out of {1,...,n+1} is missing from A. Describe an efficeint algorithm to find the missing integer. Analyze the worst case complexity (number of accesses to array A) of your algorithm.
The solution I have is relatively simple, and I believe results in a worst case N complexity. Maybe I am over thinking the example, but is there a better solution?
My Solution
for(i = 1; i < n +1; i++) :
if(A[i-1] > i) :
return i
The logic behind this is since it is sorted, the first element must be 1, the second must be 2, and so on and so forth, until the element in the array is larger than the element it is supposed to be, indiciating an element was missed, return the element it should be and we have the missing one.
Is this correct logic? Is there a better way to go about it?
Thanks for reading and thanks in advance for the assistance.
This logic is certainly correct, but it is not fast enough to beat O(n) because you check every element.
You can do it faster by observing that if A[i]==i, then all elements at j < i are at their proper places. This observation should be sufficient to construct a divide-and-conquer approach that runs in O(log2n):
Check the element in the middle
If it's at the wrong spot, go left
Otherwise, go right
More formally, you are looking for a spot where A[i]==i and A[i+1]==i+2. You start with the interval at the ends of the array. Each probe at the middle of the interval shrinks the remaining interval twofold. At some point you are left with an interval of just two elements. The element on the left is the last "correct" element, while the element on the right is the first element after the missing number.
You can binary search for the first index i with A[i] > i. If the missing integer is k, then A[i] = i for i < k and A[i] = i + 1 for i >= k.
Sometimes the trick is to think of the problem in a different way.
In spirit, you are simply working with an array of boolean values; the entry at index n is the truth of a[n] > n.
Furthermore, this array begins with zero or more consecutive false, and the remaining entries are all true.
Your problem, now, is to find the index of the first instance of true in the (sorted) array of boolean values.

Big-Theta(n) linear sorting algorithm?

Design a linear algorithm to rearrange the elements of a given array of n elements so that all its negative numbers precede any zeroes, and any zeroes precede any positive numbers. It should also be space efficient so that it doesn't require more than a constant amount of additional space.
Everything I am thinking of is much bigger than O(n), and would love some tips/hints/help/java code!
Help? Hint: Quicksort's partition part with pivot as 0. See this Wikipedia article, look for in-place version.
I just realized if you implement teh exact version given in the link above it may not help if you have dupes of zero. My statement is still true that you need to use partition part of Quicksort, but the partition is going to be done by Dutch National Flag problem or three way partitioning. Here is the pseudo code for you
//assume index based 1
A[1..n]
p = 0
q = n+1
i = 1
while i < q
if A[i] < 0
swap(i, ++p)
else if A[i] > 0
swap(i, --q)
else
i++
Time complexity: O(n)
Space complexity: O(1)
Look into using a modified version of Radix Sort, the only sorts that can work in linear time are non-comparison based sorts (so entries in the list/array are not compared to each other) so that's something else to look at (proof involves comparison trees of minimum height as to why a sort that compares items will always be at least nlogn).
If you require only the rearrangement of items according to 3 ranges , negative zero and positive.
An easy solution will be count the number of negative, zeros and positives items with single array iteration (O(n)) (actually you don't need to count the number of positives if you already know the size of the array).
with a second iteration you will swap items (starting from the first one) according to their range to the appropriate index , then increase the index.
That's it, no additional memory and teta(n) time complexity.

Find the minimum number of elements required so that their sum equals or exceeds S

I know this can be done by sorting the array and taking the larger numbers until the required condition is met. That would take at least nlog(n) sorting time.
Is there any improvement over nlog(n).
We can assume all numbers are positive.
Here is an algorithm that is O(n + size(smallest subset) * log(n)). If the smallest subset is much smaller than the array, this will be O(n).
Read http://en.wikipedia.org/wiki/Heap_%28data_structure%29 if my description of the algorithm is unclear (it is light on details, but the details are all there).
Turn the array into a heap arranged such that the biggest element is available in time O(n).
Repeatedly extract the biggest element from the heap until their sum is large enough. This takes O(size(smallest subset) * log(n)).
This is almost certainly the answer they were hoping for, though not getting it shouldn't be a deal breaker.
Edit: Here is another variant that is often faster, but can be slower.
Walk through elements, until the sum of the first few exceeds S. Store current_sum.
Copy those elements into an array.
Heapify that array such that the minimum is easy to find, remember the minimum.
For each remaining element in the main array:
if min(in our heap) < element:
insert element into heap
increase current_sum by element
while S + min(in our heap) < current_sum:
current_sum -= min(in our heap)
remove min from heap
If we get to reject most of the array without manipulating our heap, this can be up to twice as fast as the previous solution. But it is also possible to be slower, such as when the last element in the array happens to be bigger than S.
Assuming the numbers are integers, you can improve upon the usual n lg(n) complexity of sorting because in this case we have the extra information that the values are between 0 and S (for our purposes, integers larger than S are the same as S).
Because the range of values is finite, you can use a non-comparative sorting algorithm such as Pigeonhole Sort or Radix Sort to go below n lg(n).
Note that these methods are dependent on some function of S, so if S gets large enough (and n stays small enough) you may be better off reverting to a comparative sort.
Here is an O(n) expected time solution to the problem. It's somewhat like Moron's idea but we don't throw out the work that our selection algorithm did in each step, and we start trying from an item potentially in the middle rather than using the repeated doubling approach.
Alternatively, It's really just quickselect with a little additional book keeping for the remaining sum.
First, it's clear that if you had the elements in sorted order, you could just pick the largest items first until you exceed the desired sum. Our solution is going to be like that, except we'll try as hard as we can to not to discover ordering information, because sorting is slow.
You want to be able to determine if a given value is the cut off. If we include that value and everything greater than it, we meet or exceed S, but when we remove it, then we are below S, then we are golden.
Here is the psuedo code, I didn't test it for edge cases, but this gets the idea across.
def Solve(arr, s):
# We could get rid of worse case O(n^2) behavior that basically never happens
# by selecting the median here deterministically, but in practice, the constant
# factor on the algorithm will be much worse.
p = random_element(arr)
left_arr, right_arr = partition(arr, p)
# assume p is in neither left_arr nor right_arr
right_sum = sum(right_arr)
if right_sum + p >= s:
if right_sum < s:
# solved it, p forms the cut off
return len(right_arr) + 1
# took too much, at least we eliminated left_arr and p
return Solve(right_arr, s)
else:
# didn't take enough yet, include all elements from and eliminate right_arr and p
return len(right_arr) + 1 + Solve(left_arr, s - right_sum - p)
One improvement (asymptotically) over Theta(nlogn) you can do is to get an O(n log K) time algorithm, where K is the required minimum number of elements.
Thus if K is constant, or say log n, this is better (asymptotically) than sorting. Of course if K is n^epsilon, then this is not better than Theta(n logn).
The way to do this is to use selection algorithms, which can tell you the ith largest element in O(n) time.
Now do a binary search for K, starting with i=1 (the largest) and doubling i etc at each turn.
You find the ith largest, and find the sum of the i largest elements and check if it is greater than S or not.
This way, you would run O(log K) runs of the selection algorithm (which is O(n)) for a total running time of O(n log K).
eliminate numbers < S, if you find some number ==S, then solved
pigeon-hole sort the numbers < S
Sum elements highest to lowest in the sorted order till you exceed S.

Find duplicate entry in array of integers

As a homework question, the following task had been given:
You are given an array with integers between 1 and 1,000,000. One
integer is in the array twice. How can you determine which one? Can
you think of a way to do it using little extra memory.
My solutions so far:
Solution 1
List item
Have a hash table
Iterate through array and store its elements in hash table
As soon as you find an element which is already in hash table, it is
the dup element
Pros
It runs in O(n) time and with only 1 pass
Cons
It uses O(n) extra memory
Solution 2
Sort the array using merge sort (O(nlogn) time)
Parse again and if you see a element twice you got the dup.
Pros
It doesn't use extra memory
Cons
Running time is greater than O(n)
Can you guys think of any better solution?
The question is a little ambiguous; when the request is "which one," does it mean return the value that is duplicated, or the position in the sequence of the duplicated one? If the former, any of the following three solutions will work; if it is the latter, the first is the only that will help.
Solution #1: assumes array is immutable
Build a bitmap; set the nth bit as you iterate through the array. If the bit is already set, you've found a duplicate. It runs on linear time, and will work for any size array.
The bitmap would be created with as many bits as there are possible values in the array. As you iterate through the array, you check the nth bit in the array. If it is set, you've found your duplicate. If it isn't, then set it. (Logic for doing this can be seen in the pseudo-code in this Wikipedia entry on Bit arrays or use the System.Collections.BitArray class.)
Solution #2: assumes array is mutable
Sort the array, and then do a linear search until the current value equals the previous value. Uses the least memory of all. Bonus points for altering the sort algorithm to detect the duplicate during a comparison operation and terminating early.
Solution #3: (assumes array length = 1,000,001)
Sum all of the integers in the array.
From that, subtract the sum of the integers 1 through 1,000,000 inclusive.
What's left will be your duplicated value.
This take almost no extra memory, can be done in one pass if you calculate the sums at the same time.
The disadvantage is that you need to do the entire loop to find the answer.
The advantages are simplicity, and a high probability it will in fact run faster than the other solutions.
Assuming all the numbers from 1 to 1,000,000 are in the array, the sum of all numbers from 1 to 1,000,000 is (1,000,000)*(1,000,000 + 1)/2 = 500,000 * 1,000,001 = 500,000,500,000.
So just add up all the numbers in the array, subtract 500,000,500,000, and you'll be left with the number that occured twice.
O(n) time, and O(1) memory.
If the assumption isn't true, you could try using a Bloom Filter - they can be stored much more compactly than a hash table (since they only store fact of presence), but they do run the risk of false positives. This risk can be bounded though, by our choice of how much memory to spend on the bloom filter.
We can then use the bloom filter to detect potential duplicates in O(n) time and check each candidate in O(n) time.
This python code is a modification of QuickSort:
def findDuplicate(arr):
orig_len = len(arr)
if orig_len <= 1:
return None
pivot = arr.pop(0)
greater = [i for i in arr if i > pivot]
lesser = [i for i in arr if i < pivot]
if len(greater) + len(lesser) != orig_len - 1:
return pivot
else:
return findDuplicate(lesser) or findDuplicate(greater)
It finds a duplicate in O(n logn)), I think. It uses extra memory on the stack, but it can be rewritten to use only one copy of the original data, I believe:
def findDuplicate(arr):
orig_len = len(arr)
if orig_len <= 1:
return None
pivot = arr.pop(0)
greater = [arr.pop(i) for i in reversed(range(len(arr))) if arr[i] > pivot]
lesser = [arr.pop(i) for i in reversed(range(len(arr))) if arr[i] < pivot]
if len(arr):
return pivot
else:
return findDuplicate(lesser) or findDuplicate(greater)
The list comprehensions that produce greater and lesser destroy the original with calls to pop(). If arr is not empty after removing greater and lesser from it, then there must be a duplicate and it must be pivot.
The code suffers from the usual stack overflow problems on sorted data, so either a random pivot or an iterative solution which queues the data is necessary:
def findDuplicate(full):
import copy
q = [full]
while len(q):
arr = copy.copy(q.pop(0))
orig_len = len(arr)
if orig_len > 1:
pivot = arr.pop(0)
greater = [arr.pop(i) for i in reversed(range(len(arr))) if arr[i] > pivot]
lesser = [arr.pop(i) for i in reversed(range(len(arr))) if arr[i] < pivot]
if len(arr):
return pivot
else:
q.append(greater)
q.append(lesser)
return None
However, now the code needs to take a deep copy of the data at the top of the loop, changing the memory requirements.
So much for computer science. The naive algorithm clobbers my code in python, probably because of python's sorting algorithm:
def findDuplicate(arr):
arr = sorted(arr)
prev = arr.pop(0)
for element in arr:
if element == prev:
return prev
else:
prev = element
return None
Rather than sorting the array and then checking, I would suggest writing an implementation of a comparison sort function that exits as soon as the dup is found, leading to no extra memory requirement (depending on the algorithm you choose, obviously) and a worst case O(nlogn) time (again, depending on the algorithm), rather than a best (and average, depending...) case O(nlogn) time.
E.g. An implementation of in-place merge sort.
http://en.wikipedia.org/wiki/Merge_sort
Hint: Use the property that A XOR A == 0, and 0 XOR A == A.
As a variant of your solution (2), you can use radix sort. No extra memory, and will run in
linear time. You can argue that time is also affected by the size of numbers representation, but you have already given bounds for that: radix sort runs in time O(k n), where k is the number of digits you can sort ar each pass. That makes the whole algorithm O(7n)for sorting plus O(n) for checking the duplicated number -- which is O(8n)=O(n).
Pros:
No extra memory
O(n)
Cons:
Need eight O(n) passes.
And how about the problem of finding ALL duplicates? Can this be done in less than
O(n ln n) time? (Sort & scan) (If you want to restore the original array, carry along the original index and reorder after the end, which can be done in O(n) time)
def singleton(array):
return reduce(lambda x,y:x^y, array)
Sort integer by sorting them on place they should be. If you get "collision" than you found the correct number.
space complexity O(1) (just same space that can be overwriten)
time complexity less than O(n) becuse you will statisticaly found collison before getting on the end.

Resources