Finding the Average case complexity of an Algorithm - arrays

I have an algorithm for Sequential search of an unsorted array:
SequentialSearch(A[0..n-1], K)
    i = 0
    while i < n and A[i] != K do
        i = i + 1
    if i < n then return i
    else return -1
Where we have an input array A[0...n-1] and a search key K
I know that the worst case is n, because we might have to search the entire array, i.e. examine all n items, hence O(n).
I know that the best case is 1, since the first item we examine could be the one we want (or the array could consist entirely of the key); either way it's O(1).
But I have no idea on how to calculate the average case. The answer my textbook gives is:
= (p/n)[1+2+...+i+...+n] + n(1-p)
is there a general formula I can follow for when I see an algorithm like this one, to calculate it?
(The textbook's example is shown as a picture in the original post.)

= (p/n)[1+2+...+i+...+n] + n(1-p)
p here is the probability that the search key is present in the array; since the n positions are equally likely, p/n is the probability of finding the key at any particular index. We are essentially taking a weighted average: finding the key at position i costs i comparisons, so the successful cases contribute 1, 2, ..., n comparisons, each with weight p/n. Because we have to take all inputs into account, the second term n(1-p) covers the inputs whose key is not in the array (probability 1-p); those cost n comparisons, since we search through the entire array.
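To see the formula in action, here is a quick simulation (not from the textbook; the parameters n, p and the number of trials are arbitrary choices of mine) that estimates the average number of comparisons and compares it with p(n+1)/2 + n(1-p), which is what the bracketed sum works out to:

import random

def comparisons(A, K):
    # number of elements sequential search examines before stopping
    for i, a in enumerate(A, 1):
        if a == K:
            return i
    return len(A)

def estimate_average(n=50, p=0.6, trials=100_000):
    total = 0
    for _ in range(trials):
        A = random.sample(range(10 * n), n)   # n distinct values
        if random.random() < p:
            K = A[random.randrange(n)]        # successful search, uniform position
        else:
            K = -1                            # key not present: unsuccessful search
        total += comparisons(A, K)
    return total / trials

print(estimate_average())                     # empirical average, roughly 35.3
print(0.6 * (50 + 1) / 2 + 50 * (1 - 0.6))    # closed form p(n+1)/2 + n(1-p) = 35.3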

You'd need to consider the input cases, something like equivalence classes of input, which depends on the context of the algorithm. If none of those things are known, then assuming that the input is an array of random integers, the average case would probably be O(n). This is because, roughly, you have no way of proving to a useful extent how often your query will be found in an array of N integer values in the range of ~-32k to ~32k.

More formally, let X be a discrete random variable denoting the number of elements of the array A that need to be scanned. There are n elements, and since all positions are equally likely for randomly generated inputs, X ~ Uniform(1, n), i.e. X takes values 1, ..., n, given that the search key is found in the array (which happens with probability p); otherwise all the elements need to be scanned, so X = n (with probability 1 − p).
Hence, P(X = x) = (p/n)·I{x < n} + (p/n + (1 − p))·I{x = n} for x = 1, ..., n, where I{x = n} is the indicator function, equal to 1 iff x = n and 0 otherwise.
The average time complexity of the algorithm is the expected time taken to execute it on an arbitrary input. By definition,
E[X] = Σ_{x=1..n} x · P(X = x) = (p/n)(1 + 2 + ... + n) + n(1 − p) = p(n + 1)/2 + n(1 − p).
(A figure in the original answer shows how the time taken for searching the array changes with n and p.)


what would be the time complexity of this algorithm?

I was wondering, what would be the time complexity of this piece of code?
last = 0
ans = 0
array = [1, 2, 3, 3, 3, 4, 5, 6]
for number in array:
    if number != last:
        ans += 1
    last = number
return ans
I'm thinking O(n^2), since we look at the array elements twice: once while executing the for loop and a second time when comparing two subsequent values. But I am not sure if my guess is correct.
While processing each array element, you just make one comparison, based on which you update ans and last. The complexity of the algorithm stands at O(n), and not O(n^2).
The answer is actually O(1) for this case, and I will explain why after explaining why a similar algorithm would be O(n) and not O(n^2).
Take a look at the following example:
def do_something(array):
    last = 0
    ans = 0
    for number in array:
        if number != last:
            ans += 1
        last = number
    return ans
We go through each item in the array once, and do two operations with it.
The rule of thumb for time complexity is that you take the largest term and drop its constant factor.
If we actually wanted to calculate the exact number of operations, we might try something like:
for number in array:
    if number != last:   # n times
        ans += 1
    last = number        # n times
return ans               # 1 time
# total number of instructions = 2 * n + 1
Now, Python is a high level language so some of these operations are actually multiple operations put together, so that instruction count is not accurate. Instead, when discussing complexity we just take the largest contributing term (2 * n) and remove the coefficient to get (n). big-O is used when discussing worst case, so we call this O(n).
I think you're confused because the algorithm you provided looks at two numbers at a time. The distinction you need to understand is that your code only looks at two numbers at a time, once for each item in the array; it does not look at every possible pair of numbers in the array. (Even if your code looked at half of the possible pairs, it would still be O(n^2), because the 1/2 factor is dropped.)
For contrast, here is an example of an O(n^2) algorithm that does look at every pair:
for n1 in array:
    for n2 in array:
        print(n1 + n2)
In this example, we are looking at each pair of numbers. How many pairs are there? There are n^2 pairs of numbers. Contrast this with your question, we look at each number individually, and compare with last. How many pairs of number and last are there? At worst, 2 * n, which we call O(n).
I hope this clears up why this would be O(n) and not O(n^2). However, as I said at the beginning of my answer, this is actually O(1), because the length of the array is specifically 8 rather than some arbitrary length n. Every time you execute this code it will take the same amount of time; it doesn't vary with anything, so there is no n. In my example n was the length of the array, but no such length term appears in your example.

Find the element occurring once in an array where all other elements occur twice (without using XOR)

I have tried solving this for so long but I can't seem to be able to.
The question is as follows:
Given an array of n numbers where all of the numbers occur twice except for one, which occurs only once, find the number that occurs only once.
Now, I have found many solutions online for this, but none of them satisfy the additional constraints of the question.
The solution should:
Run in linear time (aka O(n)).
Not use hash tables.
Assume that the computer supports only comparisons and arithmetic (addition, subtraction, multiplication, division).
The number of bits in each number in the array is about O(log(n)).
Therefore, trying something like this https://stackoverflow.com/a/4772568/7774315 using the XOR operator isn't possible, since we don't have the XOR operator. Since the number of bits in each number is about O(log(n)), trying to implement the XOR operator using normal arithmetic (bit by bit) will take about O(log(n)) actions, which will give us an overall solution of O(nlog(n)).
The closest I have come to solving it: if I had a way to get the sum of all distinct values in the array in linear time, I could subtract twice that sum from the overall sum to get the negative of the element that occurs only once, because if the numbers that appear twice are {a1, a2, ..., ak} and the number that appears once is x, then the overall sum is
sum=2(a1+...+ak)+x
As far as I know, sets are implemented using hash tables, so using them to find the sum of all unique values is no good.
Let's imagine we had a way to find the exact median in linear time and partition the array so that all greater elements are on one side and all smaller elements on the other. By checking the parity of the number of elements on each side, we can identify which side the target element is in. Now perform this routine recursively on the section we identified. Since the section is halved in size each time, the total number of elements traversed cannot exceed n + n/2 + n/4 + ... < 2n, which is O(n).
The key element in the question seems to be this one:
The number of bits in each number in the array is about O(log(n)).
The issue is that this clue is a little vague.
A first approach is to consider that the maximum value is O(n). Then a counting sort can be performed in O(n) operations and O(n) memory.
It consists of finding the maximum value MAX, allocating an integer array C of size MAX + 1, and performing a classical counting sort with it:
C[a[i]]++;
Looking for an odd count in the array C[] will then provide the solution.
A second approach, I guess more efficient, would be to set up an array of size n, each element of which is a sub-array of unknown size. Then a kind of almost-counting sort would consist of:
C[a[i]%n].append (a[i]);
To find the unique element, we then have to find a sub-array of odd size, and then to examine the elements in this sub-array.
The maximum size k of each sub-array will be about 2*(MAX/n). According to the clue, this value should be very low. Dealing with this sub-array has a complexity O(k), for example by performing a counting sort on the b[j]/n, all the elements being equal modulo n.
We can note that, in practice, this is equivalent to performing a kind of ad-hoc hashing.
Global complexity is O(n + MAX/n).
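As a rough illustration of this second approach, here is a sketch in Python (assuming non-negative integers; the function name and structure are mine, not the answer's):

def find_unique(a):
    n = len(a)
    buckets = [[] for _ in range(n)]
    for value in a:
        buckets[value % n].append(value)       # the C[a[i] % n].append(a[i]) step
    for bucket in buckets:
        if len(bucket) % 2:                    # the singleton lives in the odd-sized bucket
            # counting sort inside the bucket, keyed on value // n
            # (all values in a bucket share the same residue mod n, assumed non-negative)
            keys = [value // n for value in bucket]
            counts = [0] * (max(keys) + 1)
            for k in keys:
                counts[k] += 1
            for value, k in zip(bucket, keys):
                if counts[k] % 2:
                    return value
    return None

print(find_unique([7, 3, 9, 3, 7]))            # 9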
This should do the trick as long as you're dealing with integers of size O(log n). It is a Python implementation of the algorithm sketched in גלעד ברקן's answer (including OneLyner's comments), where the median is replaced by the mean or mid-value.
def mean(items):
    result = 0
    for i, item in enumerate(items, 1):
        result = (result * (i - 1) + item) / i
    return result

def midval(items):
    min_val = max_val = items[0]
    for item in items:
        if item < min_val:
            min_val = item
        elif item > max_val:
            max_val = item
    return (min_val + max_val) / 2  # midpoint of the observed range

def find_singleton(items, pivoting=mean):
    n = len(items)
    if n == 1:
        return items[0]
    else:
        # find pivot - O(n)
        pivot = pivoting(items)
        # partition the items - O(n)
        j = 0
        for i, item in enumerate(items):
            if item > pivot:
                items[j], items[i] = items[i], items[j]
                j += 1
        # recursion on the partition with odd number of elements
        if j % 2:
            return find_singleton(items[:j])
        else:
            return find_singleton(items[j:])
The following code is just for some sanity-checking on random inputs:
import random

def gen_input(n, randomize=True):
    """Generate an input made of unique pairs plus one singleton (size up to 2 * n - 1)."""
    items = sorted(set(random.randint(-n, n) for _ in range(n)))[:n]
    singleton = items[-1]
    items = items + items[:-1]
    if randomize:
        random.shuffle(items)
    return items, singleton

items, singleton = gen_input(100)
print(singleton, len(items), items.index(singleton), items)
print(find_singleton(items, mean))
print(find_singleton(items, midval))
For a symmetric distribution the median and the mean or mid-value coincide.
With the log(n) requirement on the number of bits for the entries, one
can show that any arbitrary sub-sampling cannot be skewed enough to provide more than log(n) recursions.
For example, considering the case of k = log(n) bits with k = 4 and only positive numbers, the worst case is: [0, 1, 1, 2, 2, 4, 4, 8, 8, 16, 16]. Here pivoting by the mean reduces the input by only two elements at a time, resulting in k + 1 recursive calls; but adding any other pair to the input will not increase the number of recursive calls, while it will increase the input size.
(EDITED to provide a better explanation.)
Here is an (unoptimized) implementation of the idea sketched by גלעד ברקן .
I'm using Median_of_medians to get a value close enough to the median to ensure the linear time in the worst case.
NB: this in fact uses only comparisons, and is O(n) whatever the size of the integers as long as comparisons and copies are counted as O(1).
def median_small(L):
    return sorted(L)[len(L) // 2]

def median_of_medians(L):
    if len(L) < 20:
        return median_small(L)
    return median_of_medians([median_small(L[i:i + 5]) for i in range(0, len(L), 5)])

def find_single(L):
    if len(L) == 1:
        return L[0]
    pivot = median_of_medians(L)
    smaller = [i for i in L if i < pivot]
    equal = [i for i in L if i == pivot]
    bigger = [i for i in L if i > pivot]
    if len(equal) % 2:
        # all copies of a value fall in the same group, so an odd count here
        # means the singleton is the pivot value itself
        return pivot
    if len(smaller) % 2:
        return find_single(smaller)
    return find_single(bigger)
This version needs O(n) additional space, but could be implemented with O(1).

Does the array “sum and/or sub” to `x`?

Goal
I would like to write an algorithm (in C) which returns TRUE or FALSE (1 or 0) depending on whether the array A given as input can “sum and/or sub” to x (see below for clarification). Note that all values of A are integers bounded between [1, x-1] that were randomly (uniformly) sampled.
Clarification and examples
By “sum and/or sub”, I mean placing "+" and "-" in front of each element of array and summing over. Let's call this function SumSub.
int SumSub (int* A,int x)
{
...
}
SumSub({2,7,5},10)
should return TRUE as 7-2+5=10. You will note that the first element of A can also be taken as negative so that the order of elements in A does not matter.
SumSub({2,7,5,2},10)
should return FALSE as there is no way to “sum and/or sub” the elements of array to reach the value of x. Please note, this means that all elements of A must be used.
Complexity
Let n be the length of A. The complexity of the problem is of order O(2^n) if one has to explore all possible combinations of pluses and minuses. However, some combinations are more likely than others and are therefore worth exploring first (hoping the output will be TRUE). Typically, the combination which requires subtracting all elements from the largest number is impossible (as all elements of A are lower than x). Also, if n > x, it makes no sense to try adding all the elements of A.
Question
How should I go about writing this function?
Unfortunately, the NP-complete subset-sum problem reduces to your problem, so in general an exponential worst case can't be avoided (unless P = NP).
The original problem's solution is indeed exponential, as you said. BUT with the given range [1, x-1] for the numbers in A[], you can make the solution polynomial. There is a very simple dynamic programming solution with:
Time Complexity: O(n^2 * x)
Memory Complexity: O(n^2 * x)
where n = number of elements in A[].
You need to use a dynamic programming approach for this.
Any achievable total lies in the range [-n*x, n*x]. Create a 2D array of size (n+1) x (2*n*x + 1). Let's call this dp[][].
dp[i][j] = whether it is possible to make the value j using the first i elements of A[] (indices 0..i-1)
so
dp[10][3] = 1 means taking first 10 elements of A[] we CAN create the value 3
dp[10][3] = 0 means taking first 10 elements of A[] we can NOT create the value 3
Here is a kind of pseudo code for this:
int SumSub (int* A, int x)
{
    bool dp[][];                  // set all values of this array to 0 (false)
    dp[0][0] = true;
    for (i = 1; i <= n; i++) {
        int val = A[i-1];
        for (j = -n*x; j <= n*x; j++) {
            // out-of-range indices are treated as false; a real implementation
            // would add an offset of n*x so that indices are non-negative
            dp[i][j] = dp[i-1][j + val] | dp[i-1][j - val];
        }
    }
    return dp[n][x];
}
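A minimal runnable sketch of this DP in Python (my own translation, not the answer's exact code; it uses an offset of sum(A) instead of n*x so that list indices stay non-negative):

def sum_sub(A, x):
    total = sum(A)                     # achievable totals lie in [-total, total]
    if abs(x) > total:
        return False
    offset = total                     # shift so list indices are non-negative
    reachable = [False] * (2 * total + 1)
    reachable[offset] = True           # the empty prefix sums to 0
    for val in A:
        nxt = [False] * (2 * total + 1)
        for j, ok in enumerate(reachable):
            if ok:
                if j + val <= 2 * total:
                    nxt[j + val] = True    # choose +val
                if j - val >= 0:
                    nxt[j - val] = True    # choose -val
        reachable = nxt
    return reachable[x + offset]

print(sum_sub([2, 7, 5], 10))     # True:  -2 + 7 + 5 = 10
print(sum_sub([2, 7, 5, 2], 10))  # False

With the values bounded by x - 1, sum(A) < n*x, so this stays within the stated O(n^2 * x) bound.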
Unfortunately this is NP-complete even when x is restricted to the value 0, so don't expect a polynomial-time algorithm. To show this I'll give a simple reduction from the NP-hard Partition Problem, which asks whether a given multiset of positive integers can be partitioned into two parts having equal sums:
Suppose we have an instance of the Partition Problem consisting of n positive integers B_1, ..., B_n. Create from this an instance of your problem in which A_i = B_i for each 1 <= i <= n, and set x = 0.
Clearly if there is a partition of B into two parts C and D having equal sums, then there is also a solution to the instance of your problem: Put a + in front of every number in C, and a - in front of every number in D (or the other way round). Since C and D have equal sums, this expression must equal 0.
OTOH, if the solution to the instance of your problem that we just created is YES (TRUE), then we can easily create a partition of B into two parts having equal sums: just put all the positive terms in one part (say, C), and all the negative terms (without the preceding - of course) in the other (say, D). Since we know that the total value of the expression is 0, it must be that the sum of the (positive) numbers in C is equal to the (negated) sum of the numbers in D.
Thus a YES to either problem instance implies a YES to the other problem instance, which in turn implies that a NO to either problem instance implies a NO to the other problem instance -- that is, the two problem instances have equal solutions. Thus if it were possible to solve your problem in polynomial time, it would be possible to solve the NP-hard Partition Problem in polynomial time too, by constructing the above instance of your problem, solving it with your poly-time algorithm, and reporting the result it gives.
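As a concrete (hypothetical) illustration, reusing the sum_sub sketch from the previous answer: the multiset {3, 1, 2, 2} partitions into {3, 1} and {2, 2} with equal sums, and correspondingly the signed expression +3 + 1 - 2 - 2 equals 0:

print(sum_sub([3, 1, 2, 2], 0))   # True: the partition {3, 1} / {2, 2} exists
print(sum_sub([1, 1, 3], 0))      # False: {1, 1, 3} has no equal-sum partition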

Find the minimum number of elements required so that their sum equals or exceeds S

I know this can be done by sorting the array and taking the larger numbers until the required condition is met. That would take at least O(n log n) time for the sort.
Is there any improvement over O(n log n)?
We can assume all numbers are positive.
Here is an algorithm that is O(n + size(smallest subset) * log(n)). If the smallest subset is much smaller than the array, this will be O(n).
Read http://en.wikipedia.org/wiki/Heap_%28data_structure%29 if my description of the algorithm is unclear (it is light on details, but the details are all there).
Turn the array into a heap arranged such that the biggest element is available in time O(n).
Repeatedly extract the biggest element from the heap until their sum is large enough. This takes O(size(smallest subset) * log(n)).
This is almost certainly the answer they were hoping for, though not getting it shouldn't be a deal breaker.
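A short Python sketch of this heap approach (the function name is mine; it returns -1 if even the whole array falls short of S):

import heapq

def min_elements_to_reach(arr, S):
    heap = [-v for v in arr]            # negate values so heapq's min-heap acts as a max-heap
    heapq.heapify(heap)                 # O(n)
    total, count = 0, 0
    while heap and total < S:
        total += -heapq.heappop(heap)   # take the current largest element, O(log n)
        count += 1
    return count if total >= S else -1

print(min_elements_to_reach([1, 3, 2, 5, 4], 9))   # 2  (5 + 4 >= 9)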
Edit: Here is another variant that is often faster, but can be slower.
Walk through elements, until the sum of the first few exceeds S. Store current_sum.
Copy those elements into an array.
Heapify that array such that the minimum is easy to find, remember the minimum.
For each remaining element in the main array:
    if min(in our heap) < element:
        insert element into heap
        increase current_sum by element
        while S + min(in our heap) < current_sum:
            current_sum -= min(in our heap)
            remove min from heap
If we get to reject most of the array without manipulating our heap, this can be up to twice as fast as the previous solution. But it is also possible to be slower, such as when the last element in the array happens to be bigger than S.
Assuming the numbers are integers, you can improve upon the usual n lg(n) complexity of sorting because in this case we have the extra information that the values are between 0 and S (for our purposes, integers larger than S are the same as S).
Because the range of values is finite, you can use a non-comparative sorting algorithm such as Pigeonhole Sort or Radix Sort to go below n lg(n).
Note that these methods are dependent on some function of S, so if S gets large enough (and n stays small enough) you may be better off reverting to a comparative sort.
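A sketch of that idea for positive integers (my own illustration; values at or above S are clamped to S, since any one of them answers the question by itself, and the counting pass plays the role of the pigeonhole sort):

def min_elements_counting(arr, S):
    counts = [0] * (S + 1)
    for v in arr:
        counts[min(v, S)] += 1          # pigeonhole by value, clamped at S
    total, taken = 0, 0
    for v in range(S, 0, -1):           # walk the values from largest to smallest
        for _ in range(counts[v]):
            total += v
            taken += 1
            if total >= S:
                return taken
    return -1                            # the whole array sums to less than S

print(min_elements_counting([1, 3, 2, 5, 4], 9))   # 2

This runs in O(n + S) time and space, which is only an improvement while S stays comparable to n.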
Here is an O(n) expected time solution to the problem. It's somewhat like Moron's idea but we don't throw out the work that our selection algorithm did in each step, and we start trying from an item potentially in the middle rather than using the repeated doubling approach.
Alternatively, it's really just quickselect with a little additional bookkeeping for the remaining sum.
First, it's clear that if you had the elements in sorted order, you could just pick the largest items first until you exceed the desired sum. Our solution is going to be like that, except we'll try as hard as we can not to discover ordering information, because sorting is slow.
You want to be able to determine whether a given value is the cutoff: if we include that value and everything greater than it we meet or exceed S, but if we remove it we fall below S. If so, we are golden.
Here is the pseudo code; I didn't test it for edge cases, but it gets the idea across.
def Solve(arr, s):
    # We could get rid of the worst-case O(n^2) behavior (which basically never happens)
    # by selecting the median here deterministically, but in practice the constant
    # factor of the algorithm would be much worse.
    p = random_element(arr)
    left_arr, right_arr = partition(arr, p)
    # assume p is in neither left_arr nor right_arr
    right_sum = sum(right_arr)
    if right_sum + p >= s:
        if right_sum < s:
            # solved it, p forms the cut off
            return len(right_arr) + 1
        # took too much; at least we eliminated left_arr and p
        return Solve(right_arr, s)
    else:
        # didn't take enough yet; include all of right_arr and p, then recurse on left_arr
        return len(right_arr) + 1 + Solve(left_arr, s - right_sum - p)
One improvement (asymptotically) over Theta(nlogn) you can do is to get an O(n log K) time algorithm, where K is the required minimum number of elements.
Thus if K is constant, or say log n, this is better (asymptotically) than sorting. Of course if K is n^epsilon, then this is not better than Theta(n logn).
The way to do this is to use selection algorithms, which can tell you the ith largest element in O(n) time.
Now do a doubling (galloping) search for K: start with i = 1 (the largest) and double i at each turn.
You find the ith largest, and find the sum of the i largest elements and check if it is greater than S or not.
This way, you would run O(log K) runs of the selection algorithm (which is O(n)) for a total running time of O(n log K).
Scan once: any number that equals or exceeds S solves the problem by itself.
Pigeonhole-sort the remaining numbers (all smaller than S).
Sum elements from highest to lowest in the sorted order until you reach or exceed S.

Compare two integer arrays with same length

[Description] Given two integer arrays with the same length. Design an algorithm which can judge whether they're the same. The definition of "same" is that, if these two arrays were in sorted order, the elements in corresponding position should be the same.
[Example]
<1 2 3 4> = <3 1 2 4>
<1 2 3 4> != <3 4 1 1>
[Limitation] The algorithm should require constant extra space, and O(n) running time.
(Probably too complex for an interview question.)
(You can use O(N) time to check the min, max, sum, sumsq, etc. are equal first.)
Use no-extra-space radix sort to sort the two arrays in-place. O(N) time complexity, O(1) space.
Then compare them using the usual algorithm. O(N) time complexity, O(1) space.
(Provided (max − min) of the arrays is of O(Nk) with a finite k.)
You can try a probabilistic approach: convert each array into a number in some huge base B modulo some prime P, for example sum B^a_i over all i, mod some big-ish prime P. If both arrays come out to the same number, try again with as many primes as you want. If the fingerprints differ on any attempt, the arrays are not equal. If they pass enough challenges, they are equal with high probability.
There's a trivial proof for B > N and P > the biggest number, so if the arrays differ there must be a challenge that cannot be met. This is actually the deterministic approach, though the complexity analysis might be more difficult, depending on whether complexity is measured in terms of the size of the input (as opposed to just the number of elements).
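A small Python sketch of that fingerprinting idea (the primes, the choice of B, and the shift by the minimum value to avoid negative exponents are all my own illustrative choices):

def probably_same_multiset(a, b):
    if len(a) != len(b):
        return False
    primes = [1_000_000_007, 1_000_000_009, 998_244_353, 2_147_483_647, 2_305_843_009_213_693_951]
    B = len(a) + 1                                   # base larger than N
    lo = min(min(a), min(b))                         # shift so exponents are non-negative
    for P in primes:
        fa = sum(pow(B, x - lo, P) for x in a) % P   # fingerprint of a: sum of B^a_i mod P
        fb = sum(pow(B, x - lo, P) for x in b) % P
        if fa != fb:
            return False                             # a definite mismatch
    return True                                      # equal with high probability

print(probably_same_multiset([1, 2, 3, 4], [3, 1, 2, 4]))   # True
print(probably_same_multiset([1, 2, 3, 4], [3, 4, 1, 1]))   # False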
I claim that unless the range of the input is specified, it is IMPOSSIBLE to solve in constant extra space and O(n) running time.
I will be happy to be proven wrong, so that I can learn something new.
Insert all elements from the first array into a hashtable
Try to insert all elements from the second array into the same hashtable; each element should already be there.
OK, this is not constant extra space, but it's the best I could come up with at the moment :-). Are there any other constraints imposed on the question, for example on the biggest integer that may be included in the array?
A few answers are basically correct, even though they don't look like it. The hash table approach (for one example) has an upper limit based on the range of the type involved rather than the number of elements in the arrays. At least by most definitions, that makes the (upper limit on) space a constant, although the constant may be quite large.
In theory, you could change that from an upper limit to a true constant amount of space. Just for example, if you were working in C or C++, and it was an array of char, you could use something like:
size_t counts[UCHAR_MAX + 1];  /* one slot for each possible unsigned char value */
Since UCHAR_MAX is a constant, the amount of space used by the array is also a constant.
Edit: I'd note for the record that a bound on the ranges/sizes of items involved is implicit in nearly all descriptions of algorithmic complexity. Just for example, we all "know" that Quicksort is an O(N log N) algorithm. That's only true, however, if we assume that comparing and swapping the items being sorted takes constant time, which can only be true if we bound the range. If the range of items involved is large enough that we can no longer treat a comparison or a swap as taking constant time, then its complexity would become something like O(N log N log R), where R is the range, so log R approximates the number of bits necessary to represent an item.
Is this a trick question? If the authors assumed integers to be within a given range (2^32 etc.) then "extra constant space" might simply be an array of size 2^32 in which you count the occurrences in both lists.
If the integers are unranged, it cannot be done.
You could add each element into a hashmap<Integer, Integer>, with the following rules: Array A is the adder, array B is the remover. When inserting from Array A, if the key does not exist, insert it with a value of 1. If the key exists, increment the value (keep a count). When removing, if the key exists and is greater than 1, reduce it by 1. If the key exists and is 1, remove the element.
Run through array A followed by array B using the rules above. If at any time during the removal phase array B does not find an element, you can immediately return false. If after both the adder and remover are finished the hashmap is empty, the arrays are equivalent.
Edit: The size of the hashtable will be equal to the number of distinct values in the array; does this fit the definition of constant space?
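A quick Python sketch of this adder/remover idea (O(n) expected time but, as noted, not constant space; the function name is mine):

from collections import defaultdict

def same_multiset(a, b):
    if len(a) != len(b):
        return False
    counts = defaultdict(int)
    for v in a:                 # array A is the adder
        counts[v] += 1
    for v in b:                 # array B is the remover
        if counts[v] == 0:
            return False        # an element of B was not found
        counts[v] -= 1
        if counts[v] == 0:
            del counts[v]
    return not counts           # an empty map means the arrays are equivalent

print(same_multiset([1, 2, 3, 4], [3, 1, 2, 4]))   # True
print(same_multiset([1, 2, 3, 4], [3, 4, 1, 1]))   # False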
I imagine the solution will require some sort of transformation that is both associative and commutative and guarantees a unique result for a unique set of inputs. However I'm not sure if that even exists.
public static boolean match(int[] array1, int[] array2) {
    int x, y = 0;
    for (x = 0; x < array1.length; x++) {
        y = x;
        while (array1[x] != array2[y]) {
            if (y + 1 == array1.length)
                return false;
            y++;
        }
        int swap = array2[x];
        array2[x] = array2[y];
        array2[y] = swap;
    }
    return true;
}
For each array, use the counting sort technique to build, for each element, the count of elements less than or equal to it. Then compare the two auxiliary arrays at every index: if they are equal, the arrays are equal; otherwise they are not. Counting sort requires O(n), and the comparison at every index is again O(n), so in total it is O(n); the space required is proportional to the size of the two arrays. Here is a link to counting sort: http://en.wikipedia.org/wiki/Counting_sort.
Given that the integers are in the range -n..+n, a simple way to check for equality may be the following (pseudo code):
// a & b are the arrays
accumulator = 0
arraysize = size(a)
for (i = 0; i < arraysize; ++i) {
    accumulator = accumulator + a[i] - b[i]
    if abs(accumulator) > ((arraysize - i) * n) { return FALSE }
}
return (accumulator == 0)
accumulator must be able to store integers in the range ±(arraysize * n)
How 'bout this - XOR all the numbers in both the arrays. If the result is 0, you got a match.
