I sometimes get confused by time complexity analysis for code that involves arrays.
For example:
ans = [0] * n
for x in range(1, n):
    ans[x] = ans[x - 1] + 1
I thought the for-loop had a time complexity of O(n^2) because it accesses elements of an array with n elements, and it repeats this n times.
However, I've seen some explanations saying it takes just O(n). So my question is: when we analyze the time complexity of a program that accesses elements in an array (not necessarily the first or the last element), should we include the time to access those elements, or is it usually ignored?
Indexed access is usually a constant-time operation, thanks to random-access memory in most practical cases. If you were to run this in Python, for example, and measure the time it takes for different values of n, you would find that this is the case.
Therefore, your code only performs one loop from 1 to n and all other operations are constant-time, so you get a time complexity of O(n).
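For instance, a quick measurement along those lines might look like this (a rough sketch; absolute timings will vary by machine):

import time

def run(n):
    ans = [0] * n
    start = time.perf_counter()
    for x in range(1, n):
        ans[x] = ans[x - 1] + 1
    return time.perf_counter() - start

# Doubling n roughly doubles the running time, which is what O(n) predicts.
for n in (10**5, 2 * 10**5, 4 * 10**5):
    print(n, run(n))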
Your thinking is otherwise right: if this were a linked list and you had to iterate through it to find your value, then it would be O(n^2).
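For contrast, here is a toy singly linked list (my own illustration, not part of this answer) where reading index i requires walking i nodes, so the same update loop becomes O(n^2):

class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node

def get(head, i):
    # Walk i nodes from the head: O(i) per access.
    node = head
    for _ in range(i):
        node = node.next
    return node

# Build a list of n zero-valued nodes, then run the same update loop.
n = 1000
head = None
for _ in range(n):
    head = Node(0, head)

for x in range(1, n):
    get(head, x).value = get(head, x - 1).value + 1   # each access costs O(x)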
I am learning about Big-O and got confused by the example below:
arr = [1, 2, 3, 4, 5]

# simple print
print(arr[0])
print(arr[1])
print(arr[2])
print(arr[3])
print(arr[4])

# loop - Big-O O(n)
for i in range(len(arr)):
    print(arr[i])
Would the simple print also give me O(n), or O(1)?
Big-O is all about loops and growth.
If you have a fixed n, then printing them is O(1): it will always take the same amount of time to print n=5 items.
But for some variable n, it takes more time the larger n gets, so it becomes O(n).
If you are talking about storage, then an array of n items is O(n), even for fixed n=5.
Context matters. This context was me being stupid. O(5) is O(1).
Your print statements are constant (O(1)), because you are accessing values by index, and indexed access of an array is typically a constant-time operation.
Your loop is O(n), as you guessed. So your code in total is also O(n), since each iteration of the loop performs a constant-time action.
As a disclaimer, this can get more complex depending on your computation model and assumptions. For instance, not all computation models assume constant-time random access, and you could argue that printing a number also depends on the size of that number and turns out to be O(log n). But within the scope of your question, you don't need to worry about this.
I want to sort an array, A, under this cost model:
For any value x, an assignment of the form A[i] = x has a cost of 1. Furthermore, A[i] = A[j] has a cost of 1.
Other operations, such as comparisons and assignments of the form x = A[i] (where x is not a location in the array), have a cost of 0.
Questions:
1. Give a lower bound on the worst-case time required to sort the array A. Your answer should be an exact expression in terms of n and not use asymptotic notation.
2. Describe a sorting algorithm that uses O(n) space. The runtime should exactly match the lower bound given in 1 (exactly, not asymptotically).
3. Describe an in-place sorting algorithm that is optimal under this cost model. The runtime should exactly match the bound given in 1 (exactly, not asymptotically).
My attempts:
For 1: n. This is because, in the worst case, all n elements of the array are at positions they are not supposed to be in, so it will take n assignments to get the array into sorted order.
For 2, my algorithm in pseudocode:
def weird_sort(A):
    B = [None] * len(A)        # an array the same size as A
    C = [True] * len(A)        # True means "not yet taken"
    for i in range(len(A)):
        min_idx = C.index(True)          # first index in C that is True
        for j in range(len(A)):
            if C[j] and A[j] < A[min_idx]:
                min_idx = j
        B[i] = A[min_idx]
        C[min_idx] = False
    A[:] = B                   # copy the contents of B back into A
I believe this costs exactly n to run, since the only place we assign anything into A is the last line, where we copy the contents of B into A.
For 3, no idea where to start. It appears to me that in order to keep everything in place we have to swap things within A, but I can't figure out how to sort an array with n/2 swaps. Can someone get me moving in the right direction? Can you also scrutinize my answers to 1 and 2?
I consider in-place to allow O(1) additional variables, since otherwise I don't think it's possible.
First, let's solve a subproblem: given i, find the number that should end up at position i. This can be done by brute force, since comparisons are free.
Now copy the 1st element to an additional variable, find the smallest element, and put it at position 1. Say that smallest element was at position i. Next find the element that belongs at position i and copy it there (suppose it was at position j), then find the element that belongs at position j, and so on. Eventually we reach the element we copied out at the start; put it back. In this way we place k elements into their final positions using exactly k assignments (they form a cycle).
Now do the same for all the other elements. You can't afford to remember, for each element, whether it has already been placed, but you can check whether an element is already in its correct place for free.
If there are equal elements in A this has to be done a bit more carefully, but it still works.
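To make that concrete, here is a rough cycle-sort sketch in Python (my own illustration of the idea above, not code from this answer). Under the stated cost model only the writes into A cost anything, and each misplaced element is written exactly once, so the total cost is at most n:

import random

def cycle_sort(A):
    # In-place; writes to A only when an element is out of place.
    n = len(A)
    writes = 0
    for start in range(n - 1):
        item = A[start]                                   # free read
        # Where does `item` belong? Count smaller elements (comparisons are free).
        pos = start + sum(1 for j in range(start + 1, n) if A[j] < item)
        if pos == start:
            continue                                      # already in place, no write
        while item == A[pos]:                             # skip over duplicates
            pos += 1
        A[pos], item = item, A[pos]                       # one paid write into A
        writes += 1
        # Rotate the rest of this cycle.
        while pos != start:
            pos = start + sum(1 for j in range(start + 1, n) if A[j] < item)
            while item == A[pos]:
                pos += 1
            A[pos], item = item, A[pos]                   # one paid write per element placed
            writes += 1
    return writes

A = [3, 1, 4, 1, 5, 9, 2, 6]
print(cycle_sort(A), A)   # number of paid writes, then the sorted array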
Although when talking about efficient sorting algorithms we usually think of quicksort, that kind of algorithm is optimized for the number of comparisons.
Other algorithms, however, try to optimize the number of memory accesses instead (as in your case). Some of them are called cache-oblivious (they make no assumptions about the specific memory-hierarchy parameters) and others cache-aware (they are tuned for a specific memory hierarchy). There are multiple algorithms of this kind, so you might be interested in taking a look at them.
As an example, Harald Prokop's PhD thesis discusses cache-oblivious algorithms and proposes distribution sort, which partially sorts the data into subgroups that potentially fit in the lower levels of the memory hierarchy.
Distribution sort uses O(n lg n) work and incurs O(1 + (n/L)(1 + log_Z n)) cache misses to sort n elements,
where L is the length of a cache line and Z is the size of the cache itself. The performance model assumes a single cache level, although the algorithm adapts to all cache levels thanks to the oblivious property.
The fundamental concept is that the assignment cost changes depending on where an element is placed in the memory hierarchy.
I was reading some practice interview questions and I have a question about this one: assume a list of random integers, each between 1 and 100; compute the sum of the k largest integers. Discuss space and time complexity, and whether the approach changes if each integer is between 1 and m, where m varies.
My first thought was to sort the array and compute the sum of the largest k numbers. Then I thought about using a binary tree structure where I could start looking from the bottom right of the tree. I am not sure whether my approach would change if the numbers are 1 to 100 or 1 to m. Any thoughts on the most efficient approach?
The most efficient way might be to use something like randomized quickselect. It doesn't do the sorting step to completion and instead does just the partition step from quicksort. If you don't need the k largest integers in any particular order, this is the way I'd go. It takes linear time, but the analysis is not very straightforward. m would have little impact on this. Also, you can write the code in such a way that the sum is computed as you partition the array.
Time: O(n)
Space: O(1)
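A rough sketch of that approach (my own illustration, not the answerer's code): partition with a random pivot until the last k positions hold the k largest values, then sum that slice.

import random

def sum_k_largest(nums, k):
    # Assumes 1 <= k <= len(nums). Average O(n) time.
    a = list(nums)                    # work on a copy
    lo, hi = 0, len(a) - 1
    target = len(a) - k               # a[target:] should end up as the k largest
    while lo < hi:
        pivot = a[random.randint(lo, hi)]
        i, j = lo, hi
        while i <= j:                 # Hoare-style partition around the pivot value
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        if target <= j:
            hi = j                    # target index lies in the left part
        elif target >= i:
            lo = i                    # target index lies in the right part
        else:
            break                     # target sits between j and i: done
    return sum(a[target:])

print(sum_k_largest([5, 1, 9, 3, 7, 7, 2], k=3))   # 9 + 7 + 7 = 23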
The alternative is sorting using something like counting sort, which has a linear-time guarantee. Since you say the values are integers in a fixed range, it would work quite well. As m increases the space requirement goes up, but computing the sum within the buckets is quite efficient.
Time: O(n + m) in the worst case
Space: O(m)
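A counting-based sketch along these lines (my own illustration; the function and parameter names are just placeholders):

def sum_k_largest_counting(nums, k, m):
    # Count occurrences, then walk the counts from m down until k values
    # have been consumed. O(n + m) time, O(m) space.
    counts = [0] * (m + 1)
    for x in nums:
        counts[x] += 1
    total = 0
    remaining = k
    for value in range(m, 0, -1):
        if remaining == 0:
            break
        take = min(counts[value], remaining)
        total += take * value
        remaining -= take
    return total

print(sum_k_largest_counting([5, 1, 9, 3, 7, 7, 2], k=3, m=100))   # 23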
I'd say sorting is probably unnecessary. If k is small, then all you need to do is maintain a sorted list that truncates elements beyond the k-th largest element.
Each step should be O(k) in the worst case, where the element added is the new maximum. The average case is much better, though: after a certain number of elements, most new values will simply be smaller than the last element in the list, and the operation will be O(log k).
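A minimal sketch of that approach, assuming Python's bisect module for finding the insertion point (my own illustration, not the answerer's code):

import bisect

def sum_k_largest_sorted_list(nums, k):
    # Keep a sorted list (ascending) of at most the k largest seen so far.
    # bisect finds the insertion point in O(log k), but the insert itself
    # shifts elements, so a single step is O(k) in the worst case.
    largest = []
    for x in nums:
        if len(largest) < k:
            bisect.insort(largest, x)
        elif x > largest[0]:         # bigger than the current k-th largest
            bisect.insort(largest, x)
            del largest[0]           # drop the smallest to keep size k
    return sum(largest)

print(sum_k_largest_sorted_list([5, 1, 9, 3, 7, 7, 2], k=3))   # 23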
One way is to use a min-heap (implemented as a binary tree) of maximum size k. Checking whether a new element belongs in the heap is only O(1), since retrieving the minimum element of a min-heap is a constant-time operation. Each insertion step (or non-insertion, for an element too small to be inserted) along the O(n) list is O(log k). The final tree traversal and summation step is O(k).
Total complexity:
O(n log k + k) = O(n log k)
Unless you have multiple cores on your computer (in which case parallel computing is an option), summation should only be done at the end. On-the-fly computation adds extra steps without reducing your time complexity at all (you will actually have more computations to do). You will always have to sum k elements anyway, so why not avoid the additional addition and subtraction steps?
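Here is a minimal heapq-based sketch of this approach (my own illustration, not the answerer's code):

import heapq

def sum_k_largest_heap(nums, k):
    # The heap root is the smallest of the k largest seen so far, so deciding
    # whether a new element belongs in the heap is O(1); each replacement is
    # O(log k). Total: O(n log k + k).
    heap = []
    for x in nums:
        if len(heap) < k:
            heapq.heappush(heap, x)          # heap not full yet
        elif x > heap[0]:                    # O(1) peek at the minimum
            heapq.heapreplace(heap, x)       # pop min and push x, O(log k)
    return sum(heap)                         # final O(k) summation

print(sum_k_largest_heap([5, 1, 9, 3, 7, 7, 2], k=3))   # 23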
Suppose an array is initially empty, has a size of 5, and expands by 5 every time all slots are filled.
I understand that if we consider only a sequence of n append() operations, the amortized cost per operation would be O(n), because the total cost is
5 + (5 + 1*5) + (5 + 2*5) + ... + (5 + (floor(n/5) - 1)*5) = O(n^2),
where floor(n/5) is the number of array expansions.
However, what if the sequence of n operations contains pop() as well? Assume pop() doesn't change the array size.
My approach obviously doesn't work here, and I have read CLRS but am still quite stuck. Any help would be appreciated.
The answer, somewhat disappointingly, is that if your sequence contains s push or pop operations, then the amortized cost of each operation is O(s).
To cherry-pick a line from another question's very good answer:
Amortized analysis gives the average performance (over time) of each operation in the worst case.
The worst case is pretty clear: repeated pushes, in which case your original analysis stands.
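If it helps to see the numbers, here is a small simulation of the cost model described in the question (my own sketch; it charges one write per append and one copy per existing element on each expansion):

def total_append_cost(n, growth=5):
    # Count element writes for n appends when capacity grows by `growth`.
    capacity, size, cost = 0, 0, 0
    for _ in range(n):
        if size == capacity:
            cost += size          # copy existing elements into the new array
            capacity += growth
        cost += 1                 # write the appended element
        size += 1
    return cost

# The total cost grows quadratically, so the per-operation amortized cost is O(n):
for n in (100, 200, 400):
    print(n, total_append_cost(n))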
I am having trouble on an assignment regarding running time.
The problem statement is:
"Isabel has an interesting way of summing up the values in an array A of n integers, where n is a power of two. She creates an array B of half the size of A, and sets B[i] = A[2i] + A[2i+1], for i=0,1,…,(n/2)-1. If B has size 1, then she outputs B[0]. Otherwise, she replaces A with B, and repeats the process. What is the running time of her algorithm?"
Would this be considered O(log n) or O(n)? I am thinking O(log n) because you keep dividing the array in half until you get the final result, and I believe the basis of O(log n) is that you do not traverse the entire data structure. However, in order to compute the sum you have to access each element of the array, which makes me think it could be O(n). Any help in understanding this would be greatly appreciated.
I believe the basis of O(log n) is that you do not traverse the entire data structure.
There's no basis for beliefs or guesses. Run through the algorithm mentally.
How many recursions are there going to be for array A of size n?
How many summations are there going to be for each recursion (when array A is of size n)?
First run: n/2 summations, n accesses to elements of A
...
Last run: 1 summation, 2 accesses to elements of A
How many runs are there total? When you sum this up, what is the highest power of n?
As you figured out yourself, you do need to access all elements to compute the sum. So your proposition:
I believe the basis of O(log n) is that you do not traverse the entire data structure
does not hold. You can safely disregard the possibility of the algorithm being O(log n) then.
As for whether it is O(n) or something else, you need to think about how many operations are done in total. George Skoptsov's answer gives a good hint at that. I'd just like to call attention to the fact (from my own experience) that to determine "the running time" you need to take everything into account: memory accesses, operations, input and output, etc. In your simple case, looking only at the accesses (or the number of sums) might be enough, but in practice you can get very skewed results if you don't look at the problem from every angle.
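For reference, a direct sketch of Isabel's procedure (my own illustration) that counts the additions makes the total explicit: n/2 + n/4 + ... + 1 = n - 1 summations, which is O(n).

def isabel_sum(A):
    # Pairwise summation; len(A) must be a power of two.
    additions = 0
    while len(A) > 1:
        B = []
        for i in range(len(A) // 2):
            B.append(A[2 * i] + A[2 * i + 1])
            additions += 1
        A = B
    return A[0], additions

print(isabel_sum([1, 2, 3, 4, 5, 6, 7, 8]))   # (36, 7): n - 1 = 7 additions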