What is the time complexity of this algorithmic problem? - artificial-intelligence

*A search method has time complexity O(n^2), where n is the number of states in the space to be searched. If it takes 1 second to search a space of a thousand states, roughly how long will it take to search a space of a million states?*
I found that it's approximately 12 days, but I think the way I found it is wrong.
I did 1,000^2 / 86,400 (seconds in a day) and got about 11.6, so approximately 12 days. Is there a better and more efficient solution?

There is not nearly enough information to answer this question. See Big-O description.
O(N^2) means only that the algorithm's execution time will be dominated by an N^2 term. As N grows large, the ratio between two execution times will asymptotically approach the square of the ratio of the corresponding problem sizes. It says nothing about the execution time for particular values of N.
Let's keep this simple and assume some set-up overhead: an O(N) array initialization plus a constant system start-up cost. That makes the execution time
t = a * N^2 + b * N + c
for some values of a, b, and c. Even if we know that this is the form of the equation, we do not have enough information to solve for the coefficients given only one (t, N) data point, so we cannot derive t for N = 10^6.
I suspect that whoever posed this problem is looking for the invalid solution, making the unwarranted assumption that N = 1000 has already blown all smaller terms into insignificance. In that case, simply scale up by the square of the size ratio:
N1 / N2 = 10^6 / 10^3 = 10^3
Scale up by the square of that ratio: (10^3)^2 = 10^6
That gives you 10^6 seconds, which works out to roughly 11.6 days, or about 12 days.
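As a sanity check, here is a minimal Python sketch of that scaling argument (it assumes, as above, that the N^2 term already dominates at N = 1000):

base_time = 1.0          # seconds to search 1,000 states
base_n = 1_000
target_n = 1_000_000

# Assuming time ~ a * N^2, scale by the square of the size ratio.
scaled_time = base_time * (target_n / base_n) ** 2
print(scaled_time)             # 1e6 seconds
print(scaled_time / 86_400)    # ~11.57 days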

Related

Resizing an array by a non-constant amount, continually

I’d like to perform amortized analysis of a dynamic array:
When we perform a sequence of n insertions, whenever an array of size k fills up, we reallocate an array of size k+sqrt(k), and copy the existing k values into the new array.
I’m new to amortized analysis and this is a problem I have yet to encounter, as we resize the array each time by a different non-constant value. (newSize=prevSize+sqrt(prevSize))
The total cost should be Θ(n*sqrt(n)), thus Θ(sqrt(n)) per operation.
I realize that whenever k >= c^2 for some constant c, then our array grows by c.
Let’s start off with an array of size k=1 (and assume n is large enough, for the sake of this example). After n insertions, we get the following sum of the total cost of the insertions + copies:
1+1(=k)+1+2(=k)+1+3(=k)+1+4(=k)+2+6(=k)+2+8(=k)+2+10(=k)+3+13(=k)+3+16(=k)+4+20(=k)+4+24+4+28+5+33+5+38+6+44+6+50+7…+n
I am seeing the pattern, but I can’t seem to be able to compute the bounds.
I’m trying to use all kinds of amortized analysis methods to bound this aggregated sum.
Let’s consider the accounting method, for example: I thought I needed round((k+sqrt(k))/sqrt(k)), or simply round(sqrt(k)+1), coins per insertion, but it doesn’t add up.
I’d love to get your help, in trying to properly find the upper and lower sqrt(n) bound.
Thank you very much! :)
The easiest way is to lump together each resize operation with the inserts that follow it before the next resize.
The cost of each lump is O(k + sqrt(k)), and each lump consists of O(sqrt(k)) operations, so the cost per operation is O((k + k^0.5)/k^0.5) = O(k^0.5 + 1) = O(k^0.5).
Of course you want an answer in terms of n, but since k(n) is in Θ(n), O(k^0.5) = O(n^0.5).
This can be easily shown using the Accounting Method. Consider an array of size k+sqrt(k) such that the first k entries are occupied and the remaining sqrt(k) are empty. Let each Insert-Last operation draft sqrt(k)+2 coins: one is used to pay for the insertion itself, while the rest (sqrt(k)+1 coins) are deposited as credit. From here, execute Insert-Last sqrt(k) times. We then have k+sqrt(k) credit coins: in total we drafted k+2sqrt(k) coins, sqrt(k) of which we used to pay for the insertions. Hence, as soon as the array gets full, we can use our k+sqrt(k) credit coins to pay for the resizing operation. Since k = Θ(n), each Insert-Last operation drafts sqrt(k)+2 = O(sqrt(k)) = O(sqrt(n)) coins and thus takes O(sqrt(n)) amortized time.
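For a numerical sanity check, here is a small Python simulation sketch (the cost model is an assumption: one unit per insertion, plus k units to copy the existing elements whenever a full array of capacity k is resized to k + sqrt(k)):

import math

def total_cost(n):
    # 1 unit per insertion, plus k copy units each time a full array of capacity k is resized
    capacity, size, cost = 1, 0, 0
    for _ in range(n):
        if size == capacity:
            cost += capacity                               # copy the k existing elements
            capacity += max(1, math.isqrt(capacity))       # grow by about sqrt(k)
        size += 1
        cost += 1                                          # the insertion itself
    return cost

for n in (10_000, 40_000, 160_000):
    print(n, total_cost(n) / (n * math.sqrt(n)))   # ratio settles near a constant, i.e. Θ(n*sqrt(n))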

dry run of worst case of quick sort

We know that the worst case of quick sort is O(n^2).
I am doing a dry run on the array:
1 2 3 4 5 6 7 8 9 10
When I put the value of n into the worst-case equation, the answer is 100,
but in the dry run it is solved in 51 steps.
That is a big difference. What is the reason for this?
O(n^2) means that the complexity grows with the square of n, not that it is exactly n^2.
You need to check how the cost (your ans counter) grows when n grows. Try putting 5, 10 and 20 items in the worst-case array, as in the sketch below; you will see that the cost does not grow proportionally to n (2x each time) but much faster.
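For instance, here is a minimal Python sketch (it assumes a naive quicksort that always picks the first element as the pivot, which hits its worst case on already-sorted input; the function name is my own):

def quicksort_comparisons(a):
    # Count element-vs-pivot comparisons of a naive quicksort (first element as pivot).
    if len(a) <= 1:
        return 0
    pivot, rest = a[0], a[1:]
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    return len(rest) + quicksort_comparisons(left) + quicksort_comparisons(right)

for n in (5, 10, 20):
    print(n, quicksort_comparisons(list(range(1, n + 1))))
    # prints 10, 45, 190: roughly n*(n-1)/2, i.e. quadratic growth, not 2x per doubling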
It would be helpful to consider the definition of Big O when thinking about how it applies, in this case, to the worst-case scenario of the quick sort algorithm. Big O describes the asymptotic behavior of functions. When we say an algorithm's running time is O(f(n)), we mean the running time is bounded above by a constant multiple of f(n); your algorithm cannot grow any faster than f(n). In your example, quick sort's worst case is bounded above by n^2, therefore it cannot grow faster than n^2 as n gets arbitrarily large.
Being bounded above by n^2 does not necessarily mean the worst case takes exactly n^2 steps. It is also bounded above by n^4, n^100, and n^n. All this means is that quick sort can never grow faster than n^2 (or n^4, n^100, n^n).
Another point to keep in mind when using Big O is to think in terms of n getting arbitrarily large, i.e. going towards infinity. In this example n is 10, but as n gets larger the worst-case number of steps will increase, yet it will never grow faster than n^2. I hope this helps!

Why is bubble sort time complexity referred to as n squared? [duplicate]

There have been other questions about bubble sort time complexity, but this question is different. Everyone says that bubble sort's worst case is O(n^2). In bubble sort, after i iterations over the list, the last i elements of the list are in order and never need to be touched or compared again. The time complexity would only be O(n^2) if you needlessly ran over the final elements again and again.
Given that a major feature of bubble sort is that the elements after (input size minus iteration) never need to be compared again, because they are already in their correct places, why is the stated time complexity of bubble sort one that, to me, doesn't describe bubble sort at all? Even Wikipedia says the time complexity is O(n^2), and only halfway into the article does it mention that it can be "optimised" to take only about 50% of the time by not unnecessarily comparing the last i elements.
I was reminded of this because I was making a loop which checked collisions of all my objects in the world, and the pattern was that I checked:
for (int i = 0; i < numberofobjects - 1; i++)
{
    for (int iplusone = i + 1; iplusone < numberofobjects; iplusone++)
    {
        // check collision between i and iplusone
    }
}
With 400 objects a time complexity of O(n^2) would be 400 * 400 = 160,000. However it only did 79,800 comparisons, roughly 50%, which is exactly what Wikipedia said. This reminded me of the bubble sort so when I checked I was surprised to see everyone saying it was O(n^2).
Does this mean that whenever someone refers to bubble sort they are referring to the version that needlessly reiterates over the final elements that have already been sorted? Also, when different algorithms are compared, bubble sort always fares the worst, but is the writer referring to the obviously bad n^2 version?
With 400 objects a time complexity of O(n^2) would be 400 * 400 = 160,000. However it only did 79,800 comparisons, roughly 50%
Yes, you're right about the 79,800 comparisons, but you haven't quite understood big O notation.
First of all, if you look carefully at the bubble sort algorithm, you will notice that the exact number of steps (comparisons) is:
n-1 + n-2 + ... + 1 = n(n-1)/2 exactly
This means that with n=400 you get exactly 400*399/2=79,800 comparisons.
However, big O notation tells you that the total number of steps is n(n-1)/2 = n^2/2 - n/2, and in big O notation we ignore lower-order terms and constants and keep only n^2, so it is O(n^2).
What you need to understand here is that big O notation doesn't tell you the exact number of steps; it only gives you an upper bound, i.e. the highest-order term of your complexity function, and this holds for large values of n. It simply states that "for big n, the order of growth of the complexity is c*n^2" - it describes the limiting behavior of a function as the argument tends towards a particular value or infinity.
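As a concrete check, here is a small Python sketch (an illustration of the "optimised" bubble sort that skips the already-sorted tail; not the original poster's code):

def bubble_sort_comparisons(a):
    # Bubble sort that skips the sorted tail after each pass; returns the comparison count.
    a = list(a)
    n = len(a)
    comparisons = 0
    for i in range(n - 1):
        for j in range(n - 1 - i):       # the last i elements are already in place
            comparisons += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return comparisons

print(bubble_sort_comparisons(list(range(400, 0, -1))))   # 79800 = 400*399/2, still Θ(n^2)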

Sorting algorithm vs. Simple iterations

I'm just getting started in algorithms and sorting, so bear with me...
Let's say I have an array of 50000 integers.
I need to select the smallest 30000 of them.
I thought of two methods :
1. I repeatedly iterate over the entire array, each time picking out the smallest remaining integer, until I have the 30000 smallest.
2. I first sort the entire array, and then simply select the first 30000.
Can anyone tell me what's the difference, which method would be faster, and why?
What if the array was smaller or bigger? Would the answer change?
Option 1 sounds like the naive solution. It would involve passing through the array to find the smallest item 30000 times. Each time it finds the smallest, presumably it would swap that item to the beginning or end of the array. In basic terms, this is O(n^2) complexity.
The actual number of operations involved would be less than n^2 because n reduces every time. So you would have roughly 50000 + 49999 + 49998 + ... + 20001, which amounts to just over 1 billion (1000 million) iterations.
Option 2 would employ an algorithm like quicksort or similar, which is commonly O(n.logn).
Here it's harder to provide actual figures, because some efficient sorting algorithms can have a worst-case of O(n^2). But let's say you use a well-behaved one that is guaranteed to be O(n.logn). This would amount to 50000 * 15.61 which is about 780 thousand.
So it's clear that Option 2 wins in this case.
What if the array was smaller or bigger? Would the answer change?
Unless the array became trivially small, the answer would still be Option 2. And the larger your array becomes, the more beneficial Option 2 becomes. This is the nature of time complexity. O(n^2) grows much faster than O(n.logn).
A better question to ask is "what if I want fewer smallest values, and when does Option 1 become preferable?". Although the answer is slightly more complex because of numerous factors (such as what constitutes "one operation" in Option 1 vs Option 2, plus other issues like memory access patterns etc.), you can get the simple answer directly from time complexity: selecting k smallest values with Option 1 costs roughly k*n operations versus n.logn for Option 2, so Option 1 becomes preferable when k drops below logn. In the case of a 50000-element array, that means if you want to select about 15 or fewer smallest elements, then Option 1 wins.
Now, consider an Option 3, where you transform the array into a min-heap. Building a heap is O(n), and removing one item from it is O(logn). You are going to remove 30000 items. So you have the cost of building plus the cost of removal: 50000 + 30000 * 15.6 = approximately 520 thousand. And this is ignoring the fact that n gets smaller every time you remove an element. It's still O(n.logn), like Option 2 but it is probably faster: you've saved time by not bothering to sort the elements you don't care about.
I should mention that in all three cases, the result would be the smallest 30000 values in sorted order. There may be other solutions that would give you these values in no particular order.
30k is close to 50k. Just sort the array and take the smallest 30k, e.g., in Python: sorted(a)[:30000]. It is an O(n * log n) operation.
If you needed to find the 100 smallest items instead (100 << 50k), then a heap might be more suitable, e.g., in Python: heapq.nsmallest(100, a). It is O(n * log k).
If the range of integers is limited, you could consider O(n) sorting methods such as counting sort and radix sort.
The simple iterative method is O(n**2) (quadratic) here. Even for a moderate n of around a million, it leads to ~10**12 operations, which is much worse than the ~10**6 needed by a linear algorithm.
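To make those two Python one-liners concrete, here is a tiny sketch (the variable names are illustrative):

import heapq
import random

a = [random.randint(0, 10**6) for _ in range(50_000)]

smallest_30k = sorted(a)[:30_000]          # O(n log n): sort everything, keep the first 30k
smallest_100 = heapq.nsmallest(100, a)     # O(n log k): better when k is much smaller than n

print(smallest_30k[:5], smallest_100[:5])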
For nearly all practical purposes, sorting and taking the first 30,000 is likely to be best. In most languages, this is one or two lines of code. Hard to get wrong.
If you have a truly demanding application or are just out to fiddle, you can use a selection algorithm to find the 30,000th smallest number. Then one more pass through the array will find the other 29,999 that are no bigger.
There are several well known selection algorithms that require only O(n) comparisons and some that are sub-linear for data with specific properties.
The fastest in practice is QuickSelect, which - as its name implies - works roughly like a partial QuickSort. Unfortunately, if the data happens to be very badly ordered, QuickSelect can require O(n^2) time (just as QuickSort can). There are various tricks for selecting pivots that make the worst-case run time virtually impossible to hit.
QuickSelect will finish with the array reordered so the smallest 30,000 elements are in the first part (unsorted) followed by the rest.
Because standard selection algorithms are comparison-based, they'll work on any kind of comparable data, not just integers.
You can do this in potentially O(N) time with radix sort or counting sort, given that your input is integers.
Another method is to get the 30000th smallest integer by quickselect and then simply iterate through the original array, keeping the values that do not exceed it. This has Θ(N) average time complexity, but quickselect is O(N^2) in the worst case.
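If you want to experiment with the quickselect approach, here is a rough Python sketch (randomized Lomuto partition; the function and variable names are my own, and with duplicate values you would need a little extra care when collecting everything no larger than the threshold):

import random

def quickselect(a, k):
    # Return the k-th smallest element (1-indexed) of a.
    # Average O(n); worst case O(n^2). A random pivot makes the worst case unlikely.
    a = list(a)
    lo, hi = 0, len(a) - 1
    k -= 1                                   # 0-indexed rank
    while True:
        if lo == hi:
            return a[lo]
        p = random.randint(lo, hi)           # random pivot
        a[p], a[hi] = a[hi], a[p]
        pivot, store = a[hi], lo
        for i in range(lo, hi):              # Lomuto partition
            if a[i] < pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]
        if k == store:
            return a[store]
        elif k < store:
            hi = store - 1
        else:
            lo = store + 1

# threshold = quickselect(data, 30000)
# smallest = [x for x in data if x <= threshold]   # may pick up a few extras if duplicates exist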

Exhaustive searches vs sorting followed by binary search

This is a direct quote from the textbook, Invitation to Computer Science by G. Michael Schneider and Judith L. Gersting.
At the end of Section 3.4.2, we talked about the tradeoff between using sequential search on an unsorted list as opposed to sorting the list and then using binary search. If the list size is n=100,000 about how many worst-case searches must be done before the second alternative is better in terms of number of comparisons?
I don't really get what the question is asking for.
Sequential search is of order O(n) and binary search is of order O(lg n), and lg n will always be less than n. In this case n is already given, so what am I supposed to find?
This is one of my homework assignment but I don't really know what to do. Could anyone explain the question in plain English for me?
and binary search is of order O(lg n), and lg n will always be less than n
This is where you're wrong. In the assignment, you're asked to consider the cost of sorting the array too.
Obviously, if you only need one search, the first approach is better than sorting the array and doing a binary search: n < n*logn + logn. You're asked how many searches it takes for the second approach to become more effective.
End of hint.
The question is how to decide which approach to choose - to just use linear search or to sort and then use binary search.
If you only search a couple of times, linear search is better - it is O(n), while sorting alone is already O(n*logn). If you search the same collection very often, sorting is better - searching it k times linearly costs k*O(n), which can become O(n*n) when k is on the order of n, whereas sorting and then searching with binary search costs O(n*logn) + NumberOfSearches*O(logn), which can be less or more than using linear search depending on how NumberOfSearches and n relate.
The task is to determine the exact value of NumberOfSearches (not the exact number, but a function of n) which will make one of the options preferable:
NumberOfSearches * O(n) <> O(n*logn) + NumberOfSearches * O(logn)
don't forget that each O() can have a different constant value.
Knowing only the order of the methods is not enough here. The order tells you how well an algorithm scales when the problem becomes bigger and bigger. You can't do any exact calculations if you only know O(n), i.e. that the complexity grows linearly in the size of the problem; it won't give you any numbers.
This can well mean that an algorithm with O(n) complexity is faster than an O(logn) algorithm for some n. Because O(logn) scales better as n gets larger, we know for sure that there is an n (a problem size) beyond which the algorithm with O(logn) complexity is faster. We just don't know when (for what n).
In plain english:
If you want to know 'how many searches', you need exact equations to solve, you need exact numbers. How many comparisons does it take to search sequential? (Remember n is given, so you can give a number.) How many comparisons (in the worst case!) does it take to search with a binary search? Before you can do a binary search, you have to sort. Let's add the number of comparisons needed to sort to the cost of binary search. Now compare the two numbers, which one is less?
The binary search is fast, but the sorting is slow. The sequential search is slower than binary search, but faster than sorting. However the sorting needs to be done only once, no matter how many times you search. So, when does one heavy sort outweigh having to do a slow (sequential) search every time?
Good luck!
For sequential search, the worst case is n = 100000 comparisons, so p searches require p × 100000 comparisons.
Using a Θ(n^2) sorting algorithm would require about 100000 × 100000 comparisons.
Binary search would require 1 + log n = 1 + log 100000 ≈ 17 comparisons for each search,
so together there would be 100000 × 100000 + 17p comparisons.
The second alternative becomes better once the first expression exceeds the second, i.e.
100000p > 100000^2 + 17p
which holds for p > 100017 (approximately).
The question is about estimating the number NUM_SEARCHES needed to compensate for the cost of sorting. So we'll have:
time( NUM_SEARCHES * O(n) ) > time( NUM_SEARCHES * O(log(n)) + O(n* log(n)) )
Thank you guys. I think I get the point now. Could you take a look at my answer and see whether I'm on the right track?
For worst-case searches:
Number of comparisons for sequential search is n = 100,000.
Number of comparisons for binary search is lg(n) = 17.
Number of comparisons for sorting is n(n-1)/2 = (99,999)(50,000).
(I'm following my textbook and used the selection sort algorithm covered in my class.)
So let p be the number of worst-case searches; then 100,000p > (99,999)(50,000) + 17p,
or p > 50,008.
In conclusion, I need more than 50,008 worst-case searches to make sorting and then using binary search better than sequential search for a list of n = 100,000.
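A quick Python sketch of that break-even calculation (using the same comparison counts as above, with selection sort as the sorting cost):

import math

n = 100_000
sequential = n                           # worst-case comparisons per sequential search
binary = 1 + math.floor(math.log2(n))    # ~17 worst-case comparisons per binary search
sort_cost = n * (n - 1) // 2             # selection sort: (99,999)(50,000) comparisons

# smallest p with p * sequential > sort_cost + p * binary
p = sort_cost / (sequential - binary)
print(math.ceil(p))                      # ~50,009 searches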
