Running Time of built in functions - arrays

if I use built in functions from java do I have to take into consideration there running time or should I count them as constant time. what will be the time complexity of the following function
def int findMax(int [] a)
{
a.Arrays.sort();
n=a.length;
return a[n-1];
}

Nearly all the work here is being done by the sort() algorithm. For sorting arrays, Java uses Quicksort, which has an O(n log n) average and O(n^2) worst case performance.
From the Java documentation on sort():
Implementation note: The sorting algorithm is a Dual-Pivot Quicksort by Vladimir Yaroslavskiy, Jon Bentley, and Joshua Bloch. This algorithm offers O(n log(n)) performance on many data sets that cause other quicksorts to degrade to quadratic performance, and is typically faster than traditional (one-pivot) Quicksort implementations.
Given your question lacks any use case whatsoever, and your comment on my answer, I feel like I have to point out that this is a classic example of premature optimization. You're looking for the mathematical complexity of a trivial method without any indication that this method will account for any significant portion of the CPU-time used by your program. This is especially true given that your implementation is incredibly inefficient: iterating through the array and storing the highest value would execute in O(n).

Related

Why bubble sort is not efficient?

I am developing backend project using node.js and going to implement sorting products functionality.
I researched some articles and there were several articles saying bubble sort is not efficient.
Bubble sort was used in my previous projects and I was surprised why it is bad.
Could anyone explain about why it is inefficient?
If you can explain by c programming or assembler commands it would be much appreciated.
Bubble Sort has O(N^2) time complexity so it's garbage for large arrays compared to O(N log N) sorts.
In JS, if possible use built-in sort functions that the JS runtime might be able to handle with pre-compiled custom code, instead of having to JIT-compile your sort function. The standard library sort should (usually?) be well-tuned for the JS interpreter / JIT to handle efficiently, and use an efficient implementation of an efficient algorithm.
The rest of this answer is assuming a use-case like sorting an array of integers in an ahead-of-time compiled language like C compiled to native asm. Not much changes if you're sorting an array of structs with one member as the key, although cost of compare vs. swap can vary if you're sorting char* strings vs. large structs containing an int. (Bubble Sort is bad for any of these cases with all that swapping.)
See Bubble Sort: An Archaeological Algorithmic Analysis for more about why it's "popular" (or widely taught / discussed) despite being one the worst O(N^2) sorts, including some accidents of history / pedagogy. Also including an interesting quantitative analysis of whether it's actually (as sometimes claimed) one of the easiest to write or understand using a couple code metrics.
For small problems where a simple O(N^2) sort is a reasonable choice (e.g. the N <= 32 element base case of a Quick Sort or Merge Sort), Insertion Sort is often used because it has good best-case performance (one quick pass in the already-sorted case, and efficient in almost-sorted cases).
A Bubble Sort (with an early-out for a pass that didn't do any swaps) is also not horrible in some almost-sorted cases but is worse than Insertion Sort. But an element can only move toward the front of the list one step per pass, so if the smallest element is near the end but otherwise fully sorted, it still takes Bubble Sort O(N^2) work. Wikipedia explains Rabbits and turtles.
Insertion Sort doesn't have this problem: a small element near the end will get inserted (by copying earlier elements to open up a gap) efficiently once it's reached. (And reaching it only requires comparing already-sorted elements to determine that and move on with zero actual insertion work). A large element near the start will end up moving upwards quickly, with only slightly more work: each new element to be examined will have to be inserted before that large element, after all others. So that's two compares and effectively a swap, unlike the one swap per step Bubble Sort would do in it's "good" direction. Still, Insertion Sort's bad direction is vastly better than Bubble Sort's "bad" direction.
Fun fact: state of the art for small-array sorting on real CPUs can include SIMD Network Sorts using packed min/max instructions, and vector shuffles to do multiple "comparators" in parallel.
Why Bubble Sort is bad on real CPUs:
The pattern of swapping is probably more random than Insertion Sort, and less predictable for CPU branch predictors. Thus leading to more branch mispredicts than Insertion Sort.
I haven't tested this myself, but think about how Insertion Sort moves data: each full run of the inner loop moves a group of elements to the right to open up a gap for a new element. The size of that group might stay fairly constant across outer-loop iterations so there's a reasonable chance of predicting the pattern of the loop branch in that inner loop.
But Bubble Sort doesn't do so much creation of partially-sorted groups; the pattern of swapping is unlikely to repeat1.
I searched for support for this guess I just made up, and did find some: Insertion sort better than Bubble sort? quotes Wikipedia:
Bubble sort also interacts poorly with modern CPU hardware. It produces at least twice as many writes as insertion sort, twice as many cache misses, and asymptotically more branch mispredictions.
(IDK if that "number of writes" was naive analysis based on the source, or looking at decently optimized asm):
That brings up another point: Bubble Sort can very easily compile into inefficient code. The notional implementation of swapping actually stores into memory, then re-reads that element it just wrote. Depending on how smart your compiler is, this might actually happen in the asm instead of reusing that value in a register in the next loop iteration. In that case, you'd have store-forwarding latency inside the inner loop, creating a loop-carried dependency chain. And also creating a potential bottleneck on cache read ports / load instruction throughput.
Footnote 1: Unless you're sorting the same tiny array repeatedly; I tried that once on my Skylake CPU with a simplified x86 asm implementation of Bubble Sort I wrote for this code golf question (the code-golf version is intentionally horrible for performance, optimized only for machine-code size; IIRC the version I benchmarked avoided store-forwarding stalls and locked instructions like xchg mem,reg).
I found that with the same input data every time (copied with a few SIMD instructions in a repeat loop), the IT-TAGE branch predictors in Skylake "learned" the whole pattern of branching for a specific ~13-element Bubble Sort, leading to perf stat reporting under 1% branch mispredicts, IIRC. So it didn't demonstrate the tons of mispredicts I was expecting from Bubble Sort after all, until I increased the array size some. :P
Bubble sort runs in O(n^2) time complexity. Merge sort takes O(n*log(n)) time, while quick sort takes O(n*log(n)) time on average, thus performing better than bubble sort.
Refer to this: complexity of bubble sort.

What is good measure to compare algorithms?

Well I was reading an article about comparing two algorithms by firstly analyzing them.
My teacher taught me that you can analyze algorithm by directly using number of steps for that algorithm.
for ex:
algo printArray(arr[n]){
for(int i=0;i<n;i++){
write arr[i];
}
}
will have complexity of O(N), where N is size of array. and it repeats the for loop for N times.
while
algo printMatrix(arr,m,n){
for(i=0;i<m;i++){
for(j=0;j<n;j++){
write arr[i][j];
}
}
}
will have complexity of O(MXN) ~ O(N^2) when M=N. statements inside for are executed MXN times.
similarly O(log N). if it divides input into 2 equal parts. and So on.
But according to that article:
The Measures Execution Time, Number of statements aren't good for analyzing the algorithm.
because:
Execution Time will be system Dependent and,
Number of statements will vary with the programming language used.
and It states that
Ideal Solution will be to express running time of algorithm as a function of input size N that is f(n).
That confused me a little, How can you calculate running time if you consider execution time as not good measure?
Can experts here please elaborate this?
Thanks in advance.
When you were saying "complexity of O(N)" that is referred to as "Big-O notation" which is the same as the "Ideal Solution" that you mentioned in your post. It is a way of expressing run time as a function of input size.
I think were you got confused was when it said "express running time" - it didn't mean express it in a numerical value (which is what execution time is), it meant express it in Big-O notation. I think you just got tripped up on the terminology.
Execution time is indeed system-dependent, but it also depends on the number of instructions the algorithm executes.
Also, I do not understand how the number of steps is irrelevant, given that algorithms are analyzed as language-agnostic and without paying any attention to whatever features and syntactic-sugars various languages imply.
The one measure of algorithm analysis I have always encountered since I started analyzing algorithms is the number of executed instructions and I fail to see how this metric may be irrelevant.
At the same time, complexity classes are meant as an "order of magnitude" indication of how fast or slow an algorithm is. They are dependent of the number of executed instructions and independent of the system the algorithm runs on, because by definition an elementary operation (such as addition of two numbers) should take constant time, however large or small this "constant" means in practice, therefore complexity classes do not change. The constants inside the expression for the exact complexity function may indeed vary from system to system, but what is actually relevant for algorithm comparison is the complexity class, as only by comparing those can you find out how an algorithm behaves on increasingly large inputs (asymptotically) compared to another algorithm.
Big-O notation waves away constants (both fixed cost and constant multipliers). So any function that takes kn+c operations to complete is (by definition!) O(n), regardless of k and c. This is why it's often better to take real-world measurements (profiling) of your algorithms in action with real data, to see how fast they effectively are.
But execution time, obviously, varies depending on the data set -- if you're trying to come up with a general measure of performance that's not based on a specific usage scenario, then execution time is less valuable (unless you're comparing all algorithms under the same conditions, and even then it's not necessarily fair unless you model the majority of possible scenarios, and not just one).
Big-O notation becomes more valuable as you move to larger data sets. It gives you a rough idea of the performance of an algorithm, assuming reasonable values for k and c. If you have a million numbers you want to sort, then it's safe to say you want to stay away from any O(n^2) algorithm, and try to find a better O(n lg n) algorithm. If you're sorting three numbers, the theoretical complexity bound doesn't matter anymore, because the constants dominate the resources taken.
Note also that while the number of statements a given algorithm can be expressed in varies wildly between programming languages, the number of constant-time steps that need to be executed (at the machine level for your target architecture, which is typically one where integer arithmetic and memory accesses take a fixed amount of time, or more precisely are bounded by a fixed amount of time). It is this bound on the maximum number of fixed-cost steps required by an algorithm that big-O measures, which has no direct relation to actual running time for a given input, yet still describes roughly how much work must be done for a given data set as the size of the set grows.
In comparing algorithms, execution speed is important as well mentioned by others, but other factors like memory space are crucial too.
Memory space also uses order of complexity notation.
Code could sort an array in place using a bubble sort needing only a handful of extra memory O(1). Other methods, though faster, may need O(ln N) memory.
Other more esoteric measures include code complexity like Cyclomatic complexity and Readability
Traditionally, computer science measures algorithm effectivity (speed) by the number of comparisons or sometimes data accesses, using "Big O notation". This is so, because the number of comparisons (and/or data accesses) is a good mathematical model to describe efficiency of certain algorithms, searching and sorting ones in particular, where O(log n) is considered the fastest possible in theory.
This theoretic model has always had several flaws though. It assumes that comparisons (and/or data accessing) are what takes time, and that the time for performing things like function calls and branching/looping is neglectible. This is of course nonsense in the real world.
In the real world, a recursive binary search algorithm might for example be extremely slow compared to a quick & dirty linear search implemented with a plain for loop, because on the given system, the function call overhead is what takes the most time, not the comparisons.
There are a whole lot of things that affect performance. As CPUs evolve, more such things are invented. Nowadays, you might have to consider things like data alignment, instruction pipe-lining, branch prediction, data cache memory, multiple CPU cores and so on. All these technologies make traditional algorithm theory rather irrelevant.
To write the most effective code possible, you need to have a specific system in mind and you need in-depth knowledge about said system. Fortunately, compilers have evolved a lot too, so a lot of the in-depth system knowledge can be left to the person who implements a compiler port for the specific system.
Generally, I think many programmers today spend far too much time pondering about program speed and coming up with "clever things" to get better performance. Back in the days when CPUs were slow and compilers were terrible, such things were very important. But today, a good, modern programmer focus on making the code bug-free, readable, maintainable, re-useable, secure, portable etc. It doesn't matter how fast your program is, if it is a buggy mess of unreadable crap. So deal with performance when the need arises.

In which cases do we use heapsort?

In which cases heap sort can be used? As we know, heap sort has a complexity of n×lg(n). But it's used far less often than quick and merge sort. So when do we use this heap sort exactly and what are its drawbacks?
Characteristics of Heapsort
O(nlogn) time best, average, worst case performance
O(1) extra memory
Where to use it?
Guaranteed O(nlogn) performance. When you don't necessarily need very fast performance, but guaranteed O(nlogn) performance (e.g. in a game), because Quicksort's O(n^2) can be painfully slow. Why not use Mergesort then? Because it takes O(n) extra memory.
To avoid Quicksort's worst case. C++'s std::sort routine generally uses a varation of Quicksort called Introsort, which uses Heapsort to sort the current partition if the Quicksort recursion goes too deep, indicating that a worst case has occurred.
Partially sorted array even if stopped abruptly. We get a partially sorted array if Heapsort is somehow stopped abruptly. Might be useful, who knows?
Disadvantages
Relatively slow as compared to Quicksort
Cache inefficient
Not stable
Not really adaptive (Doesn't get faster if given somewhat sorted array)
Based on the wikipedia article for sorting algorithms, it appears that the Heapsort and Mergesort all have identical time complexity O(n log n) for best, average and worst case.
Quicksort has a disadvantage there as its worst case time complexity of O(n2) (a).
Mergesort has the disadvantage that its memory complexity is O(n) whereas Heapsort is O(1). On the other hand, Mergesort is a stable sort and Heapsort is not.
So, based on that, I would choose Heapsort in preference to Mergesort if I didn't care about the stability of the sort, so as to minimise memory usage. If stability was required, I would choose MergeSort.
Or, more correctly, if I had huge amounts of data to sort, and I had to code my own algorithms to do it, I'd do that. For the vast majority of cases, the difference between the two is irrelevant, until your data sets get massive.
In fact, I've even used bubble sort in real production environments where no other sort was provided, because:
it's incredibly easy to write (even the optimised version);
it's more than efficient enough if the data has certain properties (either small datsets or datasets that were already mostly sorted before you added a couple of items).
Like goto and multiple return points, even seemingly bad algorithms have their place :-)
(a) And, before you wonder why C uses a less efficient algorithm, it doesn't (necessarily). Despite the qsort name, there's no mandate that it use Quicksort under the covers - that's a common misconception. It may well use one of the other algorithms.
Kindly note that the running time complexity of heap sort is the same as O(n log n) irrespective of whether the array is already partially sorted in either ascending or descending order.
Kindly refer to below link for further clarification on big O calculation for the same :
https://ita.skanev.com/06/04/03.html

Why is quicksort faster in average than others?

As we know, the quicksort performance is O(n*log(n)) in average but the merge- and heapsort performance is O(n*log(n)) in average too. So the question is why quicksort is faster in average.
Wikipedia suggests:
Typically, quicksort is significantly
faster in practice than other O(nlogn)
algorithms, because its inner loop can
be efficiently implemented on most
architectures, and in most real-world
data, it is possible to make design
choices that minimize the probability
of requiring quadratic time.
Additionally, quicksort tends to make
excellent usage of the memory
hierarchy, taking perfect advantage of
virtual memory and available caches.
Although quicksort is not an in-place
sort and uses auxiliary memory, it is
very well suited to modern computer
architectures.
Also have a look at comparison with other sorting algorithms on the same page.
See also Why is quicksort better than other sorting algorithms in practice? on the CS site.
Worst case for quick sort is actually worse than heapsort and mergesort, but quicksort is faster on average.
As to why, it will take time to explain and thus i will refer to Skiena, The algorithm design manual.
A quote that summarizes the quicksort vs merge/heapsort:
When faced with algorithms of the same asymptotic complexity, implementation
details and system quirks such as cache performance and memory size may
well prove to be the decisive factor.
What we can say is that experiments show that where a properly implemented
quicksort is implemented well, it is typically 2-3 times faster than mergesort or
heapsort. The primary reason is that the operations in the innermost loop are
simpler. But I can’t argue with you if you don’t believe me when I say quicksort is
faster. It is a question whose solution lies outside the analytical tools we are using.
The best way to tell is to implement both algorithms and experiment.
Timsort might be a better option as it is optimised for the kind of data seen when sorting in general, in the Python language where data often contains embedded 'runs' of presorted items. It has lately been adopted by Java too.

Regarding in-place merge in an array

I came across the following question.
Given an array of n elements and an integer k where k < n. Elements {a0...ak} and
{ak+1...an} are already sorted. Give an algorithm to sort in O(n) time and O(1) space.
It does not seem to me like it can be done in O(n) time and O(1) space. The problem really seems to be asking how to do the merge step of mergesort but in-place. If it was possible, wouldn't mergesort be implemented that way? I am unable to convince myself though and need some opinion.
This seems to indicate that it is possible to do in O(lg^2 n) space. I cannot see how to prove that it is impossible to merge in constant space, but I cannot see how to do it either.
Edit:
Chasing references, Knuth Vol 3 - Exercise 5.5.3 says "A considerably more complicated algorithm of L. Trabb-Pardo provides the best possible answer to this problem: It is possible to do stable merging in O(n) time and stable sorting in O(n lg n) time, using only O(lg n) bits of auxiliary memory for a fixed number of index variables.
More references that I have not read. Thanks for an interesting problem.
Further edit:
This article claims that the article by Huang and Langston have an algorithm that merges two lists of size m and n in time O(m + n), so the answer to your question would seem to be yes. Unfortunately I do not have access to the article, so I must trust the second hand information. I'm not sure how to reconcile this with Knuth's pronouncement that the Trabb-Pardo algorithm is optimal. If my life depended on it, I'd go with Knuth.
I now see that this had been asked as and earlier Stack Overflow question a number of times. I don't have the heart to flag it as a duplicate.
Huang B.-C. and Langston M. A., Practical in-place merging, Comm. ACM 31 (1988) 348-352
There are several algorithms for doing this, none of which are very easy to intuit. The key idea is to use a part of the arrays to merge as a buffer, then doing a standard merge using this buffer for auxiliary space. If you can then reposition the elements so that the buffer elements are in the right place, you're golden.
I have written up an implementation of one of these algorithms on my personal site if you're interested in looking at it. It's based on the paper "Practical In-Place Merging" by Huang and Langston. You probably will want to look over that paper for some insight.
I've also heard that there are good adaptive algorithms for this, which use some fixed-size buffer of your choosing (which could be O(1) if you wanted), but then scale elegantly with the buffer size. I don't know any of these off the top of my head, but I'm sure a quick search for "adaptive merge" might turn something up.
No it isn't possible, although my job would be much easier if it was :).
You have a O(log n) factor which you can't avoid. You can choose to take it as time or space, but the only way to avoid it is to not sort. With O(log n) space you can build a list of continuations that keep track of where you stashed the elements that didn't quite fit. With recursion this can be made to fit in O(1) heap, but that's only by using O(log n) stack frames instead.
Here is the progress of merge-sorting odds and evens from 1-9. Notice how you require log-space accounting to track the order inversions caused by the twin constraints of constant space and linear swaps.
. -
135792468
. -
135792468
: .-
125793468
: .-
123795468
#.:-
123495768
:.-
123459768
.:-
123456798
.-
123456789
123456789
There are some delicate boundary conditions, slightly harder than binary search to get right, and even in this (possible) form, and therefore a bad homework problem; but a really good mental exercise.
Update
Apparently I am mistaken and there is an algorithm that provides O(n) time and O(1) space. I have downloaded the papers to enlighten myself, and withdraw this answer as incorrect.

Resources