What is the running time of comb sort?

It seems to me that comb sort should run in sub-quadratic time, just like shell sort. This is because comb sort relates to bubble sort the same way shell sort relates to insertion sort: shell sort sorts the array over a gap sequence using insertion sort, and comb sort sorts the array over a gap sequence using bubble sort. So what is the running time of comb sort?

(This question has been unanswered for a while, so I'm converting my comment into an answer.)
Although there are similarities between shell sort and comb sort, the average-case runtime of comb sort is O(n^2). Proving this is a bit tricky, and the technique that I've seen used to prove it is the incompressibility method, an information-theoretic technique involving Kolmogorov complexity.
Hope this helps!

With what sequence of increments?
If the increments are chosen to be the set of all numbers of the form 2^p * 3^q that are less than N, then yes, the running time is better than quadratic: it's proportional to N times the square of the logarithm of N. With that set of increments, Combsort performs exactly the same exchanges as a Shellsort using the same increments (the "Pratt sequence"). But that's not what people usually have in mind when they're talking about Combsort.
In theory...
With increments that are decreasing geometrically (e.g. on each pass over the input the increment is, say, about 80% of the previous increment), which is what people usually mean when they talk about Combsort... yes, asymptotically, it is quadratic in both the worst-case and the average case. But...
In practice...
So long as the increments are relatively prime and the ratio between one increment and the next is sensible (80% is fine), n has to be astronomically large before the average running time will be much more than n log(n). I've sorted hundreds of millions of records at a time with Combsort, and I've only ever seen quadratic running times when I've deliberately engineered them by constructing "killer inputs". In practice, with relatively prime increments (and a ratio between adjacent increments of 1.25:1), even for millions of records, Combsort requires, on average, about 3 times as many comparisons as a mergesort and typically takes between 2 and 3 times as long to run.
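For concreteness, here is a minimal C sketch of the geometric-gap Combsort described above. The function name is mine, and the 1.25 shrink ratio is simply the one mentioned in the answer (1.3 is another common choice); this is an illustration, not a tuned implementation, and it makes no attempt to keep the increments relatively prime.

    #include <stdbool.h>
    #include <stddef.h>

    /* Combsort with a geometrically shrinking gap (illustrative sketch).
     * Each pass is a bubble-sort pass performed at distance 'gap';
     * the gap shrinks by a factor of about 1.25 per pass until it is 1. */
    void comb_sort(int *a, size_t n)
    {
        size_t gap = n;
        bool swapped = true;

        while (gap > 1 || swapped) {
            if (gap > 1)
                gap = (size_t)(gap / 1.25);   /* shrink the gap each pass */

            swapped = false;
            for (size_t i = 0; i + gap < n; i++) {
                if (a[i] > a[i + gap]) {      /* compare/exchange at distance 'gap' */
                    int tmp = a[i];
                    a[i] = a[i + gap];
                    a[i + gap] = tmp;
                    swapped = true;
                }
            }
        }
    }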

Related

Is O(cn) at least as fast as O(n) in a non-asymptotic sense?

So first of all, let me talk about the motivation for this question. Let's suppose you have to find the minimum and the maximum values in an array. In this case, you have two ways of doing so.
The first one consists in iterating over the array and finding the maximum value, then doing the same thing to find the minimum value. This solution is O(2n).
The second one consists in iterating over the array just one time and finding both the minimum and maximum value at the same time. This solution is O(n).
Even though the time complexity has been halved, for each iteration of the O(n) solution you now have twice as many instructions (ignoring how the compiler can possibly optimize these instructions), so I believe they should take the same amount of time to execute.
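To make the comparison concrete, here is a rough C sketch of the two approaches (the function names are mine, purely for illustration; both assume n >= 1):

    #include <stddef.h>

    /* Two passes: one for the maximum, one for the minimum ("O(2n)"). */
    void min_max_two_passes(const int *a, size_t n, int *min, int *max)
    {
        *max = a[0];
        for (size_t i = 1; i < n; i++)
            if (a[i] > *max) *max = a[i];

        *min = a[0];
        for (size_t i = 1; i < n; i++)
            if (a[i] < *min) *min = a[i];
    }

    /* One pass doing both comparisons per element ("O(n)"). */
    void min_max_one_pass(const int *a, size_t n, int *min, int *max)
    {
        *min = *max = a[0];
        for (size_t i = 1; i < n; i++) {
            if (a[i] < *min) *min = a[i];
            if (a[i] > *max) *max = a[i];
        }
    }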
Let me give you a second example. Now you need to reverse an array. Again, you have two ways of doing so.
The first one is to create a new array and iterate over the data array, filling the new array in reverse order. This solution is O(n).
The second one is to iterate over the data array, swapping the 0th and (n-1)th elements, then the 1st and (n-2)th elements and so on (using this strategy) until you reach the middle of the array. This solution is O((1/2)n).
Again, even though the time complexity has been cut in half, you have three times more instructions per iteration. You're iterating over (1/2)n elements, but for each iteration you have to perform three XOR instructions. If you were to use an auxiliary variable instead of XOR, you would still need 2 more instructions to perform the swap, so now I believe that O((1/2)n) should actually be worse than O(n).
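And a similarly rough C sketch of the two ways to reverse an array. The in-place version uses a temporary variable rather than XOR, but the argument about extra work per iteration is the same; the names are mine.

    #include <stddef.h>
    #include <stdlib.h>

    /* Copy into a new array in reverse order ("O(n)" iterations, one copy each). */
    int *reverse_copy(const int *a, size_t n)
    {
        int *out = malloc(n * sizeof *out);
        if (!out) return NULL;
        for (size_t i = 0; i < n; i++)
            out[i] = a[n - 1 - i];
        return out;
    }

    /* Swap the ends toward the middle ("O((1/2)n)" iterations, three moves each). */
    void reverse_in_place(int *a, size_t n)
    {
        if (n < 2) return;
        for (size_t i = 0, j = n - 1; i < j; i++, j--) {
            int tmp = a[i];
            a[i] = a[j];
            a[j] = tmp;
        }
    }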
Having said these things, my question is the following:
Ignoring space complexity, garbage collection and possible compiler optimizations: if I have O(c1*n) and O(c2*n) algorithms with c1 > c2, can I be sure that the algorithm that gives me O(c1*n) is as fast or faster than the one that gives me O(c2*n)?
This question is cool because it can make a difference in how I write code from here on. If the "more complex" (c1) way is as fast as the "less complex" (c2) way but more readable, I'm sticking with the "more complex" one.
c1 > c2, can I be sure that the algorithm that gives me O(c1n) is as fast or faster than the one that gives me O(c2n)?
The whole issue lies within the words "fast" or "faster". Computational complexity doesn't strictly measure what we intuitively understand as "fast". Without going into the mathematical details (although it's a good idea: https://en.wikipedia.org/wiki/Big_O_notation), it answers the question "how much slower does it get as my input grows". So if you have O(n^2) complexity, you can roughly expect that doubling the size of the input will make your algorithm take 4 times as long. Whereas for linear complexity, a twice-as-big input only doubles the time. As you can see, it's relative, so any constants cancel out.
To sum up: from the way you ask your question, it doesn't seem the big-O notation is the correct tool here.
By definition, if c1 and c2 are constants, O(c1*n) === O(c2*n) === O(n). That is, the number of operations per element of your array of length n is completely irrelevant in this kind of complexity analysis.
All that it will tell you is that "it's linear". That is, if you have 1 bazillion operations for an array of length n, then you'll have 2 bazillion operations for an array of length 2*n (plus or minus something that grows slower than linear).
if I have O(c1n) and O(c2n) algorithms with c1 > c2, can I be sure that the algorithm that gives me O(c1n) is as fast or faster than the one that gives me O(c2n)?
Nope, not at all.
First, because the constants there are meaningless in that analysis. There's no other way to put it: whatever restrictions you put on c1 and c2 are absolutely irrelevant for big-O analysis. The whole idea is that it discards those restrictions.
Second, because they don't tell you anything that would enable you to compare the two algorithms' runtimes for a specific value of n.
Such complexity analysis only enables you to compare the asymptotic behavior of algorithms. Real-world problems in general don't care about where the asymptotes are.
Assume that A1(n) is the number of operations Algorithm 1 needs for an input of length n, and A2(n) is the same for Algorithm 2. You could have:
A1(n) = 10n + 900
A2(n) = 100n
The complexity of both is O(A1) = O(A2) = O(n). For small inputs, A2 is faster. For large inputs, A1 is faster. The point where they change is n == 10.
This question is cool because it can make a difference in how I write code from here on. If the "more complex" (c1) way is as fast as the "less complex" (c2) way but more readable, I'm sticking with the "more complex" one.
Not only that, but also there's the fact that when you have 2 different algorithms that are really of different complexity classes (e.g., linear vs quadratic), it might still make sense to use the one of higher complexity as it may still be faster.
For example:
A3(n) = n^2
A4(n) = n + 10^20.
Here, Algorithm 3 is quadratic, while Algorithm 4 is linear but has a huge constant initialization time.
For inputs of size of up to around n == 10^10, it will be faster to use the quadratic algorithm.
It may very well be the case that all relevant inputs for your specific problem fall within that range, meaning that the quadratic algorithm would be the better, faster choice.
The bottom line is: for analyzing the actual time it will take to run an algorithm on a given input (or a given bounded range of inputs, as nearly all real-world problems are) and comparing it with another algorithm, big-O analysis is meaningless.
Another way to put it: you're asking a practical "engineering" question (i.e., which option is better / faster) but trying to answer the question with a tool that's only useful for "theoretical" analysis. That tool is important, yes. But it has no chance of giving you the answer you're looking for, by design.
By definition, time complexity ignores constants. So O((1/2)n) == O(n) == O(2n) == O(cn).
Your example of O((1/2)n) shows why this is the case, because the constants can measure units of anything, so comparing them is meaningless.
You can never tell which algorithm is faster based only on the time complexity. But, you can tell which one would be faster as n approaches infinity. Since constants are removed from the time complexity, they would be considered equal and therefore with O(c1n) and O(c2n) you still would not be able to tell which one is faster even as n approaches infinity.
(my theoretical computer science courses are a couple of decades ago)
O(cn) is O(n).
It's still a linear search over the array.

Why is bubble sort not efficient?

I am developing a backend project using node.js and am going to implement product-sorting functionality.
I researched some articles, and several of them said bubble sort is not efficient.
Bubble sort was used in my previous projects, and I was surprised to hear it is bad.
Could anyone explain why it is inefficient?
If you can explain using C code or assembler instructions, it would be much appreciated.
Bubble Sort has O(N^2) time complexity so it's garbage for large arrays compared to O(N log N) sorts.
In JS, if possible use built-in sort functions that the JS runtime might be able to handle with pre-compiled custom code, instead of having to JIT-compile your sort function. The standard library sort should (usually?) be well-tuned for the JS interpreter / JIT to handle efficiently, and use an efficient implementation of an efficient algorithm.
The rest of this answer is assuming a use-case like sorting an array of integers in an ahead-of-time compiled language like C compiled to native asm. Not much changes if you're sorting an array of structs with one member as the key, although cost of compare vs. swap can vary if you're sorting char* strings vs. large structs containing an int. (Bubble Sort is bad for any of these cases with all that swapping.)
See Bubble Sort: An Archaeological Algorithmic Analysis for more about why it's "popular" (or widely taught / discussed) despite being one of the worst O(N^2) sorts, including some accidents of history / pedagogy. It also includes an interesting quantitative analysis of whether it's actually (as sometimes claimed) one of the easiest to write or understand, using a couple of code metrics.
For small problems where a simple O(N^2) sort is a reasonable choice (e.g. the N <= 32 element base case of a Quick Sort or Merge Sort), Insertion Sort is often used because it has good best-case performance (one quick pass in the already-sorted case, and efficient in almost-sorted cases).
A Bubble Sort (with an early-out for a pass that didn't do any swaps) is also not horrible in some almost-sorted cases but is worse than Insertion Sort. But an element can only move toward the front of the list one step per pass, so if the smallest element is near the end but otherwise fully sorted, it still takes Bubble Sort O(N^2) work. Wikipedia explains Rabbits and turtles.
Insertion Sort doesn't have this problem: a small element near the end will get inserted (by copying earlier elements to open up a gap) efficiently once it's reached. (And reaching it only requires comparing already-sorted elements and moving on, with zero actual insertion work.) A large element near the start will end up moving toward the end quickly, with only slightly more work: each new element to be examined will have to be inserted before that large element, after all others. So that's two compares and effectively a swap, unlike the one swap per step Bubble Sort would do in its "good" direction. Still, Insertion Sort's bad direction is vastly better than Bubble Sort's "bad" direction.
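For reference, minimal C sketches of the two sorts being compared; the swapped flag in the bubble sort is the early-out mentioned above, and the function names are mine:

    #include <stdbool.h>
    #include <stddef.h>

    /* Bubble sort with an early exit when a full pass makes no swaps. */
    void bubble_sort(int *a, size_t n)
    {
        bool swapped = true;
        while (swapped) {
            swapped = false;
            for (size_t i = 1; i < n; i++) {
                if (a[i - 1] > a[i]) {
                    int tmp = a[i - 1];
                    a[i - 1] = a[i];
                    a[i] = tmp;
                    swapped = true;
                }
            }
        }
    }

    /* Insertion sort: shifts a run of larger elements right to open a gap,
     * then drops the new element in; one quick pass on already-sorted input. */
    void insertion_sort(int *a, size_t n)
    {
        for (size_t i = 1; i < n; i++) {
            int key = a[i];
            size_t j = i;
            while (j > 0 && a[j - 1] > key) {
                a[j] = a[j - 1];   /* open up the gap */
                j--;
            }
            a[j] = key;
        }
    }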
Fun fact: state of the art for small-array sorting on real CPUs can include SIMD Network Sorts using packed min/max instructions, and vector shuffles to do multiple "comparators" in parallel.
Why Bubble Sort is bad on real CPUs:
The pattern of swapping is probably more random than Insertion Sort, and less predictable for CPU branch predictors. Thus leading to more branch mispredicts than Insertion Sort.
I haven't tested this myself, but think about how Insertion Sort moves data: each full run of the inner loop moves a group of elements to the right to open up a gap for a new element. The size of that group might stay fairly constant across outer-loop iterations so there's a reasonable chance of predicting the pattern of the loop branch in that inner loop.
But Bubble Sort doesn't do so much creation of partially-sorted groups; the pattern of swapping is unlikely to repeat (see footnote 1).
I searched for support for this guess I just made up, and did find some: Insertion sort better than Bubble sort? quotes Wikipedia:
Bubble sort also interacts poorly with modern CPU hardware. It produces at least twice as many writes as insertion sort, twice as many cache misses, and asymptotically more branch mispredictions.
(IDK if that "number of writes" was naive analysis based on the source, or looking at decently optimized asm.)
That brings up another point: Bubble Sort can very easily compile into inefficient code. The notional implementation of swapping actually stores into memory, then re-reads that element it just wrote. Depending on how smart your compiler is, this might actually happen in the asm instead of reusing that value in a register in the next loop iteration. In that case, you'd have store-forwarding latency inside the inner loop, creating a loop-carried dependency chain. And also creating a potential bottleneck on cache read ports / load instruction throughput.
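As a sketch of that last point, a Bubble Sort inner pass can be written so that the element being bubbled along stays in a local variable (which the compiler can keep in a register) instead of being stored and immediately reloaded. Whether a given compiler performs this rewrite on the naive version by itself varies, so treat this as an illustration rather than a claim about any particular compiler; the function names are mine.

    #include <stddef.h>

    /* Naive inner pass: swaps through memory, so a[i+1] is written and then
     * re-read on the next iteration unless the compiler optimizes it away. */
    void bubble_pass_naive(int *a, size_t n)
    {
        for (size_t i = 0; i + 1 < n; i++) {
            if (a[i] > a[i + 1]) {
                int tmp = a[i];
                a[i] = a[i + 1];
                a[i + 1] = tmp;
            }
        }
    }

    /* Same pass, but the element being carried forward lives in 'cur',
     * which can stay in a register across iterations. */
    void bubble_pass_register(int *a, size_t n)
    {
        if (n < 2) return;
        int cur = a[0];
        for (size_t i = 0; i + 1 < n; i++) {
            int next = a[i + 1];
            if (cur > next) {
                a[i] = next;      /* write back the smaller one; 'cur' keeps
                                   * carrying the larger element forward */
            } else {
                a[i] = cur;
                cur = next;
            }
        }
        a[n - 1] = cur;
    }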
Footnote 1: Unless you're sorting the same tiny array repeatedly; I tried that once on my Skylake CPU with a simplified x86 asm implementation of Bubble Sort I wrote for this code golf question (the code-golf version is intentionally horrible for performance, optimized only for machine-code size; IIRC the version I benchmarked avoided store-forwarding stalls and locked instructions like xchg mem,reg).
I found that with the same input data every time (copied with a few SIMD instructions in a repeat loop), the IT-TAGE branch predictors in Skylake "learned" the whole pattern of branching for a specific ~13-element Bubble Sort, leading to perf stat reporting under 1% branch mispredicts, IIRC. So it didn't demonstrate the tons of mispredicts I was expecting from Bubble Sort after all, until I increased the array size some. :P
Bubble sort runs in O(n^2) time complexity. Merge sort takes O(n*log(n)) time, while quick sort takes O(n*log(n)) time on average, thus performing better than bubble sort.
Refer to this: complexity of bubble sort.

Algorithm Complexity vs Running Time

I have an algorithm used for signal quantization. For the algorithm, I have an equation that calculates its complexity for different parameter values. The algorithm is implemented in C. Sometimes, according to the equation, I have lower complexity, but the running time is higher. I'm not 100% sure about the equation.
My question is: do running time and algorithmic complexity always have a direct relationship? That is, does higher complexity always mean higher running time, or does it differ from one algorithm to another?
Time complexity is more a measure of how time varies with input size than an absolute measure.
(This is an extreme simplification, but it will do for explaining the phenomenon you're seeing.)
If n is your problem size and your actual running time is 1000000000 * n, it has linear complexity, while 0.000000001*n^2 would be quadratic.
If you plot them against each other, you'll see that 0.000000001*n^2 is smaller than 1000000000 * n all the way up to around n = 1e18, despite its "greater complexity".
(0.000000001*n^2 + 1000000000 * n would also be quadratic, but always have worse execution time than both.)
No, running time and algorithmic complexity do not have a simple relationship.
Estimating or comparing run times can easily get very complicated and detailed. There are many variables that vary even with the same program and input data - that's why benchmarks do multiple runs and process them statistically.
If you're looking for big differences, generally the two most significant factors are algorithmic complexity ("big O()") and start up time. Frequently, the lower "big O()" algorithm requires more complex startup; that is, it takes more initial setup in the program before entering the actual loop. If it takes longer to do that initial setup than run the rest of the algorithm for small data sets, the larger O() rated algorithm will run faster for those small data sets. For large data sets, the lower O() algorithm will be faster. There will be a data set size where the total time is equal, called the "crossover" size.
For performance, you'd want to check if most of your data was above or below that crossover as part of picking the algorithm to implement.
Getting more and more detail and accuracy in runtime predictions gets much more complex very quickly.

Which sort takes O(n) under the given condition?

I am preparing for a competition and stumbled upon this question: Consider a set of n elements which is sorted except for one element that appears out of order. Which of the following takes O(n) time?
Quick Sort
Heap Sort
Merge Sort
Bubble Sort
My reasoning is as follows:
I know Merge sort takes O(n log n) even in the best case, so it's not the answer.
Quick sort, too, will take O(n^2), since the array is almost sorted.
Bubble sort can be chosen but only if we modify it slightly to check whether a swap has been made in a pass or not.
Heap sort can be chosen because, if we create a min-heap from a sorted array, it takes O(n) time, since only one element is out of place and fixing it takes log n.
Hence I think it's Heap sort. Is this reasoning correct? I would like to know if I'm missing something.
Let's start with the bubble sort. From my experience, most resources I have used define bubble sort with a stopping condition of not performing any swaps in an iteration (see e.g. Wikipedia). In that case bubble sort will indeed stop after a linear number of steps. However, I remember that I have stumbled upon descriptions that stated a constant number of iterations, which makes your case quadratic. Therefore, all I can say about this case is "probably yes": it depends on the definition used by the judges of the competition.
You are right regarding merge sort and quick sort: the classical versions of both algorithms take at least Θ(n log n) time on every input.
However, your reasoning regarding heap sort seems incorrect to me. In a typical implementation of heap sort, the heap is being built in the order opposite to the desired final order. Therefore, if you decide to build a min-heap, the outcome of the algorithm will be a reversed order, which—I guess—is not the desired one. If, on the other hand, you decide to build a max-heap, heap sort will obviously spend lots of time sifting elements up and down.
Therefore, in this case I'd go with bubble sort.
This is a bad question because you can guess which answer is supposed to be right, but it takes so many assumptions to make it actually right that the question is meaningless.
If you code bubblesort as shown on the Wikipedia page, then it will stop in O(n) if the element that's out of order is "below" its proper place with respect to the sort iteration. If it's above, then it moves no more than one position toward its proper location on each pass.
To get the element unconditionally to its correct location in O(n), you'd need a variation of bubblesort that alternately makes passes in each direction.
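A rough C sketch of that bidirectional variation (often called cocktail shaker sort); the function name is mine, and this is only meant to show the alternating passes plus the early exit:

    #include <stdbool.h>
    #include <stddef.h>

    /* Bidirectional bubble sort: alternating forward and backward passes,
     * stopping early when a full round makes no swaps. */
    void cocktail_sort(int *a, size_t n)
    {
        if (n < 2) return;
        size_t lo = 0, hi = n - 1;
        bool swapped = true;

        while (swapped && lo < hi) {
            swapped = false;

            /* Forward pass: large elements bubble toward the end. */
            for (size_t i = lo; i < hi; i++) {
                if (a[i] > a[i + 1]) {
                    int tmp = a[i]; a[i] = a[i + 1]; a[i + 1] = tmp;
                    swapped = true;
                }
            }
            hi--;

            /* Backward pass: small elements sink toward the front. */
            for (size_t i = hi; i > lo; i--) {
                if (a[i - 1] > a[i]) {
                    int tmp = a[i - 1]; a[i - 1] = a[i]; a[i] = tmp;
                    swapped = true;
                }
            }
            lo++;
        }
    }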
The conventional implementations of the other sorts are O(n log n) on nearly sorted input, though Quicksort can be O(n^2) if you're not careful. A proper implementation with a Dutch National Flag partition is required to prevent bad behavior.
Heapsort takes only O(n) time to build the heap, but Theta(n log n) time to pull n items off the heap in sorted order, each in Theta(log n) time.

Parallel algorithms O(log p)

First off, this isn't a homework question; it's just about a general type of algorithm. In a parallel computing course I'm taking, I'm having trouble wrapping my head around a style of algorithm that has runtime O(something + ... log p). For example, we've looked at sequence reduction algorithms that are O(n/p + log p), where p = #procs and n is the problem size. Log base 2.
The problem I have is with the idea of log(p). For one thing, I'm used to seeing log(n) everywhere, from reducing problems to two subproblems of size n/2, etc. The other is the idea of the step complexity of an algorithm being log(p), because that would imply that for a problem of fixed size, if I increase the number of processors, I am increasing the number of steps in the algorithm. I have always thought of the step complexity of an algorithm as the inherent sequential aspect of the algorithm, so increasing or decreasing the number of processors shouldn't have any effect on it. Is this a bad way to think about it?
I guess what would be helpful is some pseudocode of algorithms that have log(p) running time somewhere in them.
Consider computing the sum of n numbers. Each processor can be assigned n/p numbers, but how do you add up the results from the individual processors? You could pass all p results to one processor, for a runtime O(n/p+p), but you can combine the sums faster in a tree-like fashion.
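For example, here is a purely sequential C sketch that mirrors that structure; the function name and the explicit partial-sums buffer are my own invention, just to show where the n/p and log p terms come from (in a real implementation, each iteration of the i-loops would run on its own processor):

    #include <stddef.h>

    /* Sequential sketch of the parallel structure: p partial sums computed
     * "in parallel" (the i-loop), then combined pairwise in about log2(p)
     * rounds (the stride-loop).  With one processor per i, the time is
     * O(n/p) for the first phase plus O(log p) for the second. */
    long tree_sum(const int *a, size_t n, size_t p, long *partial)
    {
        /* Phase 1: each "processor" i sums its block of about n/p elements. */
        for (size_t i = 0; i < p; i++) {
            size_t lo = i * n / p, hi = (i + 1) * n / p;
            partial[i] = 0;
            for (size_t k = lo; k < hi; k++)
                partial[i] += a[k];
        }

        /* Phase 2: tree combine.  After the round with stride s, partial[i]
         * (for i a multiple of 2s) holds the sum of blocks i .. min(i+2s, p)-1. */
        for (size_t stride = 1; stride < p; stride *= 2)
            for (size_t i = 0; i + stride < p; i += 2 * stride)
                partial[i] += partial[i + stride];

        return partial[0];
    }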
I think that O(n/p + log(p)) does make sense, because n/p + log(p) decreases as p increases, so the running time decreases as you add processors and the bound makes sense; a running time of just log(p) on its own wouldn't be natural, because it increases with the number of processors.
