Definition:
A priority queue is an abstract data type which is like a regular queue or stack data structure, but where additionally each element has a "priority" associated with it. In a priority queue, an element with high priority is served before an element with low priority. If two elements have the same priority, they are served according to their order in the queue.
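For instance, a minimal sketch using C++'s std::priority_queue (a max-heap by default) shows elements being served by priority rather than insertion order; note that std::priority_queue itself makes no FIFO guarantee for equal priorities:

```cpp
#include <iostream>
#include <queue>

int main() {
    std::priority_queue<int> pq;  // max-heap by default
    pq.push(3);
    pq.push(10);
    pq.push(1);

    // Elements come out highest-priority first, not in insertion order.
    while (!pq.empty()) {
        std::cout << pq.top() << ' ';  // prints: 10 3 1
        pq.pop();
    }
    std::cout << '\n';
}
```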
Implementation:
To implement a priority queue, there are three common strategies: an unsorted array, a sorted array, and a binary heap.
The binary heap itself can be represented either as an array of keys, or as explicit nodes, with each key stored in a binary node having up to two children.
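As a rough sketch of the array representation (assuming 0-based indexing): the children of the node at index i live at indices 2i+1 and 2i+2, and its parent at (i-1)/2, so no child pointers are needed:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Minimal array-backed max-heap sketch; a[0] always holds the maximum.
class MaxHeap {
    std::vector<int> a;

    void siftUp(std::size_t i) {
        while (i > 0 && a[(i - 1) / 2] < a[i]) {  // parent smaller: bubble up
            std::swap(a[(i - 1) / 2], a[i]);
            i = (i - 1) / 2;
        }
    }

    void siftDown(std::size_t i) {
        for (;;) {
            std::size_t largest = i;
            std::size_t l = 2 * i + 1, r = 2 * i + 2;  // children of i
            if (l < a.size() && a[l] > a[largest]) largest = l;
            if (r < a.size() && a[r] > a[largest]) largest = r;
            if (largest == i) break;
            std::swap(a[i], a[largest]);
            i = largest;
        }
    }

public:
    void push(int x) { a.push_back(x); siftUp(a.size() - 1); }  // O(log n)

    int popMax() {  // O(log n)
        int top = a.front();
        a.front() = a.back();
        a.pop_back();
        if (!a.empty()) siftDown(0);
        return top;
    }

    bool empty() const { return a.empty(); }
};
```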
Question:
Apart from implementing a priority queue, are there any other applications of the binary heap data structure?
A binary heap lets you extract the max (or min) element in O(log n) time. Many algorithms exploit this property to get a better run time.
For example, I once used it in a k-way merge sort to speed up the merge step. In brief, I kept the current heads of the k sorted subarrays in a binary heap, which brings the merge of n elements down to O(n log k)—better than the usual pairwise merging of a plain merge sort (and effectively linear when k is a small constant).
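A hedged sketch of that merge step (names are mine, not from any particular textbook): a min-heap of (value, run, position) entries yields the n merged elements in O(n log k):

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// Merge k sorted runs into one sorted vector using a min-heap of the
// current head of each run: O(n log k) for n total elements.
std::vector<int> kWayMerge(const std::vector<std::vector<int>>& runs) {
    using Entry = std::tuple<int, std::size_t, std::size_t>;  // value, run, pos
    std::priority_queue<Entry, std::vector<Entry>, std::greater<>> heap;

    for (std::size_t r = 0; r < runs.size(); ++r)
        if (!runs[r].empty()) heap.emplace(runs[r][0], r, 0);

    std::vector<int> out;
    while (!heap.empty()) {
        auto [v, r, i] = heap.top();  // smallest remaining head
        heap.pop();
        out.push_back(v);
        if (i + 1 < runs[r].size())   // advance within the same run
            heap.emplace(runs[r][i + 1], r, i + 1);
    }
    return out;
}
```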
It is also used in Dijkstra's algorithm and Prim's algorithm to decrease their run time.
Binary heaps have one other useful (and major) application: HeapSort. HeapSort has higher constant-factor overhead than QuickSort, but its worst case is O(n log n) vs. QuickSort's O(n²). QuickSort can be improved to an O(n log n) worst case by switching to HeapSort once the recursion depth exceeds a threshold -- this is called IntroSort, and it is what many C++ standard library implementations use for std::sort. See https://en.wikipedia.org/wiki/Introsort
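A rough sketch of that hybrid idea (not the actual library code): run QuickSort with a depth budget, and fall back to HeapSort on any range where the budget runs out:

```cpp
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

// Introsort-style sketch: quicksort until the depth budget runs out,
// then heapsort the remaining range to guarantee O(n log n) worst case.
void introsortRange(std::vector<int>& a, int lo, int hi, int depth) {
    if (hi - lo <= 1) return;
    if (depth == 0) {  // recursion too deep: likely a bad-pivot case
        std::make_heap(a.begin() + lo, a.begin() + hi);
        std::sort_heap(a.begin() + lo, a.begin() + hi);
        return;
    }
    int pivot = a[lo + (hi - lo) / 2];  // middle element as pivot (sketch)
    int i = lo, j = hi - 1;
    while (i <= j) {                    // Hoare-style partition
        while (a[i] < pivot) ++i;
        while (a[j] > pivot) --j;
        if (i <= j) std::swap(a[i++], a[j--]);
    }
    introsortRange(a, lo, j + 1, depth - 1);
    introsortRange(a, i, hi, depth - 1);
}

void introsort(std::vector<int>& a) {
    int depthLimit = 2 * static_cast<int>(std::log2(a.size() + 1));
    introsortRange(a, 0, static_cast<int>(a.size()), depthLimit);
}
```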
Related
Is there a particular reason the B+-tree is preferred, when implementing a larger scale database-system, over the Fibonacci heap? From the complexity analysis in the image it would seem that Fibonacci heap is faster.
A B+ tree is a search tree and not a heap. The image is not comparing a Fibonacci heap with a B+ tree, but with a binary heap.
Comparing heap and search tree
A heap is a data structure that provides a lazy order: to get the i-th value in sorted order, you have to alter the heap as you pop values from it. This is true for both heap implementations in the image you shared.
A search tree has a stronger focus on order. You can iterate over its values in order in O(n) time without making any change to the tree. For a heap that would amount to O(n log n), as you would need n extract-min operations, and the heap loses the values you extract from it.
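A small illustration with C++'s standard containers (std::set standing in for the search tree, std::priority_queue for the heap):

```cpp
#include <functional>
#include <iostream>
#include <queue>
#include <set>
#include <vector>

int main() {
    std::vector<int> data = {5, 1, 4, 2, 3};

    // Search tree: in-order iteration is O(n) and leaves the tree intact.
    std::set<int> tree(data.begin(), data.end());
    for (int v : tree) std::cout << v << ' ';  // 1 2 3 4 5
    std::cout << '\n';

    // Heap: n extract-min operations, O(n log n) total, and the heap
    // is consumed in the process.
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap(
        data.begin(), data.end());
    while (!heap.empty()) {
        std::cout << heap.top() << ' ';        // 1 2 3 4 5
        heap.pop();
    }
    std::cout << '\n';
}
```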
You wrote:
Is there a particular reason the B+-tree is preferred, when implementing a larger scale database-system
A heap is not a useful data structure for indexing data in database systems, as the order is not known without alteration, and the nodes, when read in ordered sequence, are scattered at different disk locations.
A search tree is a better fit for this purpose. Among search trees, those that work well with larger block sizes are interesting choices for databases whose data lives on relatively slow disks; a B-tree is such an example. The advantage of B+-trees over B-trees is that they store the values in order in linked leaf blocks, which optimises ordered iteration; B-trees, in turn, take slightly less space than B+-trees.
Comparing binary heap and Fibonacci heap
The difference in time complexity between a binary heap and Fibonacci heap could be a factor to go with the Fibonacci heap. But as a Fibonacci heap has larger overhead, the gain would only appear for larger data sets. On Wikipedia it says:
Although Fibonacci heaps look very efficient, they have the following two drawbacks (as mentioned in the paper "The Pairing Heap: A New Form of Self-Adjusting Heap"): "They are complicated when it comes to coding them. Also they are not as efficient in practice when compared with the theoretically less efficient forms of heaps, since in their simplest version they require storage and manipulation of four pointers per node, compared to the two or three pointers per node needed for other structures." The other structures referred to are the binary heap, binomial heap, pairing heap, Brodal heap and rank-pairing heap.
Although the total running time of a sequence of operations starting with an empty structure is bounded by the bounds given above, some (very few) operations in the sequence can take very long to complete (in particular delete and delete-minimum have linear running time in the worst case). For this reason Fibonacci heaps and other amortized data structures may not be appropriate for real-time systems.
I am developing a backend project using node.js and I'm going to implement product-sorting functionality.
I researched some articles, and several of them said bubble sort is not efficient.
I used bubble sort in my previous projects, and I was surprised to read that it is bad.
Could anyone explain why it is inefficient?
If you can explain with C or assembly it would be much appreciated.
Bubble Sort has O(N^2) time complexity so it's garbage for large arrays compared to O(N log N) sorts.
In JS, if possible use built-in sort functions that the JS runtime might be able to handle with pre-compiled custom code, instead of having to JIT-compile your sort function. The standard library sort should (usually?) be well-tuned for the JS interpreter / JIT to handle efficiently, and use an efficient implementation of an efficient algorithm.
The rest of this answer is assuming a use-case like sorting an array of integers in an ahead-of-time compiled language like C compiled to native asm. Not much changes if you're sorting an array of structs with one member as the key, although cost of compare vs. swap can vary if you're sorting char* strings vs. large structs containing an int. (Bubble Sort is bad for any of these cases with all that swapping.)
See Bubble Sort: An Archaeological Algorithmic Analysis for more about why it's "popular" (or widely taught / discussed) despite being one of the worst O(N^2) sorts, including some accidents of history / pedagogy. It also includes an interesting quantitative analysis of whether it's actually (as sometimes claimed) one of the easiest sorts to write or understand, using a couple of code metrics.
For small problems where a simple O(N^2) sort is a reasonable choice (e.g. the N <= 32 element base case of a Quick Sort or Merge Sort), Insertion Sort is often used because it has good best-case performance (one quick pass in the already-sorted case, and efficient in almost-sorted cases).
A Bubble Sort (with an early-out for a pass that didn't do any swaps) is also not horrible in some almost-sorted cases but is worse than Insertion Sort. But an element can only move toward the front of the list one step per pass, so if the smallest element is near the end but otherwise fully sorted, it still takes Bubble Sort O(N^2) work. Wikipedia explains Rabbits and turtles.
Insertion Sort doesn't have this problem: a small element near the end will get inserted (by copying earlier elements to open up a gap) efficiently once it's reached. (And reaching it only requires comparing already-sorted elements to determine that and moving on, with zero actual insertion work.) A large element near the start ends up moving upwards quickly, with only slightly more work: each new element examined has to be inserted before that large element, after all the others. So that's two compares and effectively a swap, versus the one swap per step Bubble Sort would do in its "good" direction. Still, Insertion Sort's bad direction is vastly better than Bubble Sort's "bad" direction.
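For reference, a minimal insertion sort sketch showing both properties: an already-sorted pass costs one compare per element, and a small element near the end is placed in a single inner-loop run:

```cpp
#include <cstddef>
#include <vector>

void insertionSort(std::vector<int>& a) {
    for (std::size_t i = 1; i < a.size(); ++i) {
        int key = a[i];          // element to insert into sorted a[0..i-1]
        std::size_t j = i;
        // Shift larger elements right to open a gap; on already-sorted
        // input this loop exits immediately (one compare per element).
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];
            --j;
        }
        a[j] = key;              // drop the element into the gap
    }
}
```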
Fun fact: state of the art for small-array sorting on real CPUs can include SIMD Network Sorts using packed min/max instructions, and vector shuffles to do multiple "comparators" in parallel.
Why Bubble Sort is bad on real CPUs:
The pattern of swapping is probably more random than Insertion Sort, and less predictable for CPU branch predictors. Thus leading to more branch mispredicts than Insertion Sort.
I haven't tested this myself, but think about how Insertion Sort moves data: each full run of the inner loop moves a group of elements to the right to open up a gap for a new element. The size of that group might stay fairly constant across outer-loop iterations so there's a reasonable chance of predicting the pattern of the loop branch in that inner loop.
But Bubble Sort doesn't do so much creation of partially-sorted groups; the pattern of swapping is unlikely to repeat (see footnote 1).
I searched for support for this guess I just made up, and did find some: Insertion sort better than Bubble sort? quotes Wikipedia:
Bubble sort also interacts poorly with modern CPU hardware. It produces at least twice as many writes as insertion sort, twice as many cache misses, and asymptotically more branch mispredictions.
(IDK if that "number of writes" figure was a naive analysis based on the source, or based on looking at decently optimized asm.)
That brings up another point: Bubble Sort can very easily compile into inefficient code. The notional implementation of swapping actually stores into memory, then re-reads that element it just wrote. Depending on how smart your compiler is, this might actually happen in the asm instead of reusing that value in a register in the next loop iteration. In that case, you'd have store-forwarding latency inside the inner loop, creating a loop-carried dependency chain. And also creating a potential bottleneck on cache read ports / load instruction throughput.
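For concreteness, here is the kind of naive implementation being described; note how a[j + 1] is stored on one iteration and read back as a[j] on the next, which is where the store-forwarding concern comes from unless the compiler keeps the value in a register:

```cpp
#include <cstddef>
#include <vector>

void bubbleSort(std::vector<int>& a) {
    if (a.empty()) return;
    bool swapped = true;
    for (std::size_t n = a.size(); swapped && n > 1; --n) {
        swapped = false;
        for (std::size_t j = 0; j + 1 < n; ++j) {
            if (a[j] > a[j + 1]) {
                // Naive swap: a[j + 1] is written here and re-read as
                // a[j] on the very next iteration. If the compiler does
                // not keep it in a register, that store/reload becomes
                // a loop-carried dependency chain.
                int tmp = a[j];
                a[j] = a[j + 1];
                a[j + 1] = tmp;
                swapped = true;
            }
        }
        // Early-out: a full pass with no swaps means the array is sorted.
    }
}
```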
Footnote 1: Unless you're sorting the same tiny array repeatedly; I tried that once on my Skylake CPU with a simplified x86 asm implementation of Bubble Sort I wrote for this code golf question (the code-golf version is intentionally horrible for performance, optimized only for machine-code size; IIRC the version I benchmarked avoided store-forwarding stalls and locked instructions like xchg mem,reg).
I found that with the same input data every time (copied with a few SIMD instructions in a repeat loop), the IT-TAGE branch predictors in Skylake "learned" the whole pattern of branching for a specific ~13-element Bubble Sort, leading to perf stat reporting under 1% branch mispredicts, IIRC. So it didn't demonstrate the tons of mispredicts I was expecting from Bubble Sort after all, until I increased the array size some. :P
Bubble sort runs in O(n^2) time. Merge sort takes O(n log n) time, and quicksort takes O(n log n) time on average, so both perform better than bubble sort.
Refer to this: complexity of bubble sort.
Can I implement quicksort using a queue?
I found only this article: https://www.quora.com/Can-we-use-a-queue-in-quicksort-in-C.
Is this article correct?
If yes, why do textbooks always implement quicksort with a stack or recursion only?
Information about this question is scarce, so I am asking here.
Bad cache performance.
With a stack we have good temporal locality, while with a queue it is lost completely: the queue-based method effectively processes the array's partitions in breadth-first order.
EDIT (from Will Ness's answer): for larger arrays (> RAM), the queue-based method won't even work, since the queue needs O(n) space to sort an array of size n, while the stack-based method requires only O(log n) space. The theoretical time complexity of both is the same.
why does the textbook always implement quick sort by stack or recursive method only
Because the essence of quicksort is that it is in-place, and sorting is achieved by repeated partitions of the same array that is being sorted.
The partitions are done from the top down, from bigger-sized portions of the array down to smaller and smaller ones.
If we use a stack to manage the work yet to be done, its size stays logarithmic in the array's size (provided the partition put on the stack is always the bigger one). This is equivalent to depth-first traversal.
But if we used a queue for that, it would be equivalent to breadth-first traversal, and the size of the queue would grow linearly with the array size (which is exponentially worse than logarithmic).
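A sketch of the stack-based version with the logarithmic-space trick mentioned above (always push the larger partition, keep working on the smaller one):

```cpp
#include <stack>
#include <utility>
#include <vector>

void quicksortIterative(std::vector<int>& a) {
    std::stack<std::pair<int, int>> work;  // half-open ranges [lo, hi)
    work.push({0, static_cast<int>(a.size())});

    while (!work.empty()) {
        auto [lo, hi] = work.top();
        work.pop();
        while (hi - lo > 1) {
            // Lomuto partition around the last element (sketch only;
            // real implementations pick a better pivot).
            int pivot = a[hi - 1], p = lo;
            for (int i = lo; i < hi - 1; ++i)
                if (a[i] < pivot) std::swap(a[i], a[p++]);
            std::swap(a[p], a[hi - 1]);

            // Push the larger side, keep looping on the smaller side:
            // this bounds the stack depth at O(log n).
            if (p - lo > hi - (p + 1)) {
                work.push({lo, p});
                lo = p + 1;
            } else {
                work.push({p + 1, hi});
                hi = p;
            }
        }
    }
}
```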
I am preparing for a competition and stumbled upon this question: Consider a set of n elements that is sorted except for one element appearing out of order. Which of the following takes O(n) time?
Quick Sort
Heap Sort
Merge Sort
Bubble Sort
My reasoning is as follows:
I know merge sort takes O(n log n) even in the best case, so it's not the answer.
Quicksort, too, will take O(n^2), since the array is almost sorted.
Bubble sort can be chosen, but only if we modify it slightly to check whether a swap has been made in a pass.
Heap sort can be chosen because if we create a min-heap of a sorted array it takes O(n) time; since only one element is out of place, fixing it takes O(log n).
Hence I think it's heap sort. Is this reasoning correct? I would like to know if I'm missing something.
Let's start with bubble sort. In my experience, most resources define bubble sort with a stopping condition of not performing any swaps in an iteration (see e.g. Wikipedia). In that case bubble sort will indeed stop after a linear number of steps. However, I have also stumbled upon descriptions that use a fixed number of iterations, which makes your case quadratic. Therefore, all I can say about this case is "probably yes"—it depends on the definition used by the judges of the competition.
You are right regarding merge sort and quicksort—classical merge sort takes Θ(n log n) on every input, and classical quicksort (with a first- or last-element pivot) indeed degrades to Θ(n²) on nearly sorted input.
However, your reasoning regarding heap sort seems incorrect to me. In a typical implementation of heap sort, the heap is being built in the order opposite to the desired final order. Therefore, if you decide to build a min-heap, the outcome of the algorithm will be a reversed order, which—I guess—is not the desired one. If, on the other hand, you decide to build a max-heap, heap sort will obviously spend lots of time sifting elements up and down.
Therefore, in this case I'd go with bubble sort.
This is a bad question, because you can guess which answer is supposed to be right, but it takes so many assumptions to make it actually right that the question is meaningless.
If you code bubblesort as shown on the Wikipedia page, then it will stop in O(n) if the element that's out of order is "below" its proper place with respect to the sort iteration. If it's above, then it moves no more than one position toward its proper location on each pass.
To get the element unconditionally to its correct location in O(n), you'd need a variation of bubblesort that alternately makes passes in each direction (cocktail sort), as sketched below.
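A sketch of that bidirectional variant: the backward pass carries a misplaced small element all the way to its home, so one out-of-place element is fixed in O(n) regardless of direction:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

void cocktailSort(std::vector<int>& a) {
    if (a.size() < 2) return;
    std::size_t lo = 0, hi = a.size() - 1;
    bool swapped = true;
    while (swapped) {
        swapped = false;
        for (std::size_t i = lo; i < hi; ++i)   // forward pass
            if (a[i] > a[i + 1]) { std::swap(a[i], a[i + 1]); swapped = true; }
        if (!swapped) break;
        --hi;                                   // largest element is in place
        swapped = false;
        for (std::size_t i = hi; i > lo; --i)   // backward pass
            if (a[i - 1] > a[i]) { std::swap(a[i - 1], a[i]); swapped = true; }
        ++lo;                                   // smallest element is in place
    }
}
```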
The conventional implementations of the other sorts are O(n log n) on nearly sorted input, though Quicksort can be O(n^2) if you're not careful. A proper implementation with a Dutch National Flag partition is required to prevent bad behavior.
Heapsort takes only O(n) time to build the heap, but Θ(n log n) time to pull the n items off the heap in sorted order, each in Θ(log n) time.
In which cases can heap sort be used? As we know, heap sort has a complexity of O(n log n). But it's used far less often than quicksort and merge sort. So when exactly do we use heap sort, and what are its drawbacks?
Characteristics of Heapsort
O(n log n) time in the best, average, and worst case
O(1) extra memory
Where to use it?
Guaranteed O(nlogn) performance. When you don't necessarily need very fast performance, but guaranteed O(nlogn) performance (e.g. in a game), because Quicksort's O(n^2) can be painfully slow. Why not use Mergesort then? Because it takes O(n) extra memory.
To avoid Quicksort's worst case. C++'s std::sort routine generally uses a variation of Quicksort called Introsort, which uses Heapsort to sort the current partition if the Quicksort recursion goes too deep, indicating that a worst case has occurred.
Partially sorted array even if stopped abruptly. We get a partially sorted array if Heapsort is somehow stopped abruptly. Might be useful, who knows?
Disadvantages
Relatively slow as compared to Quicksort
Cache inefficient
Not stable
Not really adaptive (doesn't get faster when given a somewhat sorted array)
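For reference, a minimal in-place heapsort sketch using the standard heap algorithms (O(1) extra memory, O(n log n) in every case):

```cpp
#include <algorithm>
#include <vector>

void heapSort(std::vector<int>& a) {
    std::make_heap(a.begin(), a.end());  // build a max-heap, O(n)
    std::sort_heap(a.begin(), a.end());  // repeatedly pop the max, O(n log n)
}
```

sort_heap repeatedly swaps the root to the end of the shrinking heap and sifts down, which is exactly the cache-unfriendly, non-adaptive access pattern listed among the disadvantages above.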
Based on the Wikipedia article on sorting algorithms, it appears that Heapsort and Mergesort have identical time complexity, O(n log n), in the best, average and worst case.
Quicksort has a disadvantage there, as its worst-case time complexity is O(n²) (a).
Mergesort has the disadvantage that its memory complexity is O(n) whereas Heapsort is O(1). On the other hand, Mergesort is a stable sort and Heapsort is not.
So, based on that, I would choose Heapsort in preference to Mergesort if I didn't care about the stability of the sort, so as to minimise memory usage. If stability was required, I would choose MergeSort.
Or, more correctly, if I had huge amounts of data to sort, and I had to code my own algorithms to do it, I'd do that. For the vast majority of cases, the difference between the two is irrelevant, until your data sets get massive.
In fact, I've even used bubble sort in real production environments where no other sort was provided, because:
it's incredibly easy to write (even the optimised version);
it's more than efficient enough if the data has certain properties (either small datasets or datasets that were already mostly sorted before you added a couple of items).
Like goto and multiple return points, even seemingly bad algorithms have their place :-)
(a) And, before you wonder why C uses a less efficient algorithm, it doesn't (necessarily). Despite the qsort name, there's no mandate that it use Quicksort under the covers - that's a common misconception. It may well use one of the other algorithms.
Note that the running time of heap sort is O(n log n) regardless of whether the array is already partially sorted in ascending or descending order.
See the link below for the big-O analysis:
https://ita.skanev.com/06/04/03.html