How do different languages implement sorting in their standard libraries? [closed] - c

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
From what I have (briefly) read, Java and Python both look like they make use of timsort in their standard libaries, while the sorting method in C's stdlib is called qsort because it once was quicksort.
What algorithm do typical languages have implemented in their standard libraries today, and why did they choose that algorithm? Also, did C deviate from quicksort?
I know this question lacks an "actual problem(s) that [I] face" and may seem open ended to some, but knowing how/why certain algorithms are chosen as standard seems pretty useful but relatively untaught. I also feel as though an in depth answer addressing concerns that are language specific (data types?) and machine specific (cache hits?) would provide more insight to how different languages and algorithms work than uni cares to explain.

In musl, we use Smooth Sort. Conceptually it's a variant of heap sort (and likewise in-place and O(n log n) time), but it has the nice property that the worst-case performance approaches O(n) for already-sorted or near-sorted input. I'm not convinced it's the best possible choice, but it appears very difficult to do better with an in-place algorithm with O(n log n) worst-case.
Being a little-known invention of Dijkstra's also makes it kind of cool. :-)

C does not specificy the algorithm to be used by qsort.
On current glibc (2.17) qsort allocates memory (using malloc or alloca if memory required is really small) and uses merge sort algorithm. If memory requirements are too high or if malloc fails it uses quicksort algorithm.

My machine's C library provides qsort, heapsort, and mergesort, saying in the man page:
The qsort() and qsort_r() functions are an implementation of C.A.R. Hoare's "quicksort" algorithm, a variant of partition-exchange sorting; in particular, see D.E.
Knuth's Algorithm Q. Quicksort takes O(n lg n) average time. This implementation uses median selection to avoid its O(n2) worst-case behavior.
The heapsort() function is an implementation of J.W.J. William's "heapsort" algorithm, a variant of selection sorting; in particular, see D.E. Knuth's Algorithm H.
Heapsort takes O(n lg n) worst-case time. Its only advantage over qsort() is that it uses almost no additional memory; while qsort() does not allocate memory, it is
implemented using recursion.
The function mergesort() requires additional memory of size nel * width bytes; it should be used only when space is not at a premium. The mergesort() function is optimized for data with pre-existing order; its worst case time is O(n lg n); its best case is O(n).
Normally, qsort() is faster than mergesort() which is faster than heapsort(). Memory availability and pre-existing order in the data can make this untrue.
There are plenty of open source C libraries for you to look at if you want to see specific details of the implementation.
As far as 'why did system X choose algorithm Y', that's a pretty tough question to answer meaningfully - if you're not lucky enough to find a rationale in the documentation, you'd have to ask the designers directly.

I did a quick scan in the C11 standard about qsort() and I couldn't find any reference about how qsort() should be implemented
and the expected time/space complexity of the algorithm. All it has to say was about certain conditions about the comparator
function.
What that means is an implementation can choose any comparator based algorithm that fits with qsort(). For example, an implementation can choose to use a naive algorithm such as bubble sort to implement qsort() which is not as efficient as the real quick sort. Bottom line is that it's upto the implementation to decide on the actual algorithm.

Related

Running Time of built in functions

if I use built in functions from java do I have to take into consideration there running time or should I count them as constant time. what will be the time complexity of the following function
def int findMax(int [] a)
{
a.Arrays.sort();
n=a.length;
return a[n-1];
}
Nearly all the work here is being done by the sort() algorithm. For sorting arrays, Java uses Quicksort, which has an O(n log n) average and O(n^2) worst case performance.
From the Java documentation on sort():
Implementation note: The sorting algorithm is a Dual-Pivot Quicksort by Vladimir Yaroslavskiy, Jon Bentley, and Joshua Bloch. This algorithm offers O(n log(n)) performance on many data sets that cause other quicksorts to degrade to quadratic performance, and is typically faster than traditional (one-pivot) Quicksort implementations.
Given your question lacks any use case whatsoever, and your comment on my answer, I feel like I have to point out that this is a classic example of premature optimization. You're looking for the mathematical complexity of a trivial method without any indication that this method will account for any significant portion of the CPU-time used by your program. This is especially true given that your implementation is incredibly inefficient: iterating through the array and storing the highest value would execute in O(n).

Are there problems that can't be solved efficiently without arrays? [duplicate]

Does anyone know what is the worst possible asymptotic slowdown that can happen when programming purely functionally as opposed to imperatively (i.e. allowing side-effects)?
Clarification from comment by itowlson: is there any problem for which the best known non-destructive algorithm is asymptotically worse than the best known destructive algorithm, and if so by how much?
According to Pippenger [1996], when comparing a Lisp system that is purely functional (and has strict evaluation semantics, not lazy) to one that can mutate data, an algorithm written for the impure Lisp that runs in O(n) can be translated to an algorithm in the pure Lisp that runs in O(n log n) time (based on work by Ben-Amram and Galil [1992] about simulating random access memory using only pointers). Pippenger also establishes that there are algorithms for which that is the best you can do; there are problems which are O(n) in the impure system which are Ω(n log n) in the pure system.
There are a few caveats to be made about this paper. The most significant is that it does not address lazy functional languages, such as Haskell. Bird, Jones and De Moor [1997] demonstrate that the problem constructed by Pippenger can be solved in a lazy functional language in O(n) time, but they do not establish (and as far as I know, no one has) whether or not a lazy functional language can solve all problems in the same asymptotic running time as a language with mutation.
The problem constructed by Pippenger requires Ω(n log n) is specifically constructed to achieve this result, and is not necessarily representative of practical, real-world problems. There are a few restrictions on the problem that are a bit unexpected, but necessary for the proof to work; in particular, the problem requires that results are computed on-line, without being able to access future input, and that the input consists of a sequence of atoms from an unbounded set of possible atoms, rather than a fixed size set. And the paper only establishes (lower bound) results for an impure algorithm of linear running time; for problems that require a greater running time, it is possible that the extra O(log n) factor seen in the linear problem may be able to be "absorbed" in the process of extra operations necessary for algorithms with greater running times. These clarifications and open questions are explored briefly by Ben-Amram [1996].
In practice, many algorithms can be implemented in a pure functional language at the same efficiency as in a language with mutable data structures. For a good reference on techniques to use for implementing purely functional data structures efficiently, see Chris Okasaki's "Purely Functional Data Structures" [Okasaki 1998] (which is an expanded version of his thesis [Okasaki 1996]).
Anyone who needs to implement algorithms on purely-functional data structures should read Okasaki. You can always get at worst an O(log n) slowdown per operation by simulating mutable memory with a balanced binary tree, but in many cases you can do considerably better than that, and Okasaki describes many useful techniques, from amortized techniques to real-time ones that do the amortized work incrementally. Purely functional data structures can be a bit difficult to work with and analyze, but they provide many benefits like referential transparency that are helpful in compiler optimization, in parallel and distributed computing, and in implementation of features like versioning, undo, and rollback.
Note also that all of this discusses only asymptotic running times. Many techniques for implementing purely functional data structures give you a certain amount of constant factor slowdown, due to extra bookkeeping necessary for them to work, and implementation details of the language in question. The benefits of purely functional data structures may outweigh these constant factor slowdowns, so you will generally need to make trade-offs based on the problem in question.
References
Ben-Amram, Amir and Galil, Zvi 1992. "On Pointers versus Addresses" Journal of the ACM, 39(3), pp. 617-648, July 1992
Ben-Amram, Amir 1996. "Notes on Pippenger's Comparison of Pure and Impure Lisp" Unpublished manuscript, DIKU, University of Copenhagen, Denmark
Bird, Richard, Jones, Geraint, and De Moor, Oege 1997. "More haste, less speed: lazy versus eager evaluation" Journal of Functional Programming 7, 5 pp. 541–547, September 1997
Okasaki, Chris 1996. "Purely Functional Data Structures" PhD Thesis, Carnegie Mellon University
Okasaki, Chris 1998. "Purely Functional Data Structures" Cambridge University Press, Cambridge, UK
Pippenger, Nicholas 1996. "Pure Versus Impure Lisp" ACM Symposium on Principles of Programming Languages, pages 104–109, January 1996
There are indeed several algorithms and data structures for which no asymptotically efficient purely functional solution (t.i. one implementable in pure lambda calculus) is known, even with laziness.
The aforementioned union-find
Hash tables
Arrays
Some graph algorithms
...
However, we assume that in "imperative" languages access to memory is O(1) whereas in theory that can't be so asymptotically (i.e. for unbounded problem sizes) and access to memory within a huge dataset is always O(log n), which can be emulated in a functional language.
Also, we must remember that actually all modern functional languages provide mutable data, and Haskell even provides it without sacrificing purity (the ST monad).
This article claims that the known purely functional implementations of the union-find algorithm all have worse asymptotic complexity than the one they publish, which has a purely functional interface but uses mutable data internally.
The fact that other answers claim that there can never be any difference and that for instance, the only "drawback" of purely functional code is that it can be parallelized gives you an idea of the informedness/objectivity of the functional programming community on these matters.
EDIT:
Comments below point out that a biased discussion of the pros and cons of pure functional programming may not come from the “functional programming community”. Good point. Perhaps the advocates I see are just, to cite a comment, “illiterate”.
For instance, I think that this blog post is written by someone who could be said to be representative of the functional programming community, and since it's a list of “points for lazy evaluation”, it would be a good place to mention any drawback that lazy and purely functional programming might have. A good place would have been in place of the following (technically true, but biased to the point of not being funny) dismissal:
If strict a function has O(f(n)) complexity in a strict language then it has complexity O(f(n)) in a lazy language as well. Why worry? :)
With a fixed upper bound on memory usage, there should be no difference.
Proof sketch:
Given a fixed upper bound on memory usage, one should be able to write a virtual machine which executes an imperative instruction set with the same asymptotic complexity as if you were actually executing on that machine. This is so because you can manage the mutable memory as a persistent data structure, giving O(log(n)) read and writes, but with a fixed upper bound on memory usage, you can have a fixed amount of memory, causing these to decay to O(1). Thus the functional implementation can be the imperative version running in the functional implementation of the VM, and so they should both have the same asymptotic complexity.

Why is quicksort faster in average than others?

As we know, the quicksort performance is O(n*log(n)) in average but the merge- and heapsort performance is O(n*log(n)) in average too. So the question is why quicksort is faster in average.
Wikipedia suggests:
Typically, quicksort is significantly
faster in practice than other O(nlogn)
algorithms, because its inner loop can
be efficiently implemented on most
architectures, and in most real-world
data, it is possible to make design
choices that minimize the probability
of requiring quadratic time.
Additionally, quicksort tends to make
excellent usage of the memory
hierarchy, taking perfect advantage of
virtual memory and available caches.
Although quicksort is not an in-place
sort and uses auxiliary memory, it is
very well suited to modern computer
architectures.
Also have a look at comparison with other sorting algorithms on the same page.
See also Why is quicksort better than other sorting algorithms in practice? on the CS site.
Worst case for quick sort is actually worse than heapsort and mergesort, but quicksort is faster on average.
As to why, it will take time to explain and thus i will refer to Skiena, The algorithm design manual.
A quote that summarizes the quicksort vs merge/heapsort:
When faced with algorithms of the same asymptotic complexity, implementation
details and system quirks such as cache performance and memory size may
well prove to be the decisive factor.
What we can say is that experiments show that where a properly implemented
quicksort is implemented well, it is typically 2-3 times faster than mergesort or
heapsort. The primary reason is that the operations in the innermost loop are
simpler. But I can’t argue with you if you don’t believe me when I say quicksort is
faster. It is a question whose solution lies outside the analytical tools we are using.
The best way to tell is to implement both algorithms and experiment.
Timsort might be a better option as it is optimised for the kind of data seen when sorting in general, in the Python language where data often contains embedded 'runs' of presorted items. It has lately been adopted by Java too.

Regarding in-place merge in an array

I came across the following question.
Given an array of n elements and an integer k where k < n. Elements {a0...ak} and
{ak+1...an} are already sorted. Give an algorithm to sort in O(n) time and O(1) space.
It does not seem to me like it can be done in O(n) time and O(1) space. The problem really seems to be asking how to do the merge step of mergesort but in-place. If it was possible, wouldn't mergesort be implemented that way? I am unable to convince myself though and need some opinion.
This seems to indicate that it is possible to do in O(lg^2 n) space. I cannot see how to prove that it is impossible to merge in constant space, but I cannot see how to do it either.
Edit:
Chasing references, Knuth Vol 3 - Exercise 5.5.3 says "A considerably more complicated algorithm of L. Trabb-Pardo provides the best possible answer to this problem: It is possible to do stable merging in O(n) time and stable sorting in O(n lg n) time, using only O(lg n) bits of auxiliary memory for a fixed number of index variables.
More references that I have not read. Thanks for an interesting problem.
Further edit:
This article claims that the article by Huang and Langston have an algorithm that merges two lists of size m and n in time O(m + n), so the answer to your question would seem to be yes. Unfortunately I do not have access to the article, so I must trust the second hand information. I'm not sure how to reconcile this with Knuth's pronouncement that the Trabb-Pardo algorithm is optimal. If my life depended on it, I'd go with Knuth.
I now see that this had been asked as and earlier Stack Overflow question a number of times. I don't have the heart to flag it as a duplicate.
Huang B.-C. and Langston M. A., Practical in-place merging, Comm. ACM 31 (1988) 348-352
There are several algorithms for doing this, none of which are very easy to intuit. The key idea is to use a part of the arrays to merge as a buffer, then doing a standard merge using this buffer for auxiliary space. If you can then reposition the elements so that the buffer elements are in the right place, you're golden.
I have written up an implementation of one of these algorithms on my personal site if you're interested in looking at it. It's based on the paper "Practical In-Place Merging" by Huang and Langston. You probably will want to look over that paper for some insight.
I've also heard that there are good adaptive algorithms for this, which use some fixed-size buffer of your choosing (which could be O(1) if you wanted), but then scale elegantly with the buffer size. I don't know any of these off the top of my head, but I'm sure a quick search for "adaptive merge" might turn something up.
No it isn't possible, although my job would be much easier if it was :).
You have a O(log n) factor which you can't avoid. You can choose to take it as time or space, but the only way to avoid it is to not sort. With O(log n) space you can build a list of continuations that keep track of where you stashed the elements that didn't quite fit. With recursion this can be made to fit in O(1) heap, but that's only by using O(log n) stack frames instead.
Here is the progress of merge-sorting odds and evens from 1-9. Notice how you require log-space accounting to track the order inversions caused by the twin constraints of constant space and linear swaps.
. -
135792468
. -
135792468
: .-
125793468
: .-
123795468
#.:-
123495768
:.-
123459768
.:-
123456798
.-
123456789
123456789
There are some delicate boundary conditions, slightly harder than binary search to get right, and even in this (possible) form, and therefore a bad homework problem; but a really good mental exercise.
Update
Apparently I am mistaken and there is an algorithm that provides O(n) time and O(1) space. I have downloaded the papers to enlighten myself, and withdraw this answer as incorrect.

Algorithms in C [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
What is the best place or a link to learn algorithms in C? How do you know when and where to use the implementation of algorithms by just looking into the problems?
Algorithms aren't necessarily tied to a specific language, just to clarify, so any algorithms book will work great as long as you can understand the concept being the data structure/algorithm.
That said, this seems like a good choice: Algorithms in C. I have the C++ equivalent on my shelf.
There is also a book that seems language agnostic (correct me if I'm wrong) called Data Structures & Algorithm's, though I hear it's a bit dated, so you'll miss out on more recent structures.
Don't forget the internet has a plethora of information available to you. However, books are usually better for these sorts of things. This is because internet resources tend to focus on one thing at a time. For example, you need to understand what Big-O notation is before you can understand what it means when we say a List has O(1) [constant time] removal.
A book will cover these things in the correct order, but an internet resource will focus on either Big-O notation or data structures, but often won't easily connect the two.
When it comes to using it, you'll mostly make the connection when it comes to what you'll be doing with the data.
For example, you might want a vector (array) if you just need ordered elements, but if you need ordered elements and removal from any place (but can sacrifice random access), then a list would be more appropriate, due to it's constant-time removal.
For a reasonable (though far from perfect) book on implementing commonly used algorithms in C, try Sedgewick's Algorithms in C. Note that as for any technical subject,a paper book is likely to be far superior to any Web resources.
As to how to know when to use a specific algorithm, I'm afraid that is down to experience.
For an algortihms text, Cormen, Leiserson and Rivest's 'Introduction to Algorithms' is a good start. The pseudocode implementations are easy to translate to C. Two web resources with many links to documentation about algorithms and sample implementations are:
Stony Brook Algorithm Repository
NIST Directory of Data Structures and Algorithms
Algorithms in C by Sedgewick is a great place to start the investigation. Once you are familiar with what algorithms are available and what the performance characteristics of each are, you'll be able to see where to use each of them.
This is my collection of mostly math-related algorithms:
List of algorithms
FXT (math related)
Numerical Methods
Numerical Recipes in C
How do u know when and where to use
the implementation of algorithms by
just looking into the problems
It's called "pattern matching", once you've seen and solved lots of problems you start to recognize common things and you can reuse your previous knowledge.
By the way, I would recommend you before a good book just on algorithms before starting with algorithms in C, which are more difficult to implement and more error prone than in higher level language, and once you are very confident with the general procedures you can start to tweak and optimize them in C.
Many good resources have already been named, so I won't repeat them here.
As for how do you know what algorithm to use when?
You need to have a big enough tool box, which you will obtain by sitting down and slogging through a long list of basic (and them more esoteric) data structures and algorithms. You should try to get all the basics, but really only need a sample from the more specialized ones.
You need to understand what trade offs are available to you (time, code complexity, memory, single versus multiple passes, in-place versus copy, stable versus unstable sorts, etc. ad nauseum), and how the algorithms you study do on each of these. Again, this is just a case of much studying. Big-O is a place to start, but is not the end all and be all of this.
You need to get a feel for understanding what are the real limits you face when presented with a problem, and how to express these in terms of the algorithm trade offs mentioned above. This requires a degree of intuition, and is generally learned by practice over time.
It is worth implementing some things more then one way as you go along, to learn in your gut, what works and what doesn't.
It is worth reading code written by folks more experienced than yourself, to see how they think.
Good luck.
The Wikipedia List of Algorithms is also very handy reference.
And, if you want to get deeper -- The Art of Computer Programming (wikipedia ref).
Preferably after the Robert Sedgewick book already referred in multiple answers.
I read Pointers on C by Kenneth Reek recently. I thought I was pretty well versed in C, but this book gave me a few epiphanies, despite being aimed at beginners. The code examples are things of beauty (but not the fastest code on a x86-like CPU). It provide good implementations of many of the most common algorithms and data-structures that are in use, with excellent explanations about why they are implemented as they are (and sometimes code or suggestions for alternative implementations).
On the same page as your question: patterns for creating reusable code in C (that is what we all want, isn't it?), C Interfaces and Implementations: Techniques for Creating Reusable Software, by David R. Hanson. It has been a few years since I read it, and I don't have a copy to verify what I recall is correct, but if I remember correctly it deals with how to create good C API:s to data structures and algorithms, as well as giving example implementations of some of the most common algorithms.
Of topic: As I have mostly written throw-away programs in C for private use, this one helped me get rid of some bad coding habits as well as being an excellent C reference: C: A reference Manual. Reminds me that I ought to buy that one.
One needs experience to know which set of algorithms to use for a particular problem. Defining a goal will help. Speed, memory, robustness, solution quality ... are all factors in determining which algorithms to use. We could devise different solutions to the same problem given different set of factors and scenarios.
The Algorithm Design Manual is worth a look.
A easy method to learn algorithms is to use Wiki page, who is dedicated to some "classical" algorithms like search algorithms or for sort. The constructions of algorithms is based on ability to use different data structures, like linked lists or C. So, first try to implement different data structures like simple linked list or binary tree, and after try to use in different algorithms who is related to real life problems.

Resources