Quicksort

SIAM News named quicksort as one of the Top 10 Algorithms of the 20th Century. It is certainly one of the most widely used. Quicksort was originally developed in 1960 by Tony Hoare.

A wide variety of implementations exist. The variant shown here incorporates a solution of Dijkstra's Dutch national flag problem. I first saw it in a Sedgewick presentation of recent vintage. His earlier Algorithms in C++ and Algorithms in Java contain implementations that were intended to provide the same three way partition of the data. That code is complicated, and contains a bug that makes the three way partition unreliable, although the code successfully sorts the data. Confused? No matter. Stick to the simple partition algorithm used by the demo shown here.

Quicksort's worst case time complexity is achieved on presorted data, and is quadratic in the length n of the data, while average performance on randomly arranged data is of the order n*log(n). That's why we start by shuffling the data. Randomization also makes it more difficult for a hostile party to attack the sorting code.

Traditional implementations of quicksort are plagued by quadratic complexity when the data is highly repetitive. The three way partition avoids avoids the issue. Performance is linear for sufficiently repetitive data.

In practice, quicksort is often more efficient than other sorting algorithms that have worst case complexity of the order n*log(n). It is a good general purpose sorting algorithm, but one size does not fit all. I once heard Knuth say there are n different sorting algorithms, and each is best for something, except bubble sort, which is best for teaching. Knuth being himself, gave a specific value for n.

Insertion sort is faster than quicksort for very small amounts of data, so we use it when the there are 10 or fewer items to sort. Various other refinements are possible, including elimination of recursion by using an explicit stack, and always sorting the smaller half first to reduce stack height and avoid stack overflows.

The best choice of pivot element is the median of the data. Jon Bentley and Doug McIlroy used a median of 3 elements with more than 7, and less than 41 items to sort. For more than 40 items, they “adopted Tukey’s ‘ninther’, the median of the medians of three samples, each of three elements.” I know from experience that it's worthwhile to use a better approximation of the median, based on a much larger sample, when sorting very large amounts of data.

Valid CSS 2.1!