trefny.net

Radix Sort for Floats and Signed Integers

After reading [1] I've decided to try sorting render states using radix sort. Because I've found the original source code rather uneasy to read, I've implemented the described method myself. Later it has become part of CGL. If you want to know what radix sort is about or what's the trick to make it work with floats and signed integers, you'd better read [1]. Here you can find the description as well, but it's a little bit more brief. If you're curious, as me, and wonder why comparison-based sorting methods can't make it faster than O(n lg n), you can find the answer below.

Radix sort is an algorithm that can beat the well known O(n lg n) time complexity limit of sorting methods which are based on key comparisons. It does so by decomposition of sort keys into radices (e.g. a digit in a decimal represenation of a number) and directly computing positions of each radix with proper handling of possible radix collisions. It's a multi-pass method with each pass taking care of exactly one radix. Radix sort is a stable sorting algorithm, which means that the same key values on input will preserve their mutual positions when sorted (posi < posj whenever key[posi] == key[posj] and i < j). This property is actually needed for radix sort; otherwise its multi-pass idea wouldn't have worked at all. The time complexity of radix sort is O(kn), where k is the number of passes over input data of size n (k is usually slightly bigger than the number of radices). The only problem, and partly the reason why radix is not used everywhere, is that not every key easily decomposes into radices. Take, for example, floats where decimal representation is useless as every number might have a different number of digits. Also, the sign is a bit of a problem - when the last pass takes the sign bit into account the resulting order is correct only for positive or negative values, for both the total ordering is wrong.

The solution is to use an 8-bit-wide radix and properly handle the last pass, where all the nasty things happen. Now, I use this method for sorting an array of render states in my new, minimalist engine nyxon. The float sorting path requires IEEE floats, otherwise it may fail. Also, the last pass differs for negative floats and integers because the bigger the last byte (7 bits of the exponent with a sign bit) of a negative float, the smaller the value (an exact opposite of negative integers represented as 2's complements.)

The source code is available for viewing or, as a fundamental algorithm, it's part of CGL. Class rsort is the actual sorting engine. Bear in mind that it requires some extra memory (that's the trade-off for having a O(n)-time sorting method). It can detect an already sorted input sequence based on data from the previous pass. Its output is not the sorted sequence but a positions table. So you will need another pass or an additional indirection when referencing the input data. The former is possible to be done in-place, like this:

const std::vector<unsigned int> & positions = rsort_engine.pos();

// in-place permutation of the input
T tmp, prev = T();
unsigned int p = 0;
for(unsigned int i = 0; i < array<T>.size(); ++i)
{
  tmp = array<T>[p];
  array<T>[p] = prev;

  p = positions[p];

  prev = array<T>[p];
  array<T>[p] = tmp;
}

Why is O(n lg n) the limit for comparison-based sorting methods? To answer that, imagine an extended binary tree with external nodes being a random sequence of keys as the root, and all possible combinations of these keys in sorted order as the leaves. Each internal node represents exactly one comparison of two keys and its left and rights subtrees represent the subsequent comparisons to be made if those keys compared less or greater. The tree is not complete because we are only interested in the minimum number of comparisons needed to sort a sequence. This way we can assume that by knowing the results of, say, k2 < k3 and k3 < k4, we don't have to compare k2 and k4 (transitive < is assumed.) All internal nodes of the comparison tree that are at level m would generate at most 2m external nodes. If S(n) denotes the minimum number of comparisons that will suffice to sort n elements, then we can assume that S(n) = m (the length of path from the root.) 2S(n) might be higher than the actual number of all external nodes which is n! (a permutation of n elements.) All this gives rise to:

We are interested in S(n), so

and by using James Stirling's approximation

we can get what we were told by our teachers:

References:

[1] Terdiman, P., Radix Sort Revisited
[2] Knuth, D.E., The Art of Computer Programming, Volume 3: Sorting and Searching, Second Edition

© 2002-4 Marek Trefny