## Compressing modifications to permutations

*Tags:* math

Let’s say that you have two permutations \pi and \sigma where both permutations are *roughly equal* (“roughly equal” is an intentionally vague term). You may want to *compress* the modifications to the permutation, formally:

- Given \pi and \sigma which are permutations on n elements, we want a *compression algorithm* C(\pi, \sigma) which outputs a “small” bitstring. This should be paired with…
- …a *decompression algorithm* D, such that D(\pi, C(\pi, \sigma)) = \sigma.

This problem sounds esoteric, but it’s actually relatively common. For example, in real time collaborative editing, one user may rearrange items in an array. To propagate the edits to another user, we’ll want to send them instructions about how to reorder their list in the same way. One way to solve this is to send the result of the compression algorithm.

The most common way I see this solved is by sending a series of instructions of the form “move the element at index x to index y”. Of course, such a representation may not be the most efficient, depending on how many such moves occur. In particular, this takes 2k \log_2 n bits to encode k moves on an n-element permutation.
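As a rough illustration of this baseline, the sketch below applies a list of (x, y) move instructions and counts the bits needed to send them. The helper names `apply_moves` and `move_encoding_bits` are made up for this example, and rounding each index up to a whole number of bits is an assumption about the wire format.

```python
def apply_moves(items, moves):
    """Apply each (x, y) instruction: remove the element at index x, reinsert it at index y."""
    items = list(items)
    for x, y in moves:
        items.insert(y, items.pop(x))
    return items

def move_encoding_bits(n, k):
    """Bits for k moves on n elements, using two fixed-width indices per move."""
    index_bits = (n - 1).bit_length()  # smallest width that can hold indices 0..n-1
    return 2 * k * index_bits

print(apply_moves([1, 2, 3], [(2, 1)]))  # [1, 3, 2]
print(move_encoding_bits(3255, 10))      # 240
```

The cost grows linearly with the number of moves, so this representation only wins when k is small compared to n.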

Another simple method is to define C(\pi, \sigma) to be the Lehmer encoding of \sigma, and then to decompress by decoding the Lehmer encoding. This works, but it takes \Theta(n\ \log\ n) bits and completely ignores \pi. Information theoretic bounds tell us that we can’t do better than \log (n!) = \Theta(n\ \log\ n) in the average case, which this method does achieve, but we want something which works well when \pi and \sigma are roughly equal. Can we do better?
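For reference, a Lehmer code can be computed as in the sketch below. This is a simple quadratic-time version with illustrative function names, not a tuned implementation. The helper `lehmer_to_int` packs the digits into a single mixed-radix integer below n!, which is where the \log (n!) bit count comes from.

```python
def lehmer_encode(sigma):
    """For each position, count the later elements that are smaller."""
    return [sum(1 for later in sigma[i + 1:] if later < v)
            for i, v in enumerate(sigma)]

def lehmer_decode(code, elements):
    """Rebuild the permutation by picking the code[i]-th smallest remaining element."""
    remaining = sorted(elements)
    return [remaining.pop(c) for c in code]

def lehmer_to_int(code):
    """Pack the digits as a mixed-radix number; the result lies in [0, n!)."""
    n = len(code)
    value = 0
    for i, c in enumerate(code):
        value = value * (n - i) + c
    return value

sigma = [1, 3, 2]
code = lehmer_encode(sigma)
print(code)                                  # [0, 1, 0]
print(lehmer_decode(code, sigma) == sigma)   # True
print(lehmer_to_int(code))                   # 1
```

Note that neither function looks at \pi, which is exactly the weakness described above.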

## Compression from Sorting Algorithms

Consider a deterministic comparison sorting algorithm on the set S. The sorting algorithm requires a comparison relation <(a, b) : S \times S \to \{0,1\}, which returns 1 if the first input is less than the second and 0 otherwise.

From here on, we’ll talk about permutations as if they are expressed in one-line notation. For concreteness, we’ll use \pi = (1\ 2\ 3) and \sigma = (1\ 3\ 2).

We can construct a compression algorithm from any comparison sorting algorithm. Define <_{\sigma}(a,b) as 1 iff a appears before b in \sigma. Note that applying a sorting algorithm with <_{\sigma} to \pi will result in \sigma.
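To make that claim concrete, here is a small sketch that builds <_{\sigma} from \sigma and sorts \pi with it. The names `less_sigma` and `sort_with` are illustrative, and Python's built-in `sorted` stands in for "any comparison sorting algorithm".

```python
from functools import cmp_to_key

def less_sigma(sigma):
    """Return a function less(a, b) that is True iff a appears before b in sigma."""
    pos = {v: i for i, v in enumerate(sigma)}
    return lambda a, b: pos[a] < pos[b]

def sort_with(less, items):
    """Sort items using only a boolean 'less' relation."""
    def three_way(a, b):
        # cmp_to_key expects a negative/zero/positive result, so derive it from less.
        if less(a, b):
            return -1
        if less(b, a):
            return 1
        return 0
    return sorted(items, key=cmp_to_key(three_way))

pi = [1, 2, 3]
sigma = [1, 3, 2]
print(sort_with(less_sigma(sigma), pi))  # [1, 3, 2]
```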

The compression algorithm is surprisingly simple: record the results of <_{\sigma} as you run the sorting algorithm on \pi. The result is simply a bitstring. Let’s see an example with our concrete values of \pi = (1\ 2\ 3) and \sigma= (1\ 3\ 2) and using insertion sort:

- Insertion sort looks at 2 and 1. Compare <_{\sigma}(2,1)=0 since 2 is not before 1 in \sigma. Insertion sort concludes these two elements are in the correct order.
- Insertion sort looks at 3 and 2. Compare <_{\sigma}(3,2)=1 since 3 is before 2 in \sigma. Insertion sort swaps these two elements, now \pi = (1\ 3\ 2).
- Insertion sort looks at 3 and 1. Compare <_{\sigma}(3,1)=0 since 3 is not before 1 in \sigma. Every element has now been inserted, so the sort terminates.

If we record the results of <_{\sigma}, we get C(\pi, \sigma) = [0, 1, 0]. Now our decompression algorithm can re-run the sorting algorithm on \pi, using the recorded results in place of the comparisons against \sigma itself. Because the sorting algorithm is deterministic, it makes the same decisions and ends up at \sigma.
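The record-and-replay idea can be sketched as follows, using a textbook insertion sort on a slightly larger input than the one in the walkthrough. The function names `compress` and `decompress` are illustrative stand-ins for C and D, not code from any library.

```python
def insertion_sort(items, less):
    """Textbook insertion sort; less(a, b) answers whether a must come before b."""
    items = list(items)
    for i in range(1, len(items)):
        j = i
        while j > 0 and less(items[j], items[j - 1]):
            items[j], items[j - 1] = items[j - 1], items[j]
            j -= 1
    return items

def compress(pi, sigma):
    """Sort pi towards sigma and return the recorded comparison bits."""
    pos = {v: i for i, v in enumerate(sigma)}
    log = []
    def less(a, b):
        bit = int(pos[a] < pos[b])
        log.append(bit)
        return bit
    insertion_sort(pi, less)
    return log

def decompress(pi, log):
    """Replay the recorded bits instead of consulting sigma."""
    bits = iter(log)
    return insertion_sort(pi, lambda _a, _b: next(bits))

pi = [1, 2, 3, 4, 5]
sigma = [1, 3, 2, 5, 4]
bits = compress(pi, sigma)
print(bits)
print(decompress(pi, bits) == sigma)
```

The decompressor never sees \sigma; it only needs \pi, the bitstring, and the same sorting algorithm.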

The efficiency of this method depends on the exact sorting algorithm used, and adaptive sorting algorithms like insertion sort can perform particularly well in some cases. Insertion sort runs in time \Theta(n + k), where k is the number of inversions (pairs of elements whose relative order differs between \pi and \sigma). Since the runtime of insertion sort is \Theta(n + k), so is the number of comparisons performed, and so the output can be significantly smaller than \Theta(n\ \log\ n) bits in some cases!
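For a plain (linear-scan) insertion sort the relationship can be stated a little more precisely: each insertion makes one "yes" comparison per inversion it removes, plus at most one final "no", so the number of recorded bits lies between k and k + n - 1. The sketch below checks this on a randomly perturbed input. The perturbation (20 random adjacent swaps) is an arbitrary stand-in for "roughly equal", and the function name is illustrative.

```python
import random

def comparisons_and_inversions(pi, sigma):
    """Return (comparisons made by insertion sort, inversions between pi and sigma)."""
    pos = {v: i for i, v in enumerate(sigma)}
    ranks = [pos[v] for v in pi]  # sorting ranks ascending turns pi into sigma
    n = len(ranks)
    inversions = sum(ranks[i] > ranks[j] for i in range(n) for j in range(i + 1, n))
    comparisons = 0
    for i in range(1, n):
        j = i
        while j > 0:
            comparisons += 1
            if ranks[j] >= ranks[j - 1]:
                break
            ranks[j], ranks[j - 1] = ranks[j - 1], ranks[j]
            j -= 1
    return comparisons, inversions

random.seed(0)
n = 1000
pi = list(range(n))
sigma = list(pi)
for _ in range(20):  # a few adjacent swaps: "roughly equal" to pi
    i = random.randrange(n - 1)
    sigma[i], sigma[i + 1] = sigma[i + 1], sigma[i]

c, k = comparisons_and_inversions(pi, sigma)
print(c, k, k <= c <= k + n - 1)
```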

When we said “roughly equal” in the introduction, the definition is essentially arbitrary – different adaptive sorting algorithms will provide different results for different data sets. There’s no compression algorithm which is best for all use cases, just like there’s no generally best sorting algorithm.

## Real-world implementation

We can also run an ordinary compression algorithm, such as zlib or run-length encoding, on the result of C. These often seem to help in practice. Here are some real-world results on a benchmark with n=3255, expressed in terms of “number of bits required for compression output divided by n”:

method | efficiency |
---|---|
theoretical lower bound | 10.2 bits/n |
merge sort + zlib | 1.16 bits/n |
binary insertion sort + zlib | 1.29 bits/n |
shellsort + zlib | 2.78 bits/n |
timsort + zlib | 1.14 bits/n |

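The best result is around nine times smaller than the theoretical lower bound! This does not contradict the bound, which applies to the average case over all permutations rather than to inputs where \pi and \sigma are roughly equal.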
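The sketch below shows one way the "bits/n" figures could be computed: pack the comparison log into bytes, run zlib over it, and divide the compressed size in bits by n. The bit-packing scheme and the helper names are assumptions for illustration; the benchmark above may have packed its data differently. The lower-bound row is \log_2 (n!) / n, evaluated here with `math.lgamma`.

```python
import math
import zlib

def pack_bits(bits):
    """Pack 0/1 values into bytes, most significant bit first, zero-padding the end."""
    out = bytearray()
    for start in range(0, len(bits), 8):
        chunk = bits[start:start + 8]
        byte = 0
        for b in chunk:
            byte = (byte << 1) | int(b)
        byte <<= 8 - len(chunk)  # pad the final partial byte with zeros
        out.append(byte)
    return bytes(out)

def compressed_bits_per_element(log, n):
    """Size of the zlib-compressed comparison log, in bits per permutation element."""
    return 8 * len(zlib.compress(pack_bits(log), 9)) / n

def lower_bound_bits_per_element(n):
    """log2(n!) / n, using lgamma(n + 1) = ln(n!) to avoid huge integers."""
    return math.lgamma(n + 1) / math.log(2) / n

print(round(lower_bound_bits_per_element(3255), 1))  # 10.2
```

Because of the zero padding, a decompressor does not know the exact log length from the bytes alone. That is harmless here, since the replayed sort simply stops asking for bits once it finishes.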

Here’s some Python code implementing the core algorithms discussed:

```
from functools import total_ordering


def make_logging_cmp(cmp):
    # Wrap cmp so that every comparison result is appended to log.
    log = []
    def logging_cmp(a, b):
        result = cmp(a, b)
        log.append(result)
        return result
    return logging_cmp, log


def make_cmp_from_permutation(sigma):
    # sigma_inv maps each element to its index in sigma.
    sigma_inv = {v: k for k, v in enumerate(sigma)}
    return lambda a, b: sigma_inv[a] < sigma_inv[b]


def make_cmp_from_log(log):
    # Replay recorded results in order, ignoring the actual arguments.
    i = 0
    def cmp(_a, _b):
        nonlocal i
        result = log[i]
        i += 1
        return result
    return cmp


def timsort(cmp, pi):
    # Python's sorted() is timsort; route its comparisons through cmp.
    @total_ordering
    class Number:
        def __init__(self, i):
            self.i = i
        def __lt__(self, other):
            return cmp(self.i, other.i)
    return sorted([Number(x) for x in pi])


sigma = [1, 3, 2]
pi = [1, 2, 3]
cmp = make_cmp_from_permutation(sigma)
logging_cmp, log = make_logging_cmp(cmp)
timsort(logging_cmp, pi)
print('compression result', log)
# outputs [False, True, True, False]
decompression_cmp = make_cmp_from_log(log)
sigma2 = [x.i for x in timsort(decompression_cmp, pi)]
print('decompression works', sigma2 == sigma)
# outputs True
```