After listening to the Lucien C.H. Lin’s talk at Coscup 2019 about his experience on working with corporates dealing with FLOSS communities and licenses, I learn more about the practice in the industry. It also makes me to think about the license I would be using for my open source projects. This article from Neil Mitchell in Haskell community explained his stance very well, and by carefully reading his article I would tend to agree with his intent. I would be inclined to change all of my projects to be Apache-2.0 and BSD-3 dual license so that it has patent grant protection and at the same time compatible with GPL. Rust is using Apache-2.0 and MIT dual license but to my personal I like the clause in BSD-3 where it requires you can’t use my name to endorse things you build, that makes sense to me. I would relicense my projects one by one from now on.
go-pdqsort is my implementation of pattern defeating sort in golang. I knew about pattern defeating sort from rust’s standard library documentation. I’ve never heard about this algorithm and it immediately intrigues my interest. I googled about it and found the original implementation and their discussion threads on hacker news and reddit. Then I read the technical report from Orson Peters. The key observation from the algorithm is that merely reducing the miss rate from CPU branch prediction, it would make the sorting speed greatly improved (no need to flush the cache line etc), so the technique adopted was to put the index of the array in the buckets and don’t swap them immediately, but put them in the right bin and swap them in one batch at the end of the loop. This works perfectly in the language with zero cost abstraction like C/C++ and Rust. I wasn’t sure about that would work in the heap-managed languages like golang, python and ruby etc, since most of the things are on the heap (with pointer indirection) and often results into cache misses, the impact of branch prediction miss rate might be neglectable. (For sure it would depend on the escape analysis in golang compiler, it would put the things on stack if it doesn’t escape in general). Though my hunch was that it would be slower, it still a good practice to implement the algorithm by myself so that I would get to know the details of the algorithm. And here are the results
|std-lib sort||69828 ns/op|
I am not sure about what std-lib’s sort is, but it looks like introsort since it switches to heapsort when the recursion is too deep, and it incorporates the techniques from shellsort as well, but the main part is still Bently’s quicksort technique. You can tell from the result that pdqsort is significantly slower than std-lib’s sort, probably due to the overhead of those memory batch swap, where it triggers extra allocation or cache eviction, pretty much as expected.
If you are interested in the algorithm, you could check out the code by yourself.