Optimization

"Premature optimization is the root of all evil." - Donald Knuth

This is one of the most notable quotes from my early days in programming. When I started on this path, I took the same approach that probably many do - I worked hard to ensure the simple programs I wrote were efficient and free of redundancy. This quote stood out and made me rethink that pursuit. Still, optimization isn't a bad thing - it just needs to be done at the right time and in the right manner.

Lately I've been looking into ways to optimize some aspects of Privacy Badger. The code base isn't huge, but it isn't small either, so optimizations aren't always as straightforward as one would hope. We knew that one page in particular was slow - the options page. As the size of the data set grew, the loading time of this page increased linearly. That's not good - some people had reported nearly ten thousand third party domains in their database. Each element took up to tens of milliseconds to load, which added up to an 18 second load time for a data set of 1,200 elements.

One suggestion from another project collaborator was lazy loading: only a fixed number of elements would be populated on the initial load, and more would be loaded as the user scrolled down. This turned out to be an excellent suggestion, and his implementation cut the initial page load to around 1.5 to 2 seconds, with additional elements loading one at a time as the user scrolled. Problem solved, right? Well, not necessarily. I thought the UI might be improved by batching the loading so that the scroll bar jumped noticeably. A quick attempt at this turned out to be too slow, introducing a noticeable lag.
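To make the idea concrete, here is a minimal sketch of that kind of lazy loading - not Privacy Badger's actual code. The renderDomainRow() helper, the class name, and the scroll threshold are all made up for illustration.

```javascript
// Minimal lazy-loading sketch (not the actual Privacy Badger code).
// renderDomainRow() is a hypothetical stand-in for the real row builder.
function renderDomainRow(domain) {
  const row = document.createElement("div");
  row.className = "domain-row";
  row.textContent = domain;
  return row;
}

function setUpLazyLoading(domains, container, initialCount) {
  let nextIndex = 0;

  function loadOne() {
    if (nextIndex < domains.length) {
      container.appendChild(renderDomainRow(domains[nextIndex]));
      nextIndex++;
    }
  }

  // Populate a fixed number of rows up front.
  for (let i = 0; i < initialCount; i++) {
    loadOne();
  }

  // Load one more row each time the user nears the bottom of the page.
  window.addEventListener("scroll", function () {
    const nearBottom =
      window.innerHeight + window.scrollY >= document.body.offsetHeight - 200;
    if (nearBottom) {
      loadOne();
    }
  });
}
```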

So, I opened up the profiler in Chrome and started fiddling around. A large chunk of time was spent on something called "Parse HTML", and another large chunk on something called "Recalculate Style". I figured it was HTML-related, so I commented out big sections of the code that generated the HTML to see which elements were the primary culprits. From there, I set a baseline - without the HTML rendering, each element took 0.4ms for parsing and 0.4ms for recalculating styles. Then I began the meticulous process of checking each line in the profiler, one by one. The results were interesting, though not entirely reassuring. The major opportunities for optimization aren't easy wins; they come from certain elements being loaded many times over in the UI. Each element needs a slider, and 1,000 sliders in the DOM are bound to slow things down.
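The Chrome profiler is what actually attributes time to "Parse HTML" and "Recalculate Style", but a crude wall-clock measurement per element is easy to reproduce - something along these lines (again a sketch, reusing the hypothetical renderDomainRow() from above):

```javascript
// Rough per-element timing as a sanity check alongside the profiler.
// Reading offsetHeight forces a synchronous style/layout pass, so the
// measurement covers style recalculation as well as the DOM insertion.
function timeOneRow(domain, container) {
  const start = performance.now();
  container.appendChild(renderDomainRow(domain)); // hypothetical helper from the sketch above
  void container.offsetHeight;
  return performance.now() - start; // milliseconds
}
```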

Still, I did find something. Loading each element individually meant that the HTML for all of the elements had to be parsed again on each addition. Batching the loads seemed like the more appropriate approach - the more we batch, the less often we have to parse the entire HTML - but the way I'd originally done the batching hadn't actually improved anything; it just lumped the calls together. So I changed it so that the subset of elements was compiled first and only then added to the DOM in one batch. Boy did it make a difference.
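The gist of the fix looks something like the following - a sketch rather than the real code, with a hypothetical buildRowHtml() standing in for however the options page actually builds its markup. The point is that the batch's HTML is assembled off-DOM and inserted with a single call, so the browser parses the markup and recalculates styles once per batch instead of once per element.

```javascript
// Hypothetical markup builder for one row; the real row also carries a
// slider and other controls. Escaping is kept minimal for the sketch.
function buildRowHtml(domain) {
  const safe = String(domain).replace(/[&<>"]/g, function (c) {
    return { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;" }[c];
  });
  return '<div class="domain-row">' + safe + '</div>';
}

// Compile a whole batch as a string, then touch the live DOM once.
function appendBatch(domains, startIndex, batchSize, container) {
  const end = Math.min(startIndex + batchSize, domains.length);
  let html = "";

  for (let i = startIndex; i < end; i++) {
    html += buildRowHtml(domains[i]);
  }

  // One insertion per batch: the new markup is parsed and styles are
  // recalculated once here, not once per element.
  container.insertAdjacentHTML("beforeend", html);
  return end; // index of the next element still to be loaded
}
```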

Still, finding a sweet spot was tricky: too few elements in each batch and the collective savings are low; too many and the user experiences noticeable lag. I tried several different batch sizes, and somewhere between 10 and 20 elements seemed to work best - the delay was short enough not to feel slow, and the scroll bar jumped enough to be obvious.
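Wired into the scroll handler, the whole thing ends up looking roughly like this, using the appendBatch() sketch above; the batch size and scroll threshold are illustrative, not taken from the Privacy Badger source.

```javascript
const BATCH_SIZE = 15; // somewhere in the 10-20 range that felt right

function setUpBatchedLazyLoading(domains, container) {
  // First batch up front, then one batch each time the user nears the bottom.
  let nextIndex = appendBatch(domains, 0, BATCH_SIZE, container);

  window.addEventListener("scroll", function () {
    const nearBottom =
      window.innerHeight + window.scrollY >= document.body.offsetHeight - 200;
    if (nearBottom && nextIndex < domains.length) {
      nextIndex = appendBatch(domains, nextIndex, BATCH_SIZE, container);
    }
  });
}
```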

The full set of optimizations isn't necessarily done yet, but things are drastically improved from where they were, and the small amount of effort put into the key bottlenecks has proved to be one of the better uses of our collective time.