Optimization

"Premature optimization is the root of all evil." - Donald Knuth

This is one of the most notable quotes from my early days in programming. When I started on this path, I took the same approach that probably many do - I worked hard to ensure the simple programs I wrote were efficient and free of redundancy. This quote stood out and made me rethink that pursuit. Still, optimization isn't a bad thing; it just needs to be done at the right time and in the right manner.

Lately I've been looking into ways to optimize some aspects of Privacy Badger. The code base isn't huge, but it isn't small either, so optimizations aren't always as straightforward as one would hope. We knew that one page in particular was slow - the options page. As the size of the data set grew, the loading time of this page increased linearly. That's not good - some people had reported nearly ten thousand third-party domains in their database. Each element took up to tens of milliseconds to load, resulting in a load time of 18 seconds for a data set of 1,200 elements.

One suggestion from another project collaborator was lazy loading, such that only a fixed number of elements would be populated on the initial load, and more would be loaded as the user scrolled down. This turned out to be an excellent suggestion, and his implementation brought the initial page load down to around 1.5 to 2 seconds, with additional elements loading one at a time as the user scrolled. Problem solved, right? Well, not necessarily. I thought the UI might be improved by batching the loading so that the scroll bar jumped noticeably with each load. When I tried this quickly, it turned out to be too slow, introducing a noticeable lag.
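For anyone unfamiliar with the pattern, here's a minimal sketch of scroll-triggered lazy loading. The names and thresholds (`renderDomainRow`, `initLazyList`, the 200px margin) are my own illustration, not Privacy Badger's actual code:

```typescript
// A minimal sketch of lazy loading on scroll; illustrative only.
const INITIAL_COUNT = 50; // assumed size of the initial render
let nextIndex = 0;

// Hypothetical row builder; the real rows include a slider and more markup.
function renderDomainRow(domain: string): HTMLElement {
  const row = document.createElement('div');
  row.className = 'domain-row';
  row.textContent = domain;
  return row;
}

function loadMore(container: HTMLElement, domains: string[], count: number): void {
  const end = Math.min(nextIndex + count, domains.length);
  for (; nextIndex < end; nextIndex++) {
    container.appendChild(renderDomainRow(domains[nextIndex]));
  }
}

function initLazyList(container: HTMLElement, domains: string[]): void {
  loadMore(container, domains, INITIAL_COUNT);
  container.addEventListener('scroll', () => {
    // When the user nears the bottom, render the next element(s).
    const nearBottom =
      container.scrollTop + container.clientHeight >= container.scrollHeight - 200;
    if (nearBottom) {
      loadMore(container, domains, 1); // the first version loaded one at a time
    }
  });
}
```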

So, I opened up the profiler in Chrome and started fiddling around. A huge amount of time was spent on something called "Parse HTML", and another huge chunk on something called "Recalculate Style". I figured it was HTML-related, so I commented out big sections of the code that generated the HTML to see which elements were the primary culprits. From there, I set a baseline - without the HTML rendering, parsing and recalculating each took about 0.4ms per element. Then I began the meticulous process of checking each line in the profiler, one by one. The results were interesting, though not entirely reassuring. The biggest opportunities for optimization aren't easy ones; they stem from certain elements being loaded many times over in the UI. Each element needs a slider, and 1,000 sliders in the DOM are bound to slow something down.

Still, I did find something. Loading each element individually meant that the HTML for all elements had to be parsed again on each addition. I thought batching the loads would be a more appropriate approach - the more we batch, the less often the entire HTML has to be parsed - but the way I'd originally done the batching hadn't actually improved anything; it just lumped the calls together. So I changed it so that each subset of elements was compiled first and then added to the DOM in a single operation. Boy, did it make a difference.
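To make that concrete, a DocumentFragment is one common way to compile a batch off-DOM and attach it in a single operation, so the browser parses and recalculates once per batch instead of once per element. This is a sketch of the general approach (reusing the hypothetical `renderDomainRow` from above), not the project's exact code:

```typescript
// Build the whole batch off-DOM, then attach it with one append.
function appendBatch(container: HTMLElement, domains: string[]): void {
  const fragment = document.createDocumentFragment();
  for (const domain of domains) {
    fragment.appendChild(renderDomainRow(domain)); // hypothetical row builder
  }
  container.appendChild(fragment); // a single DOM mutation for the whole batch
}

// Usage: render the next chunk when the user scrolls near the bottom, e.g.
// appendBatch(listElement, domains.slice(nextIndex, nextIndex + 15));
```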

Still, finding a sweet spot was a tricky task: too few elements in each batch and the collective savings are low, too many and the user experiences noticeable lag. I tried several different batch sizes, and found that somewhere between 10 and 20 elements seemed to be the sweet spot - the delay was manageable enough to not feel slow, and the bar jumped enough to be obvious.

The full set of optimizations isn't necessarily done yet, but things are drastically improved from where they were, and the small amount of effort put into optimizing the key bottlenecks has proved to be one of the better uses of our collective time.

"Good" code

Before I officially started programming, I'd heard horror stories, not just on the internet but from my own coworkers, about the challenges they faced reading poorly written spaghetti code with meaningless variable names. At the time it was often difficult for me to look at code and know why it was bad; the syntax was strange, and even beautiful code would have looked impenetrable to me. Still, I'd witnessed enough frustration as a result of "bad" code to realize that the way code was written - not just what it did - was important.

When I started programming, I decided that every piece of code I wrote, whether for an assignment, for practice, or for production, would have properly named variables. Seemed like an easy enough place to start, right? It turned out to be much harder than I'd originally expected. "Why can't I just write 'var x = y;'? It's only for a practice problem and nobody's going to see it anyway..." The urge was there, but I'd made up my mind, and I wasn't going to let myself slip into that bad habit.

What I've found is that little things can make all the difference when it comes to code readability. Properly named variables and functions, consistent spacing, consistent capitalization, batching of declarations, etc. Each one doesn't take a huge amount of time, yet without them what was once readable code turns into an absolute mess that is not only difficult to follow, but distracting for the reader. The number of times I've seen `node* root` on one line and `node *left` on another is too many to count. I have my opinions on where the `*` should be placed, but it really doesn't matter so long as the code is consistent.
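As a contrived illustration (my own example, not from any real code base), here are two versions of the same small function; the second is the one I'd want to be reading six months from now:

```typescript
interface Donation { amount: number }

// Terse names, inconsistent spacing, mixed declaration styles.
function calc(d: Donation[]) {
  var t=0; for(var i=0;i<d.length;i++){ t+= d[i].amount }
  return t
}

// Descriptive names and consistent formatting read almost like prose.
function totalDonations(donations: Donation[]): number {
  let total = 0;
  for (const donation of donations) {
    total += donation.amount;
  }
  return total;
}
```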

I'm fortunate that I haven't yet worked as a developer on a project that has a huge code base with bad coding style. At the same time, I certainly can't say that I've ever worked on a living, breathing project that was free of inconsistencies. Proper coding style is something that's worth the extra overhead. Developers may not like the strict linting rules that mean spending extra time on each feature branch getting the code up to snuff, but in the end I can't say I've ever met a developer who didn't appreciate well-written and readable code.

Making the Web More Secure

I'm not a perfectionist. I like it when things work well. But I'm okay with something that isn't perfect. Perfection is not an easy goal to achieve, nor in many cases is it realistic. Since earlier this year, I have been focusing heavily on better understanding the current security and privacy landscape as it relates to the internet. This includes investigating secure communication protocols, encryption and obfuscation of transported data, tracking mechanisms, snooping methods, proxies, relays, VPNs, content injection, and more.

During this time, I've also come across two very common ideas. One is that security is hard. The other is that many people take a perfectionist's approach to security. The former is undeniably true, and anyone who says otherwise either doesn't understand the nature of the issues deeply enough or needs to tame their ego slightly. The latter, however, is more dubious in nature.

Is perfection a reasonable goal in securing the web? I would argue it isn't. Desirable? Sure. Effective? Absolutely. Achievable? Perhaps not just yet. For security to be effective, it needs to be paired with usable technologies and interfaces. Real people simply aren't willing to give up a large amount of convenience for more (or any) security.

That's why I decided to contribute to Privacy Badger. Privacy Badger + Chrome isn't even close to as secure as something like Tor Browser, but it is more usable, and more accessible. Tor is amazing; I display Tor stickers on my laptop with great pride, and take advantage of its services regularly. At the same time, Tor isn't a mass market solution - and it really doesn't aim to be one. It's a specialized service that meets a very important need for a lot of people. Something like Privacy Badger serves a different purpose - it aims to take a small subset of what Tor offers and bring it to the users in a smaller package.

I'm extremely glad that amazingly smart people keep working on the state of the art in security, pushing the bar higher and revealing more types of vulnerabilities. At the same time, I'm encouraged to see large organizations making headway towards providing security for the masses, such as Let's Encrypt providing free certificates, the Google Chrome team rethinking security warnings, and even Squarespace's announcement that it will implement TLS encryption for all its hosted sites. Not every security implementation needs to be perfect from the start, and it's the combination of the two - cutting-edge research and security for the masses - that will allow us to create a more secure web.

Web Tracking

It's been around three months now since I made my first foray into the wide world of web tracking. I think I've read around 20 papers on the topic so far, yet I still find myself regularly coming across new information that deepens my knowledge further.

I've always been a privacy advocate; we desire privacy in the physical world, and the digital world is no different. We don't all have the level of privacy we desire in the physical world, though more often than not we're starkly aware of where that privacy is being infringed upon. The digital world is an entirely different landscape. Here are some things that have particularly stood out to me:

  1. Tracking is everywhere.
  2. The vast majority of internet users are unaware of the tracking going on.
  3. Defending against tracking is not trivial.

There are two main branches of tracking: first-party tracking, and third-party tracking.

First-party tracking is when you visit a website and that website tracks you as you navigate around it. The simple (and fairly robust) way to avoid first-party tracking is to not visit particular sites. Don't trust the New York Times? Simply don't visit their site.

Third-party tracking, on the other hand, is where all the fun happens. Third-party sites are called constantly. Visit something like nytimes.com, and resources from two dozen other domains are loaded, providing things like sharing buttons, A/B testing, advertisements, analytics and more. They're not all trackers, but many of them are, and they're loaded silently, without asking you and without notifying you.

These third-party sites have at their disposal a number of techniques they can use to 'mark' you and watch you as you browse different sites that they are loaded on. That is what I'm investigating further. It's not easy, but it is supremely interesting, and for a privacy advocate like myself, it's something that needs to be understood and communicated to the public better. Here's to hoping.


Perseverance

It's probably no surprise to anyone who's ever tried to write code that programming is never easy. Since I started programming 18 months ago, I've been reminded of this time and time again. Whether it's figuring out a new framework, understanding a new project, refactoring an existing codebase, or simply writing something that works as intended, it's an activity that is not for the faint of heart.

Early on, I naively thought that as I learned more I would run into issues less and less often. That couldn't have been further from reality. Eighteen months later, and now just over six months away from graduating, I feel it more than ever. The problems I was dealing with during my first few months in school weren't easy by any stretch of the imagination. For a beginner, some were extremely challenging. A couple in particular were most definitely beyond my capability at the time.

This past week has been an exercise in perseverance. Perhaps no different from the week before it, nor any different from what this week will hold. I've become somewhat accustomed to this lifestyle now, and have grown to appreciate it greatly. Being challenged and pushed beyond my abilities has been by far the most important element in my progress over these 18 months. Don't get me wrong - there are periods of intense frustration that occur when attempting to wrangle technical challenges. I don't particularly enjoy these periods, but they are a necessary step in experiencing the thrill and excitement that comes in the end. That's what makes it all worth it.

Yet here I am again, absolutely puzzled, completely stuck, facing what seems like it should be an easy fix yet having made little headway. Fortunately I do have some people to reach out to, and that has been a critical part of the whole process. I've found the developer community to be great; perhaps because those who are giving advice know so well the frustration and defeat that occur so frequently. What I do know is that eventually things will click. Persist long enough, and a solution will reveal itself. Now, though, it's time to take a break and let things churn in the background. Sometimes throwing ever more hours at a problem isn't the way to discover its solution.

Gauging progress

I find it to be one of the great challenges in life: gauging progress. I experienced it a lot during my multi-year pursuit of fluency in Korean. It always seems like there's a shortage of time to learn things deeply. I notice it constantly now as I immerse myself in the world of all things programming.

This past week provided me with many opportunities not only to realize the numerous gaps in my own knowledge, but to pick the brains of several different individuals with a modicum (or more) of expertise in their respective fields. The week started with a trip down to Seattle to spend the weekend with a group of Tor Project developers and fans, continued with a deep discussion about the fundamental research questions that remain unanswered in the space of web tracking, and ended with a personal tour of my university's network operations center (and yes, it's insane).

Wanting to challenge myself more, I decided that the next task I'll be taking on in Privacy Badger is exploring WebRTC. WebRTC is an interesting protocol used by many peer-to-peer web-based services (think Google Hangouts and the like) that makes it easier for people to interact with each other in real time. It does, however, have some weak points. In particular, there is a way for a user's local IP address to become visible to the other side, even when traffic is being routed over a VPN or the Tor network. In the Privacy Badger project, we've been talking about this, trying to find a way to provide better protection against WebRTC fingerprinting while still allowing users to enjoy the benefits of WebRTC for useful web tools like Google Hangouts.
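To give a flavour of the issue, here's a sketch of the widely documented probing technique: a page asks the browser to prepare a peer connection (no call is ever made) and reads addresses out of the ICE candidates the browser gathers. This is my own illustrative sketch, not Privacy Badger code, and the exact behaviour varies by browser:

```typescript
// Sketch of WebRTC address gathering via ICE candidates; illustrative only.
function gatherCandidateAddresses(): Promise<string[]> {
  return new Promise((resolve) => {
    const addresses: string[] = [];
    const pc = new RTCPeerConnection({ iceServers: [] });

    pc.createDataChannel('probe'); // forces ICE gathering to start

    pc.onicecandidate = (event) => {
      if (event.candidate) {
        // Candidate strings look roughly like:
        // "candidate:... 1 udp 2122260223 192.168.1.23 54321 typ host ..."
        const parts = event.candidate.candidate.split(' ');
        if (parts.length > 4) {
          addresses.push(parts[4]); // the connection-address field
        }
      } else {
        resolve(addresses); // a null candidate means gathering is complete
      }
    };

    pc.createOffer()
      .then((offer) => pc.setLocalDescription(offer))
      .catch(() => resolve(addresses));
  });
}

// Usage in a page context:
// gatherCandidateAddresses().then((addrs) => console.log(addrs));
```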

Over the next week, I've got some exciting things lined up. Firstly, I have to read a bunch of papers for my research project to better understand the current state of the art and the extent of current research in the area. I'm discovering quickly that finding an appropriate research question is no small task. Secondly, I've got a bunch more reading and hacking to do in order to figure out how to handle the WebRTC related functionality. Thirdly, I need to curate some material for the Korean learners that I'm tutoring. It shouldn't be too hard to write up a few simple dialogues to get the ball rolling. Fourthly, I agreed to join a book club a few weeks ago, and our book is "For Whom The Bell Tolls". It's turning out to be an awesome story, and I'm really enjoying reading it on the commute to school.

With all this, though, the question is always how to effectively gauge my progress. It's hard to see any improvement when I constantly encounter frameworks, tools, projects, etc. that I have no experience with. The barriers to entry often seem insurmountable, especially as I've been working closely with some very experienced people. What I've found, however, to be a good metric, is how I compare with myself from six months ago. Six months is enough to make noticeable progress, and when I look back to six months ago and think what I knew then, I'm happy with how far I've come. Certainly not enough to rest on my laurels (though I doubt that day will ever come), but enough to know that I'm moving in the right direction. Oh, and that all this programming stuff takes time.

Diving deep

One of my favorite things to do is dive deep into something that I don't understand and wrestle with it. The harder you wrestle, the more satisfying it is when things finally click. When I wasn't working in tech, this was a somewhat infrequent experience. As a programmer, it seems to be something I encounter at least once a day. And I love it.

I've now joined Privacy Badger in a more "official" sense, having been given write access to the main GitHub repository and participating in the semi-weekly developer meetings. Poking around Privacy Badger's code base has been fascinating, not only to see what's been done, but to see how it's been done. It's so easy to underestimate things you don't understand. I make this mistake regularly, and am as susceptible to the Dunning-Kruger effect as anyone.

I've also been working on finding a good way (along with another PB contributor) to measure the effectiveness of the heuristic algorithm used to detect trackers. If we had a list of every tracker out there, it would be easy to establish our catch rate, but unfortunately for everyone such a thing doesn't exist. Instead, I've spent time looking for tools/platforms that would iterate over a set number of websites (e.g. the Alexa Top 10,000) and output the results, including which accesses were blocked and which cookies were set. I stumbled across OpenWPM, which seems to be a great tool for this. I've got it iterating over the Alexa Top 1,000 in the lab right now, and am curious to see the results. No doubt it will require some iteration.

When I'm not working on Privacy Badger or doing coursework for my classes, I try feverishly to explore a variety of interesting topics relating to security, privacy and the internet. It all fascinates me, and I find it quite difficult to put the stuff down. Thankfully, I do have a new distraction - one that should keep me a bit more balanced. I volunteer as a Korean tutor for a campus club. It involves about an hour a week of time spent helping Korean learners with grammar, vocabulary, writing, etc. The club is just starting things up again for this new school year, so it'll be fun to teach Korean to some eager university students.

This next week should be interesting, not just because I'm headed down to Seattle over the weekend for a special event. Things are finally starting to fall into place, and I feel comfortable now with the process and the tools. I've settled in at the lab, established a good gym routine, figured out the right timing for hitting the grocery store on the way home, and have enough todos on my list to keep me busy for the next 6 months. Still, as one of my advisors so wisely quoted, "Plans are of little importance, but planning is essential."


The Setup

Over the last week, I've been working feverishly to get my various environments set up so I can start writing code, running experiments, gathering data and analyzing results. So far, it seems I've made little progress. Namely, I've barely started to do the first of those four, and haven't even yet touched the other three.

Starting is always slow. There's a lot to learn, a lot to understand, and seemingly endless issues that appear and impede progress. I've been down this path before – I know that eventually the fog will start to lift and things will start to make sense. There are of course still moments of intense frustration and confusion, though I know that eventually – not necessarily tomorrow, or even next week, but eventually – it will become clear.

What I find fascinating about these new problems is that they can virtually all, with the exception of a small few, be solved with the same approach: methodical trial and error, a keen eye and lots of patience. Still, as much as I love tackling difficult problems head on, it's been a long week, and I'm ready for a few easy wins to take the edge off!

When you're just getting started, it's hard not to feel unsatisfied with the progress you make. Setting up environments, learning tools, understanding processes - they all feel like overhead; time spent on uninteresting and unimportant tasks that don't make any impact. Even more so when you have a boss or supervisor who is pushing you to produce.

I think, more than anything, it's important to set realistic expectations. Getting set up takes time, and trying to rush it is only going to make what is already unpleasant that much worse. In the end, it will take as long as it takes, and sometimes things are out of your control. Hopefully you get to work with people who understand that.

On my end, the good news is that after struggling for approximately 5 hours with multiple hardware and OS installation issues, I finally got my machine set up in the lab. I even had a chance to run a demo of the analysis tool I'll hopefully be using for some benchmarking over the coming months. I'm no stranger to assembling computers, but today was one of those special days where nothing wanted to go right. With enough trial and error, a keen eye and lots of patience (not to mention a little help from my labmates), I managed to complete the setup phase. Now onto the real work... learning tools!

First day in the lab

Today was my first proper day in the Networks, Systems and Security Lab (NSS) at UBC. I'd picked out a seat yesterday afternoon, though by the time I got there most of my lab mates had already left for the evening. Today, I stopped by at 2:30pm and was pleased to meet several more of my lab mates. They're great people.

We're all working on different things, so there isn't a huge amount of collaboration on the projects themselves. Naturally, I assume much of the work they're doing is still private in nature, so I won't be disclosing details about their specific research projects. That being said, there is a lot of cool stuff going on in the NSS lab.

I'll be honest, I didn't get a huge amount done today at the lab itself, though I think that's pretty normal when it's your first day meeting the people you'll be working alongside for the next 4-8 months. Instead, I was grateful to have the chance to chat with a bunch of lab mates about a wide variety of topics, including SSL/TLS certificates, Diffie-Hellman, MITM attacks, Tor routing, Paxos (which I pretended to understand), graduate school, real estate prices, tech salaries, and how we're going to rearrange some of the furniture in the lab to make it look nicer.

Over the past two weeks I've been working hard on preparing my proposal, followed by a presentation for my advisors on the problem area, the existing solution space, and my proposed research impact. I didn't really expect to have to do this much research before formally starting, though I honestly don't mind it at all. Being challenged on various elements of the proposal has been extremely helpful in pushing me to explore and investigate both existing solutions and previous research on the topic.

But what is the topic, you ask? Yes, of course. In a broad sense, I'm working in browser privacy and security. In a more specific sense, I'm looking at how we can better defend against various types of super cookies and device fingerprinting. Some day I might post my proposal here, though since it's bound to be iterated upon, what I might do instead is write a series of posts about the different aspects of these two fascinating yet often diabolical techniques.

I don't really expect anyone in the NSS lab to have a huge amount to offer by way of knowledge of the problem area. However, I am encouraged to have the opportunity to sit down and chat regularly with a bunch of graduate students who are working on equally exciting projects, and who don't mind spending a few minutes talking about interesting topics in technology. Oh, and having my own desk on campus is nice, too!