The Photonization of about:telemetry

This summer I mentored :flyingrub for a Google Summer of Code project to redesign about:telemetry. You can read his Project Submission Document here.

Background

Google Summer of Code is a program funded by Google to pay students worldwide to contribute in meaningful ways to open source projects.

about:telemetry is a piece of Firefox’s UI that allows users to inspect the anonymous usage data we collect to improve Firefox. For instance, we look at the maximum number of tabs our users have open during a session (someone or several someones have more than one thousand tabs open!). If you open up a tab in Firefox and type in about:telemetry (then press Enter), you’ll see the interface we provide for users to examine their own data.

Mozilla is committed to putting users in control of their data. about:telemetry is a part of that.

Then

When :flyingrub started work on about:telemetry, it looked like this (Firefox 55):

oldAboutTelemetry

It was… functional. Mostly it was intended to be used by developers to ensure that data collection changes to Firefox actually changed the data that was collected. It didn’t look like part of Firefox. It didn’t look like any other about: page (browse to about:about to see a list of about: pages). It didn’t look like much of anything.

Now

After a few months of polishing and tweaking and input from UX, it looks like this (Firefox Nightly 57):

newAboutTelemetry

Well that’s different, isn’t it?

It has been redesigned to follow the Photon Design System so that it matches how Firefox 57 looks. It has been reorganized into more functional groups, has a new top-level search, and dozens of small tweaks to usability and visibility so you can see more of your data at once and get to it faster.

newAboutTelemetry-histograms.png

Soon

Just because Google Summer of Code is done doesn’t mean about:telemetry is done. Work on about:telemetry continues… and if you know some HTML, CSS, and JavaScript you can help out! Just pick a bug from the “Depends on” list here, and post a comment asking if you can help out. We’ll be right with you to help get you started. (Though you may wish to read this first, since it is more comprehensive than this blog post.)

Even if you can’t or don’t want to help out, you can take sneak a peek at the new design by downloading and using Firefox Nightly. It is blazing fast with a slick new design and comes with excellent new features to help be your agent on the Web.

We expect :flyingrub will continue to contribute to Firefox (as his studies allow, of course. He is a student, and his studies should be first priority now that GSoC is done), and we thank him very much for all of his good work this Summer.

:chutten

Advertisements

Data Science is Hard: History, or It Seemed Like a Good Idea At the Time

I’m mentoring a Summer of Code project this summer about redesigning the “about:telemetry” interface that ships with each and every version of Firefox.

The minute the first student (:flyingrub) asked me “What is a parent payload and child payload?” I knew I was going to be asked a lot of questions.

To least-effectively answer these questions, I’ll blog the answers as narratives. And to start with this question, here’s how the history of a project makes it difficult to collect data from it.

In the Beginning — or, rather, in the middle of October 2015 when I was hired at Mozilla (so, at my Beginning) — there was single-process Firefox, and all was good. Users had many tabs, but one process. Users had many bookmarks, but one process. Users had many windows, but one process. All this and the web contents themselves were all sharing time within a single construct of electrons and bits and code and pixels: vying with each other for control of the filesystem, the addressable space of RAM, the network resources, and CPU scheduling.

Not satisfied with things being just “good”, we took a page from the book penned by Google Chrome and decided the time was ripe to split the browser into many processes so that a critical failure in one would not trouble the others. To begin with, because our code is venerable, we decided that we would try two processes. One of these twins would be in charge of the browser and the other would be in charge of the web contents.

This project was called Electrolysis after the mechanism by which one might split water into Hydrogen and Oxygen using electricity.

Suddenly the browser became responsive, even in the face of the worst JavaScript written by the least experienced dev at the most privileged startup in Silicon Valley. And out-of-memory errors decreased in frequency because the browser’s memory and the web contents’ memory were able to grow without interfering with each other.

Remember, our code is venerable. Remember, our code hearkens from its single-process past.

Our data-collection code was written in that single-process past. But we had two processes with input events that need to be timed to find problems. We had two processes with memory allocations that need to be examined for regressions.

So the data collection code was made aware that there could be two types of process: parent and child.

Alas, not just one child. There could be many child processes in a row if some webpage were naughty and brought down the child in its anger. So the data collection code was made aware there could be many batches of data from child processes, and one batch of data from parent processes.

The parent data was left looking like single-process data, out in the root of the data collection payload. Child processes’ data were placed in an array of childPayloads where each payload echoed the structure of the parent.

Then, not content with “good”, I had to come along in bug 1218576, a bug whose number I still have locked in my memory, for good or ill.

Firefox needs to have multiple child processes of different types, simultaneously. And many of some of those several types, also simultaneously. What was going to be a quick way to ensure that childPayloads was always of length 1 turned into a months-long exercise to put data exactly where we wanted it to be.

And so now we have childPayloads where the “weird” content child data that resists aggregation remains, and we also have payload.processes.<process type>.* where the cool and hip data lives: histograms, scalars, and keyed variants of both.

Already this approach is showing dividends as some proportions of Nightly users are getting a gpu process, and others are getting some number of content processes. The data files neatly into place with minimal intervention required.

But it means about:telemetry needs to know whether you want the parent’s “weird” data or the child’s. And which child was that, again?

And about:telemetry also needs to know whether you want the parent’s “cool” data, or the content child’s, or the gpu child’s.

So this means that within about:telemetry there are now five places where you can select what process you want. One for “weird” data, and one for each of the four kinds of “cool” data.

Sadly, that brings my storytelling to a close, having reached the present day. Hopefully after this Summer’s Code is done, this will have a happier, more-maintainable, and responsively-designed ending.

But until now, remember that “accident of history” is the answer to most questions. As such it behooves you to learn history well.

:chutten

Privileged to be a Mozillian

Mike Conley noticed a bug. There was a regression on a particular Firefox Nightly build he was tracking down. It looked like this:

A time series plot with a noticeable regression at November 6

A pretty big difference… only there was a slight problem: there were no relevant changes between the two builds. Being the kind of developer he is, :mconley looked elsewhere and found a probe that only started being included in builds starting November 16.

The plot showed him data starting from November 15.

He brought it up on irc.mozilla.org#telemetry. Roberto Vitillo was around and tried to reproduce, without success. For :mconley the regression was on November 5 and the data on the other probe started November 15. For :rvitillo the regression was on November 6 and the data started November 16. After ruling out addons, they assumed it was the dashboard’s fault and roped me into the discussion. This is what I had to say:

Hey, guess what's different between rvitillo and mconley? About 5 hours.

You see, :mconley is in the Toronto (as in Canada) Mozilla office, and Roberto is in the London (as in England) Mozilla office. There was a bug in how dates were being calculated that made it so the data displayed differently depending on your timezone. If you were on or East of the Prime Meridian you got the right answer. West? Everything looks like it happens one day early.

I hammered out a quick fix, which means the dashboard is now correct… but in thinking back over this bug in a post-mortem-kind-of-way, I realized how beneficial working in a distributed team is.

Having team members in multiple timezones not only provided us with a quick test location for diagnosing and repairing the issue, it equipped us with the mindset to think of timezones as a problematic element in the first place. Working in a distributed fashion has conferred upon us a unique and advantageous set of tools, experiences, thought processes, and mechanisms that allow us to ship amazing software to hundreds of millions of users. You don’t get that from just any cube farm.

#justmozillathings

:chutten

All Aboard the Release Train!

I’m working on a dashboard (a word which here means a website that has plots on it) to show Firefox crashes per channel over time. The idea is to create a resource for Release Management to figure out if there’s something wrong with a given version of Firefox with enough lead time to then do something about it.

It’s entering its final form (awaiting further feedback from relman and others to polish it) and while poking around (let’s call it “testing”) I noticed a pattern:auroramcscrashes

Each one of those spikes happens on a “merge day” when the Aurora channel (the channel powering Firefox Developer Edition) updates to the latest code from the Nightly channel (the channel powering Firefox Nightly). From then on, only stabilizing changes are merged so that when the next merge day comes, we have a nice, stable build to promote from Aurora to the Beta channel (the channel powering Firefox Beta). Beta undergoes further stabilization on a slower schedule (a new build every week or two instead of daily) to become a candidate for the Release channel (which powers Firefox).

For what it’s worth, this is called the Train Model. It allows us to ship stable and secure code with the latest features every six-to-eight weeks to hundreds of millions of users. It’s pretty neat.

And what that picture up there shows is that it works. The improvement is especially noticeable on the Aurora branch where we go from crashing 9-10 times for every thousand hours of use to 3-4 times for every thousand hours.

Now, the number and diversity of users on the Aurora branch is limited, so when the code rides the trains to Beta, the crash rate goes up. Suddenly code that seemed stable across the Aurora userbase is exposed to previously-unseen usage patterns and configurations of hardware and software and addons. This is one of the reasons why our pre-release users are so precious to us: they provide us with the early warnings we need to stabilize our code for the wild diversity of our users as well as the wild diversity of the Web.

If you’d like to poke around at the dashboard yourself, it’s over here. Eventually it’ll get merged into telemetry.mozilla.org when it’s been reviewed and formalized.

If you have any brilliant ideas of how to improve it, or find any mistakes or bugs, please comment on the tracking bug where the discussion is currently ongoing.

If you’d like to help in an different way, consider installing Firefox Beta. It should look and act almost exactly like your current Firefox, but with some pre-release goodies that you get to use before anyone else (like how Firefox Beta 50 lets you use Reader Mode when you want to print a web page to skip all the unnecessary sidebars and ads). By doing this you are helping us ship the best Firefox we can.

:chutten

The Future of Programming

Here’s a talk I watched some months ago, and could’ve sworn I’d written a blogpost about. Ah well, here it is:

Bret Victor – The Future of Programming from Bret Victor on Vimeo.

It’s worth the 30min of your attention if you have interest in programming or computer history (which you should have an interest in if you are a developer). But here it is in sketch:

The year is 1973 (well, it’s 2004, but the speaker pretends it is 1973), and the future of programming is bright. Instead of programming in procedures typed sequentially in text files, we are at the cusp of directly manipulating data with goals and constraints that are solved concurrently in spatial representations.

The speaker (Bret Victor) highlights recent developments in the programming of automated computing machines, and uses it to suggest the inevitability of a very different future than we currently live and work in.

It highlights how much was ignored in my world-class post-secondary CS education. It highlights how much is lost by hiding research behind paywalled journals. It highlights how many times I’ve had to rewrite the wheel when, more than a decade before I was born, people were prototyping hoverboards.

It makes me laugh. It makes me sad. It makes me mad.

…that’s enough of that. Time to get back to the wheel factory.

:chutten