Data Science is Hard: Units

I like units. Units are fun. When playing with Firefox Telemetry you can get easy units like “number of bookmarks per user” and long units like “main and content but not content-shutdown crashes per thousand usage hours”.

Some units are just transformations of other units. For instance, if you invert the crash rate (crashes per usage hour) you get something like Mean Time To Failure, showing how many usage hours pass between crashes. In the real world of Canada I find myself making distance transformations between miles and kilometres and temperature transformations between Fahrenheit and Celsius.

My younger brother is a medical resident in Canada and is slowly working his way through the details of what it would mean to practice medicine in Canada or the US. One thing that came up in conversation was the unit differences.

I thought he meant things like millilitres being replaced with fluid ounces or some other vaguely insensible nonsense (I am in favour of the metric system, generally). But no. It’s much worse.

It turns out that various lab results have to be communicated in terms of proportion. How much cholesterol per unit of blood? How much calcium? How much sugar, insulin, salt?

I was surprised when my brother told me that in the United States this is communicated in grams (milligrams per decilitre of blood, in practice). If you took all of the {cholesterol, calcium, sugar, insulin, salt} out of the blood and weighed it on a (metric!) scale, how much is there?

In Canada, this is communicated in moles (millimoles per litre, in practice). Not the furry animal, but the actual count of molecules. If you took all of the substance out of the blood and counted the molecules, how many are there?

So if you are trained in one system to recognize “good” (typical) values and “bad” (atypical) values, when you shift to the other system you need to learn a whole new set of values.

No problem, right? Like how you need to multiply by 1.6 to get kilometres out of miles?

No. Since grams vs. moles is a difference between “much” and “many”, you need to apply a different conversion factor depending on the molecular weight of the substance you are measuring.

So, yes, there is a multiple you can use for cholesterol. And another for calcium. And another for sugar, yet another for insulin, and still another for salt. It isn’t just one conversion, it’s one conversion per substance.
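The per-substance conversion can be sketched in a few lines. This is illustrative only (the substance names and the helper function are mine, not any medical standard), but the molar masses are ordinary chemistry values:

```python
# Sketch: converting US-style lab results (mg/dL) to Canadian-style (mmol/L).
# One conversion factor per substance, driven by its molar mass.
MOLAR_MASS_G_PER_MOL = {
    "glucose": 180.16,      # blood sugar
    "cholesterol": 386.65,
    "calcium": 40.08,
}

def mg_per_dl_to_mmol_per_l(substance: str, mg_per_dl: float) -> float:
    # mg/dL -> mg/L (x10), then mg/L -> mmol/L (divide by g/mol, i.e. mg/mmol)
    return mg_per_dl * 10 / MOLAR_MASS_G_PER_MOL[substance]

# A typical fasting glucose of 90 mg/dL is about 5.0 mmol/L:
print(round(mg_per_dl_to_mmol_per_l("glucose", 90), 1))
```

Note how the same input number means wildly different things depending on the substance: 90 mg/dL of glucose is unremarkable, while the same reading for calcium would be a very different story.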

Suddenly “crashes per thousand usage hours” seems reasonable and sane.


Data Science is Hard: Client Delays

Delays suck, but unmeasured delays suck more. So let’s measure them.

I’ve previously talked about delays as they relate to crash pings. This time we’re looking at the core of Firefox Telemetry data collection: the “main” ping. We’ll be looking at a 10% sample of all “main” pings submitted on Tuesday, January 10th[1].

In my previous post on delays, I defined five types of delay: recording, submission, aggregation, migration, and query scheduling. This post is about delays on the client side of the equation, so we’ll be focusing on the first two: recording, and submission.

Recording Delay

How long does it take from something happening, to having a record of it happening? We count HTTP response codes (as one does), so how much time passes from that first HTTP response to the time when that response’s code is packaged into a ping to be sent to our servers?


This is a Cumulative Distribution Function, or CDF. The ones in this post show you what proportion (0% – 100%) of the “main” pings we’re looking at arrived with data that falls within a certain timeframe (0 – 96 hours). So in this case, look at the red “aurora”-branch line. It crosses the y-axis value of 0.9 at about the x-axis value of 8. This means 90% of aurora pings had a recording delay of 8 hours or less.

Which is fantastic news, especially since every other channel (release and beta vying for fastest speeds) gets more of its pings in even faster. 90% of release pings have a recording delay of at most 4 hours.

And notice that shelf at 24 hours, where every curve basically jumps to 100%? If users leave their browsers open for longer than a day, we cut a fresh ping at midnight. Glad to see evidence that it’s working.

All in all it shows that we can expect recording delays of under 30 minutes for most pings across all channels. This is not a huge source of delay.
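The CDF read-off described above is easy to sketch. The delay values here are fabricated for illustration (the real analysis runs against the Telemetry datasets), but the percentile logic is the same:

```python
# Sketch: given per-ping delays in hours, find the delay under which
# 90% of pings fall -- the inverse of reading the CDF at y = 0.9.
# The delays below are made up (exponential, mean ~3h), not real Telemetry data.
import random

random.seed(0)
delays = [random.expovariate(1 / 3.0) for _ in range(10_000)]

def percentile(values, p):
    # Empirical CDF inverse: the smallest delay d with CDF(d) >= p.
    ordered = sorted(values)
    return ordered[int(p * (len(ordered) - 1))]

print(f"90% of pings arrive within {percentile(delays, 0.90):.1f} hours")
```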

Submission Delay

With all that data finally part of a “main” ping, how long before the servers are told? For now, Telemetry has to wait for the user to restart their Firefox before it is able to send its pings. How long can that take?



Now we see that aurora is no longer the slowest; its submission delays are very similar to release’s. The laggard is now beta… and I really can’t figure out why. If Beta users were leaving their browsers open longer, we’d expect to see them on the slower side of the Recording Delay CDF plot. If Beta users were leaving their browsers closed longer, we’d expect them to show up lower on Engagement Ratio plots (which they don’t).

A mystery.

Not a mystery is that nightly has the fastest submission times. It receives updates every day so users have an incentive to restart their browsers often.

Comparing Submission Delay to Recording Delay, you can see why this is where we’re focusing most of our “Get More Data, Faster” attention. If we wait for 90% of “main” pings to arrive, we have to wait at least 17 hours for nightly data, 28 hours for release and aurora… and over 96 hours for beta.

And that’s just Submission Delay. What if we measured the full client -> server delay for data?

Combined Client Delay


With things the way they were on 2017-01-10, to get 90% of “main” pings we need to wait a minimum of 22 hours (nightly) and a maximum of… you know what, I don’t even know. I can’t tell where beta might cross the 0.9 line, but it certainly isn’t within 96 hours.

If we limit ourselves to 80% we’re back to a much more palatable 11 hours (nightly) to 27 hours (beta). But that’s still pretty horrendous.

I’m afraid things are actually even worse than I’m making them out to be. We rely on getting counts out of “main” pings. To count something, you need to count every single individual something. This means we need 100% of these pings, or as near as we can get. Even nightly pings take longer than 96 hours to get us more than 95% of the way there.

What do we use “main” pings to count? Amongst other things, “usage hours” or “how long has Firefox been open”. This is imperative to normalizing crash information properly so we can determine the health and stability of a release.

As you can imagine, we’re interested in knowing this as fast as possible. And as things stood a couple of Tuesdays ago, we have a lot of room for improvement.

For now, expect more analyses like this one (and more blog posts like this one) examining how slowly or quickly we can possibly get our data from the users who generate it to the Mozillians who use it to improve Firefox.


[1]: Why did I look at pings from 2017-01-10? It was a recent Tuesday (less weekend effect) well after Gregorian New Year’s Day, well before Chinese New Year’s Day, and even a decent distance from Epiphany. Also, 01-10 mirrors itself, which I thought was neat.

What’s the First Firefox Crash a User Sees?

Growth is going to be a big deal across Mozilla in 2017. We spent 2016 solidifying our foundations, and now we’re going to use that to spring to action and grow our influence and user base.

So this got me thinking about new users. We’re constantly getting new users: people who, for one reason or another, choose to install and run Firefox for the first time today. They run it and… well, then what?

Maybe they like it. They open a new tab. Then they open a staggeringly unbelievable number of tabs. They find and install an addon. Or two.

Fresh downloads and installs of Firefox continue at an excellent pace. New people, every day, are choosing Firefox.

So with the number of new users we already see, the key to Growth may not lie in attracting more of them… it might be that we need to keep the ones we already see.

So what might stop a user from using Firefox? Maybe after they open the seventy-first tab, Firefox crashes. It just disappears on them. They open it again, browse for a little while… but can’t forget that the browser, at any time, could just decide to disappear and let them down. So they migrate back to something else, and we lose them.

It is with these ideas in my head that I wondered “Are there particular types of crashes that happen to new users? Are they more likely to crash because of a plugin, their GPU misbehaving, running out of RAM…? What is their first crash, and how might it compare to the broader ecosystem of crashes we see and fix every day?”

With the new data available to me thanks to Gabriele Svelto’s work on client-side stack traces, I figured I could maybe try to answer those questions.

My full analysis is here, but let me summarize: sadly there’s too much noise in the process to make a full diagnosis. There are some strange JSON errors I haven’t tracked down… and even if I do, there are too many “mystery” stack frames that we just don’t have a mechanism to figure out yet.

And this isn’t even covering how we need some kind of service or algorithm to detect signatures in these stacks or at least cluster them in some meaningful way.

Work is ongoing, so I hope to have a more definite answer in the future. But for now, all I can do is invite you to tell me what you think causes users to stop using Firefox. You can find me on Twitter, or through my email address.


Canadian Holiday Inbound! (Sunday December 25th, Observed on Monday the 26th)

Hello team!

A Canadian Holiday is once again at our chimneys as Christmas Day is approaching! The holiday itself is on the 25th, but because that falls on a weekend, it’ll be on the 26th that you’ll not be finding us in the office. Please be understanding if your meetings are a little underpopulated, and take the opportunity to run all the try builds you can think of.

Now, I know what you’re thinking. (Well, I don’t, but since I’ve turned off comments you can’t correct me.) You are thinking that Christmas isn’t a Canadian holiday… or at least not uniquely Canadian.

And you wouldn’t be too wrong. Other parts of the world certainly celebrate Christmas. Japan does it up amazingly, for instance, if you’re ever in that corner of the globe in December. And in still other parts of the world, people celebrate all sorts of Winter festivities.

And it’s not as though we’re spending the day dipping our All-Dressed Chips[1] into Stanley Cups[2] of Maple Syrup[3] while taking our Coffee Crisp[4] and wearing our Toques[5], Jeans, Jean Shirts[6], and A Boot[7], eh?

No, but the Canadian way of celebrating Christmas _is_ unique… in that it’s usually done through celebrating everyone else’s Winter celebrations. Canadians are more than happy to adopt and support any culture or festival that involves food, fun, friends, and family.

My family tends to observe Polish Wigilia by eating pierogies, white fish, and bigos. My wife’s family has a Christmas Eve Feast of crab dip, fourteen types of frozen hors d’oeuvres, cheese, crackers, and smoked oysters eaten on TV trays in front of Log: The Christmas Special. Earlier this month we ate latkes with sour cream and applesauce with pfeffernusse for afters at the Christkindl Markt. Last year we went to Sir John A. Macdonald’s birthplace at Woodside for soft gingerbread and roasted chestnuts.

Then there’s turkey with the trimmings for the more traditional, sushi for deliberate anti-traditional, and everything in-between.

So no matter if or how you celebrate Canadian Christmas, know that we are (and are not) celebrating it too, with you, in the Great White North.

Because anything else just wouldn’t be polite.

( :bwinton reminds me to tell you that we will also be off on the 27th for Boxing Day. Our most famous pugilists will be hard at work discouraging (in effigy) the normally-docile moose herds from invading the United States once again. So we’ll be busy cheering them on, sorry. )


[1] Tastes like… actually, I’m not really sure. Tasty, though.
[2] Named after Lord Stanley.
[3] Probably harvested back in March in Quebec.
[4] A mocha-flavoured Nestlé chocolate bar.
[5] Knit caps, often with pom-poms on top.
[6] AKA “The Canadian Tuxedo”.
[7] “About” pronounced in Canadianese is actually closer to “aboat” than “aboot”, eh.

Privileged to be a Mozillian

Mike Conley noticed a bug. There was a regression on a particular Firefox Nightly build he was tracking down. It looked like this:

A time series plot with a noticeable regression at November 6

A pretty big difference… only there was a slight problem: there were no relevant changes between the two builds. Being the kind of developer he is, :mconley looked elsewhere and found a probe that only started being included in builds on November 16.

The plot showed him data starting from November 15.

He brought it up, and Roberto Vitillo tried to reproduce it, without success. For :mconley the regression was on November 5 and the data on the other probe started November 15. For :rvitillo the regression was on November 6 and the data started November 16. After ruling out addons, they assumed it was the dashboard’s fault and roped me into the discussion. This is what I had to say:

Hey, guess what's different between rvitillo and mconley? About 5 hours.

You see, :mconley is in the Toronto (as in Canada) Mozilla office, and Roberto is in the London (as in England) Mozilla office. There was a bug in how dates were being calculated that made it so the data displayed differently depending on your timezone. If you were on or East of the Prime Meridian you got the right answer. West? Everything looks like it happens one day early.
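This class of bug is easy to reproduce. The sketch below is not the dashboard’s actual code (the variable names and offsets are illustrative), but it shows the mechanism: a date anchored to midnight UTC, when rendered in a local timezone west of the Prime Meridian, lands on the previous day:

```python
# Sketch of the timezone bug: a build date stored as midnight UTC is
# formatted in the viewer's local timezone. East of the Prime Meridian
# (or on it) the date survives; west of it, everything shifts a day early.
from datetime import datetime, timezone, timedelta

build_day = datetime(2016, 11, 6, 0, 0, tzinfo=timezone.utc)  # midnight UTC

london = timezone(timedelta(hours=0))    # :rvitillo's offset
toronto = timezone(timedelta(hours=-5))  # :mconley's offset

print(build_day.astimezone(london).date())   # 2016-11-06 -- correct
print(build_day.astimezone(toronto).date())  # 2016-11-05 -- one day early!
```

The usual fix is to do all date arithmetic and formatting in UTC, converting to local time only at the last possible moment, if at all.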

I hammered out a quick fix, which means the dashboard is now correct… but in thinking back over this bug in a post-mortem-kind-of-way, I realized how beneficial working in a distributed team is.

Having team members in multiple timezones not only provided us with a quick test location for diagnosing and repairing the issue, it equipped us with the mindset to think of timezones as a problematic element in the first place. Working in a distributed fashion has conferred upon us a unique and advantageous set of tools, experiences, thought processes, and mechanisms that allow us to ship amazing software to hundreds of millions of users. You don’t get that from just any cube farm.



Data Science is Hard – Case Study: How Do We Normalize Firefox Crashes?

When we use Firefox Crashes to determine the quality of a Firefox release, we don’t just use a count of the number of crashes:

A plot of raw crash counts for Aurora 51.0a2

We instead look at crashes divided by the number of thousands of hours Firefox was running normally:

A plot of crashes per thousand usage hours for Aurora 51.0a2

I’ve explained this before as a way to account for how crash volumes can change depending on Firefox usage on that particular day, even though Firefox quality likely hasn’t changed.
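The normalization itself is just a division, but it’s the division that makes day-to-day comparisons meaningful. A minimal sketch, with made-up numbers (the counts below are not real Telemetry figures):

```python
# Sketch: the same raw crash count gives very different quality signals
# once normalized by how much Firefox was actually running that day.
def crash_rate(crashes: int, usage_hours: float) -> float:
    """Crashes per thousand usage hours."""
    return crashes / (usage_hours / 1000)

weekday = crash_rate(crashes=2_000, usage_hours=250_000)  # 8.0
weekend = crash_rate(crashes=2_000, usage_hours=125_000)  # 16.0
print(weekday, weekend)
```

Same raw crash count both days; but with half the usage, the weekend build is crashing twice as often per hour of use.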

But “thousands of usage hours” is one of many possible normalization denominators we could have chosen. To explain our choice, we’ll need to explore our options.

Fans of Are We Stable Yet? may be familiar with a crash rate normalized by “hundreds of Active Daily Instances (ADI)”. This is a valid denominator, as Firefox usage does tend to scale linearly with the number of unique Firefox instances running that day. It is also very timely, as ADI comes to us from a server that Firefox instances contact at the beginning of their browsing sessions each day.

From across the fence, I am told that Google Chrome uses a crash rate normalized by “millions of pageloads”. This is a valid denominator as “loading a page” is one of the primary actions users take with their browsers. It is not any more or less timely than “thousands of usage hours” but with Google properties being primary page load destinations, this value could potentially be estimated server-side while waiting for user data to trickle in.

Denominators that would probably work, but that I haven’t heard of anyone using, include: the number of times the user opens the browser, the number of times the user scrolls, memory use… generally, anything that increases at the same rate that crashes do on a given browser version could be used.

So why choose “thousands of usage hours”? ADI comes in faster, and pageloads are more closely related to actions users take in the browser.

Compared to ADI, thousands of usage hours has proven to be a more reasonable and stable measure. In crashes-per-hundred-ADI there are odd peaks and valleys that don’t reflect decreases or increases in build quality. And though crashes scale proportionally with the number of Firefox instances running, they scale more closely with how heavily those running instances are being used.

As for why we don’t use pageloads… Well, the first reason is that “thousands of usage hours” is something we already have kicking around. A proper count of pageloads is something we’re adding at present. It will take a while for users to start sending us these numbers, and a little development effort to get that number into the correct dataset for analysis. Then we will evaluate its suitability. It won’t be faster or slower than “thousands of usage hours” (since it will use the same reporting mechanism) but I have heard no compelling evidence that it will result in any more stable or reasonable of a measure. So I’ll do what I always try to do: let the data decide.

So, for the present, that leaves us with crashes per thousand usage hours which, aside from latency issues we have yet to overcome, seems to be doing fairly well.