Data Science is Hard – Case Study: How Do We Normalize Firefox Crashes?

When we use Firefox crashes to determine the quality of a Firefox release, we don’t just use a count of the number of crashes:

[chart: raw crash counts for Aurora 51a2]

We instead look at crashes divided by the number of thousands of hours Firefox was running normally:

[chart: Aurora crashes per thousand usage hours]

I’ve explained this before as a way to account for how crash volumes can change depending on Firefox usage on that particular day, even though Firefox quality likely hasn’t changed.
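To make the arithmetic concrete, here’s a minimal sketch of that normalization in Python. All of the crash and usage-hour figures are invented for illustration:

```python
def crash_rate(crashes, usage_hours):
    """Crashes per thousand hours of normal Firefox use."""
    return crashes / (usage_hours / 1000)

# Invented figures: a heavy-usage day and a light-usage day.
heavy_day = crash_rate(crashes=4500, usage_hours=3_000_000)
light_day = crash_rate(crashes=1500, usage_hours=1_000_000)

# Raw counts (4500 vs. 1500) look like a big quality swing; the
# normalized rate (1.5 in both cases) shows quality likely didn't change.
print(heavy_day, light_day)  # 1.5 1.5
```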

But “thousands of usage hours” is one of many possible normalization denominators we could have chosen. To explain our choice, we’ll need to explore our options.

Fans of Are We Stable Yet? may be familiar with a crash rate normalized by “hundreds of Active Daily Instances (ADI)”. This is a valid denominator, as Firefox usage does tend to scale linearly with the number of unique Firefox instances running that day. It is also very timely, as ADI comes to us from a server that Firefox instances contact at the beginning of their browsing sessions each day.

From across the fence, I am told that Google Chrome uses a crash rate normalized by “millions of pageloads”. This is a valid denominator as “loading a page” is one of the primary actions users take with their browsers. It is not any more or less timely than “thousands of usage hours” but with Google properties being primary page load destinations, this value could potentially be estimated server-side while waiting for user data to trickle in.

Denominators that would probably work, but that I haven’t heard of anyone using, include: the number of times the user opens the browser, the number of times the user scrolls, memory use, … generally, anything that increases at the same rate crashes do on a given browser version could be used.
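All of these candidates plug into the same shape of calculation. Here’s a hypothetical sketch comparing the denominators mentioned so far; none of the scale factors or figures below are real:

```python
def rate(crashes, denominator, per):
    """Crashes per `per` units of whatever denominator we pick."""
    return crashes / (denominator / per)

crashes = 3000  # invented daily crash count for one build

print(rate(crashes, 2_000_000, per=1_000))       # per thousand usage hours -> 1.5
print(rate(crashes, 500_000, per=100))           # per hundred ADI -> 0.6
print(rate(crashes, 80_000_000, per=1_000_000))  # per million pageloads -> 37.5
```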

So why choose “thousands of usage hours” when ADI comes in faster and pageloads are more closely tied to the actions users take in the browser?

Compared to ADI, thousands of usage hours has proven to be the more reasonable and stable measure. Crashes-per-100-ADI shows odd peaks and valleys that don’t reflect actual decreases or increases in build quality. And though crash volume scales proportionally with the number of Firefox instances running, it scales more closely with how heavily those running instances are being used.

As for why we don’t use pageloads… Well, the first reason is that “thousands of usage hours” is something we already have kicking around. A proper count of pageloads is something we’re only now adding. It will take a while for users to start sending us these numbers, and a little development effort to get them into the correct dataset for analysis. Then we’ll evaluate its suitability. It won’t be faster or slower than “thousands of usage hours” (since it will use the same reporting mechanism), but I’ve heard no compelling evidence that it will result in a more stable or reasonable measure. So I’ll do what I always try to do: let the data decide.

So, for the present, that leaves us with crashes per thousands of usage hours which, aside from latency issues we have yet to overcome, seems to be doing fairly well.

:chutten
