Data Science is Hard – Case Study: How Do We Normalize Firefox Crashes?

When we use Firefox Crashes to determine the quality of a Firefox release, we don’t just use a count of the number of crashes:
[plot: aurora51a2crashes]

We instead look at crashes divided by the number of thousands of hours Firefox was running normally:
[plot: auroramcscrashes]

I’ve explained this before as a way to account for how crash volumes can change depending on Firefox usage on that particular day, even though Firefox quality likely hasn’t changed.
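
To make the arithmetic concrete, here’s a minimal sketch of the measure as a query, assuming a hypothetical daily_crash_aggregates table that already holds total crashes and total usage hours per build per day (the table and column names are invented for illustration):

-- Hypothetical aggregates: one row per (build, day) with total crash counts
-- and total usage hours across all reporting clients.
SELECT
    build_id
    , activity_date
    , crashes / (usage_hours / 1000.0) AS crashes_per_1k_usage_hours
FROM daily_crash_aggregates
ORDER BY activity_date

The alternatives discussed below would slot into the same shape: swap the denominator column and the scale factor.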

But “thousands of usage hours” is one of many possible normalization denominators we could have chosen. To explain our choice, we’ll need to explore our options.

Fans of Are We Stable Yet? may be familiar with a crash rate normalized by “hundreds of Active Daily Instances (ADI)”. This is a valid denominator, as Firefox usage does tend to scale linearly with the number of unique Firefox instances running that day. It is also very timely, as ADI comes to us from a server that Firefox instances contact at the beginning of their browsing sessions each day.

From across the fence, I am told that Google Chrome uses a crash rate normalized by “millions of pageloads”. This is a valid denominator as “loading a page” is one of the primary actions users take with their browsers. It is not any more or less timely than “thousands of usage hours” but with Google properties being primary page load destinations, this value could potentially be estimated server-side while waiting for user data to trickle in.

Denominators that would probably work, but that I haven’t heard of anyone using, include: the number of times the user opens the browser, the number of times the user scrolls, memory use… generally, anything that increases at the same rate crashes do on a given browser version could be used.

So why choose “thousands of usage hours”? After all, ADI comes in faster, and pageloads are more closely related to actions users take in the browser.

Compared to ADI, thousands of usage hours has proven to be a more reasonable and stable measure. Crashes-per-100-ADI shows odd peaks and valleys that don’t reflect decreases or increases in build quality. And though crashes scale proportionally with the number of Firefox instances running, they scale more closely with how heavily those running instances are being used.

As for why we don’t use pageloads… Well, the first reason is that “thousands of usage hours” is something we already have kicking around. A proper count of pageloads is something we’re adding at present. It will take a while for users to start sending us these numbers, and a little development effort to get that number into the correct dataset for analysis. Then we will evaluate its suitability. It won’t be faster or slower than “thousands of usage hours” (since it will use the same reporting mechanism), but I have heard no compelling evidence that it will result in a more stable or reasonable measure. So I’ll do what I always try to do: let the data decide.

So, for the present, that leaves us with crashes per thousands of usage hours which, aside from latency issues we have yet to overcome, seems to be doing fairly well.

:chutten

All Aboard the Release Train!

I’m working on a dashboard (a word which here means a website that has plots on it) to show Firefox crashes per channel over time. The idea is to create a resource for Release Management to figure out if there’s something wrong with a given version of Firefox with enough lead time to then do something about it.

It’s entering its final form (awaiting further feedback from relman and others to polish it) and while poking around (let’s call it “testing”) I noticed a pattern:
[plot: auroramcscrashes]

Each one of those spikes happens on a “merge day” when the Aurora channel (the channel powering Firefox Developer Edition) updates to the latest code from the Nightly channel (the channel powering Firefox Nightly). From then on, only stabilizing changes are merged so that when the next merge day comes, we have a nice, stable build to promote from Aurora to the Beta channel (the channel powering Firefox Beta). Beta undergoes further stabilization on a slower schedule (a new build every week or two instead of daily) to become a candidate for the Release channel (which powers Firefox).

For what it’s worth, this is called the Train Model. It allows us to ship stable and secure code with the latest features every six-to-eight weeks to hundreds of millions of users. It’s pretty neat.

And what that picture up there shows is that it works. The improvement is especially noticeable on the Aurora branch where we go from crashing 9-10 times for every thousand hours of use to 3-4 times for every thousand hours.

Now, the number and diversity of users on the Aurora branch is limited, so when the code rides the trains to Beta, the crash rate goes up. Suddenly code that seemed stable across the Aurora userbase is exposed to previously-unseen usage patterns and configurations of hardware and software and addons. This is one of the reasons why our pre-release users are so precious to us: they provide us with the early warnings we need to stabilize our code for the wild diversity of our users as well as the wild diversity of the Web.

If you’d like to poke around at the dashboard yourself, it’s over here. Eventually it’ll get merged into telemetry.mozilla.org when it’s been reviewed and formalized.

If you have any brilliant ideas of how to improve it, or find any mistakes or bugs, please comment on the tracking bug where the discussion is currently ongoing.

If you’d like to help in a different way, consider installing Firefox Beta. It should look and act almost exactly like your current Firefox, but with some pre-release goodies that you get to use before anyone else (like how Firefox Beta 50 lets you use Reader Mode when you want to print a web page to skip all the unnecessary sidebars and ads). By doing this you are helping us ship the best Firefox we can.

:chutten

One-Year Moziversary

Holy crap, I totally forgot to note on the 19th that it had been a full year since I started working at Mozilla!

In that year I’ve done surprisingly little of what I was hired to do (coding in Gecko, writing webpages and dashboards for performance monitoring) and a whole lot more of stuff I’d never done before (data analysis and interpretation, data management, teaching, blogging, interviewing).

Which is pretty awesome, I have to say, even if I sorta-sleepwalked into some of these responsibilities.

Highlights include hosting a talk at MozLondon (Death-Defying Stats), running… three? iterations of Telemetry Onboarding (including a complete rewrite), countless phone screens for positions within and outside of the team, being a team lead for just long enough to help my two reports leave the team (one to another team, the other to another company :S), and becoming (somehow) a voice for Windows XP Firefox users (expect another blog post in that series, real soon).

For my New MozYear Resolutions, I resolve to blog more (I’ve certainly slacked on the blogging in the latter half of the year), think more, and mentor more. We’ll see how that goes.

Anyway, here’s to another year!

:chutten

On Repeating Oneself

This is in response to ppk’s blog post DRY: Do Repeat Yourself. If you’re not interested in Web Development or Computer Science, or just don’t want to read that post, you may stop reading now and reward yourself with a near-infinite number of kittens.

A significant middle portion of my career was spent doing a sort of Web Development from a Computer Science background. Working on the BlackBerry 10 Browser (the first mobile browser that was, itself, written as a web page), we struggled to apply, from day 0, proper engineering practices to what was ultimately web development.

And we succeeded. Mostly. Well, it was alright.

So maybe PPK is wrong! Maybe webdev can be CS already and it’s all a tempest in a teapot!

Well, no. The BB10 Browser had one benefit over other web development… one so great and so insidious I didn’t even notice it at the time (iow, privilege): We only had to build the page to work on one browser.

Sure, the UI mostly worked if you loaded it in Chrome or Firefox. That’s how we ran our tests, after all. But at the end of the day, it only needed to work on the BB10 version of WebKit. It only needed to be performant on the BB10 version of WebKit running on just a handful of chipsets. It only needed to look pretty on the BB10 version of WebKit on a handful of phones in at most two orientations each.

We could actually exhaustively test all of our supported configurations. In a day.

And this is where true webdevs would scoff. Because uncertainty is the name of the game on the Web. And I think that’s the core of why CS practices can’t necessarily apply to Web Development.

When I was coding the BB10 Browser, as when I do other more-CS-y stuff, I could know how things would behave. I knew what would and would not work. (and I could cheat by fixing the underlying browser if I found a bug, instead of working around it). I knew what parts of the design were slow and were to be avoided, and what cheap shortcuts to employ in what situations (when to use layers, when to use opacity, when to specifically avoid asking the compositor to handle it because the shaders were quick enough (thanks to mlattanzio, kpiascik, tabbott, and cwywiorski!)). I even knew what order things were going to happen, even when the phone was under load!

In short: I knew. Because I knew, I could operate with certainty. Because of that certainty, we wrote a browser in about a year.

But I know webdev is nothing like that. I’ve been on cross-browser projects since and tried to find that sense of certainty. I’ve tried to write features for screens of bizarre dimensions and groped for that knowledge that what I’d done would work wherever it needed. I’ve struggled to find a single performant and attractive solution that would work on all renderers and have felt the sting of my CS education yelling that I shouldn’t repeat myself.

I still scoff at the baling-wire-and-duct-tape websites that litter the web and bloat it with three outdated versions of each of a half-dozen popular frameworks. But, like the weather, I know the laws that motivate it to be so. Complaining is just my way of coping.

The question becomes, then, is there anything CS that can be applied to webdev? Certainly test-driven development and other meta-dev techniques work. The principles of UI design and avoiding premature optimization are universal.

But maybe we should be thinking of it the other way around. Maybe CS should accept some parts of webdev. Maybe CS educators should spend more time on uncertainty. Then CS students will start thinking about how much uncertainty they are willing to accept in their projects in the same way we already think of how performant or cross-platform or maintainable or verifiable the result needs to be.

I hope a few CS graduate students pick up PPK’s call for “Web Development from a Computer Science Perspective” theses. There’s a lot of good material here that can help shape our industries’ shared future.

Or not. I can’t be certain.

:chutten

Data Science is Hard – Part 1: Data

You’d think that categorizing and measuring populations would be pretty simple. You count them all up, divide them into groups… simple, arithmetic-sounding stuff.

To a certain extent, that’s all it is. You want to know how many people contribute to Firefox in a day? Add ’em up. Want to know what fraction of them are from Europe? Compare the subset from Europe against the entire population.

But that’s where it gets squishy:

  • “in a day?” Which day? Did you choose a weekend day? A statutory holiday? A religious holiday? That’ll change the data. Which 24 hours are you counting? From midnight to midnight, sure, but in which timezone? (See the sketch after this list.)
  • “from Europe?” What is Europe? Just the EU? How do you tell if a contributor is from Europe? Are you running a geolocation query against their IP? What if their IP changes over the day? Are we going to double-count that user? Are we asking contributors where they are from? What if they lie?
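
The timezone point alone can shift the numbers. Here’s a minimal sketch, assuming a hypothetical contributions table with one row per contribution event and a timestamp-with-timezone column contributed_at (all names invented for illustration):

-- Events near midnight land on different "days" depending on which timezone
-- defines midnight-to-midnight, so daily counts shift with the choice.
SELECT
    date(at_timezone(contributed_at, 'UTC')) AS day_utc
    , date(at_timezone(contributed_at, 'America/Toronto')) AS day_toronto
    , count(*) AS contributions
FROM contributions
GROUP BY 1, 2
ORDER BY 1, 2

Rows where day_utc and day_toronto disagree are exactly the contributions that get counted on different days under the two definitions.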

So that leads us to Part 1 of “Data Science is Hard”: Data is Hard.

In a recent 1-on-1, my manager :bsmedberg and I thought that it could be interesting to look into Firefox users whose Telemetry reports come from different parts of the world at different times. Maybe we could identify users who travel (Firefox Users Who Travel: Where do they travel to/from?). Maybe they can help us understand the differing needs of Firefox users who are on vacation as opposed to being at home. Maybe they’ll show us Tor Browser users, or users using other anonymizing techniques and technologies: and maybe we should see if there’s some special handling we could provide for them and their data.

I used this topic as a way to learn how to use our new re:dash dashboard in front of the prestodb instance of the Longitudinal Dataset (which lets me run SQL queries against a 1% random sample of Firefox users’ Telemetry data from the past 180 days).

Immediately I ran into problems. First, with remembering all the SQL I had forgotten in the *mumblesomething* years since I last had to write interesting queries.

But then I quickly ran into problems with the data. I ran a query to boil down how many (and which) unique countries each client had reported Telemetry from:

SELECT
    cardinality(array_distinct(geo_country)) AS country_count
    , array_distinct(geo_country) AS countries
FROM longitudinal_v20160314
ORDER BY country_count DESC
LIMIT 5
Country_count  Countries
35  ["CN","MX","GB","HU","JP","US","RU","IN","HK","??","CA","KR","TW","CM","DK","CH","ZA","PH","DE","VN","NL","CO","KZ","MA","TR","FR","AU","GR","IE","AR","BY","AT","TN","BR","AM"]
34  ["DE","RU","LT","UA","MA","GB","GI","AE","FR","CN","AM","NG","NL","PT","TH","PL","ES","NO","CH","IL","ZA","BY","US","UZ","HK","TW","JP","PK","LU","SG","FI","EU","IN","ID"]
34  ["US","BR","KR","NZ","RO","JP","ES","GB","TW","CN","UA","AU","NL","FR","FI","??","NO","CA","ZA","CL","IT","SE","SG","CH","RU","DE","MY","IN","ID","VN","PL","PH","KE","EG"]
34  ["GB","CN","??","DE","US","RU","AL","ES","NL","FR","KR","FI","IR","CA","JP","HK","AU","CH","RO","CO","IE","BR","SE","GR","IN","MX","RS","AR","TW","IT","SA","ID","VN","TN"]
34  ["US","GI","??","GB","DE","SA","KR","AR","ZA","CN","IN","AT","CA","KE","IQ","VN","TR","KZ","JP","BR","FR","TW","IT","ID","SG","RU","CL","BA","NL","AU","BE","LT","PT","ES"]

35 unique countries visited? Wow.

The “Countries” column is in order of when they first appeared in the data, so we know that the first user was reporting from China then Mexico then Great Britain then Hungary then Japan then the US then Russia…

Either this is a globetrotting super spy, or we’re looking at some sort of VPN/Tor/anonymizing framework at play here.

(Either way, I think it best to say, “Thank you for using Firefox, Ms. Super Spy!”)

Or maybe this is a sign that the geolocation service is unreliable, or that the data intake services are buggy, or something else that would be less than awesome.

Regardless: this data is hugely messy. But, 35 countries over 180 days? That’s just about doable in real life… except that it wasn’t over 180 days, but 2:

SELECT
    cardinality(array_distinct(geo_country)) AS country_count
    , cardinality(geo_country) AS subsession_count
    , cardinality(geo_country) / (date_diff('DAY',
        from_iso8601_timestamp(array_min(subsession_start_date)),
        from_iso8601_timestamp(array_max(subsession_start_date))) + 1) AS subsessions_per_day
    , date_diff('DAY',
        from_iso8601_timestamp(array_min(subsession_start_date)),
        from_iso8601_timestamp(array_max(subsession_start_date))) + 1 AS duration
FROM longitudinal_v20160314
ORDER BY country_count DESC
LIMIT 1
Country_count  Subsession_count  Subsessions_per_day  Duration
35             169               84                   2

This client reported from 35 countries over 2 days. At least 17 countries per day (we’re skipping duplicates).

Also of note to Telemetry devs, this client was reporting 84 subsessions per day.

(Subsessions happen at a user’s local midnight and whenever some aspect of the Environment block of Telemetry changes (your locale, your multiprocess setting, how many addons you have installed). If your Firefox is registering that many subsession edges per day, there might be something wrong with your install. Or there might be something wrong with our data intake or aggregation.)

I still plan on poking around this idea of Firefox Users Who Travel. As I do so I need to remember that the data we collect is only useful for looking at Populations. Knowing that there’s one user visiting 35 countries in 2 days doesn’t help us decide whether or not we should release a special Globetrotter Edition of Firefox… since that’s just 1 of 4 million clients of a dataset representing only 1% of Firefox users.

Knowing that about a dozen users reported days with over 250 subsessions might result in some evaluation of that code, but without something linking these high-subsession-rate users together into a Population (maybe they’re machines running automated testing?), there’s nothing much we can do about it.
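
For the curious, here’s a sketch of how one might count such clients against the same table. It averages over each client’s whole reporting window rather than checking individual days, so it only approximates the “over 250 subsessions in a day” condition:

-- Approximation: average subsessions per day over each client's window,
-- not a per-day maximum.
SELECT count(*) AS high_subsession_rate_clients
FROM longitudinal_v20160314
WHERE cardinality(subsession_start_date)
    / (date_diff('DAY',
        from_iso8601_timestamp(array_min(subsession_start_date)),
        from_iso8601_timestamp(array_max(subsession_start_date))) + 1)
    > 250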

Instead I should focus on how, in a 4M user dataset, 112k (2.7%) users report from exactly 2 countries over the duration of the dataset. There are only 44k that report from more than 2, and the other 3.9M or so report exactly 1.
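
That split falls out of a simple bucketing query over the same table (the bucket labels are mine):

-- Bucket each client by how many distinct countries they reported from.
SELECT
    CASE
        WHEN cardinality(array_distinct(geo_country)) = 1 THEN 'exactly 1'
        WHEN cardinality(array_distinct(geo_country)) = 2 THEN 'exactly 2'
        ELSE 'more than 2'
    END AS country_bucket
    , count(*) AS clients
FROM longitudinal_v20160314
GROUP BY 1
ORDER BY 2 DESC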

2.7% is a sliver of 1% of the Firefox population, but it is a Population. A Population is something we can analyse and speak meaningfully about, as the noise and mess of individual points of data has been smoothed out by the sheer weight of the Firefox user base.

It’s nice having a user base large enough to speak meaningfully about.

:chutten

Mozilla, Firefox, and Windows XP Support

[image: Windows XP Start button. Used with permission from Microsoft.]

Last time I focused on what the Firefox Windows XP user population appeared to be. This time, I’m going to look into what such a large population means to Firefox and Mozilla.

Windows XP users of Firefox are geographically and linguistically diverse, and make up more than one tenth of the entire Firefox user population. Which is great, right? A large, diverse population of users… other open source projects only wish they had the luck.

Except Windows XP is past its end-of-life. Nearly two years past. This means it hasn’t been updated, maintained, serviced, or secured in longer than it takes Mars to make it around the Sun.

The Internet can be a scary place. There are people who want to steal your banking passwords, post your private pictures online, or otherwise crack open your computer and take and do what they want with what’s inside of it.

Generally, this is an arms race. Every time someone discovers a software vulnerability, software vendors rush to fix it before the Bad Guys can use it to exploit people’s computers.

The reason we feel safe enough to continue our modern life using computers for our banking, shopping, and communicating is because software vendors are typically better at this than the Bad Guys.

But what if you’re using Windows XP? Microsoft, the only software vendor who is permitted to fix vulnerabilities in Windows XP, has stopped fixing them.

This means each Windows XP vulnerability that is found remains exploitable. Forever.

These are just a few vulnerabilities that we know about.

And Windows XP isn’t just bad for Windows XP users.

There are a variety of crimes that can be committed only using large networks of “robot” machines (called “botnets”) under the control of a single Bad Guy. Machines can be recruited into botnets against their users’ will through security vulnerabilities in the software they are running. Windows XP’s popularity and lengthening list of known vulnerabilities might make it an excellent source of recruits.

With enough members, a botnet can then send spam emails in sufficient volume to overload mail servers, attack financial institutions, steal information from governmental agencies, and otherwise make the Internet a less nice place to be.

So Firefox has a large, diverse population of users whose very presence on the Internet is damaging the Web for us all.

And so does Google! At least for now. Google has announced that it will end Windows XP support for its Chrome browser in April 2016. (It previously announced end-of-life dates for April 2015, and then December 2015.)

So, as of April, Windows XP users will have only one choice for updated, serviced, maintained, and secured web browsing: Firefox.

Which puts Mozilla in a bit of a bind. The Internet is a global, public resource that Mozilla is committed to defend and improve.

Does improving the Internet mean dropping support for Windows XP so that users have no choice but to upgrade to be able to browse safely?

Or does improving the Internet mean continuing to support Windows XP so that those users can at least still have a safe browser with which to access the Web?

Windows XP users might not have a choice in what Operating System their computers run. They might only be using it because they don’t know of an alternative or because they can’t afford to, aren’t allowed to, or are afraid to change.

Firefox is their best hope for security on the Web. And, after April, their only hope.

As of this writing, Firefox supports versions of Windows from XP SP2 on upwards. And this is likely to continue: the latest public discussion about Windows XP support was from last December, reacting to the latest of Google’s Windows XP support blog posts.

I can reiterate confidently: Firefox will continue to support Windows XP.

For now.

Mozilla will have to find a way to reconcile this with its mission. And Firefox will have to help.

Maybe Mozillians from around the world can seek out Windows XP users and put them in contact with local operations that can donate hardware or software or even just their time to help these users connect to the Internet safely and securely.

Maybe Firefox will start testing nastygrams to pop up at our Windows XP user base when they start their next browsing session: “Did you know that your operating system is older than many Mozillians?”, “It appears as though you are accessing the Internet using a close relative of the abacus”, “We have determined that this is the email address of your closest Linux User Group who can help you secure your computer”

…and you. Yeah, you. Do you know someone who might be running Windows XP? Maybe it’s your brother, or your Mother, or your Babbi. If you see a computer with a button that says “Start” at the bottom-left corner of the screen: you can fix that. There are resources in your community that can help.

Talk to your librarian, talk to the High School computer teacher, talk to a Mozillian! We’re friendly!

Together, we can save the Web.

:chutten