Data Science is Hard – Case Study: What is a Firefox Crash?

In the past I’ve gone on at length about the challenge of getting timely data to determine Firefox release quality with respect to how often Firefox crashes. Comparatively I’ve spent essentially no time at all on what a crash actually is.

A crash (broadly) is what happens when a computer process encounters an error it cannot recover from. Since it cannot recover, the system it is running within ends the process abruptly.

Not all crashes are equal. Not all crashes mean the same thing to users and to release managers and to computer programmers.

If you are in the middle of drafting an email and the web page content suddenly goes blank and says “Sorry, this tab has crashed.” then that’s a big deal. It’s even worse if the entire browser disappears without warning.

But what if Firefox crashes, but only after it has mostly shut down? Everything’s been saved properly, but we didn’t clean up after ourselves well. This is a crash (technically), but does it really matter to a user?

What if the process that contains Flash crashes and web advertisements stop working? It can be restarted without too much trouble, and no one likes ads, so is it really that bad of a thing?

And on top of these families of events, there are other horrible things that can happen to users we might want to call “crashes” even though they aren’t. For instance: what if the browser becomes completely unresponsive and the user has no recourse but to close it? The process didn’t encounter a fatal error, but that user’s situation is the same: Something weird happened, and now their data is gone.

Generally speaking, I look at four classes of crash: Main Crashes (M), Content Crashes (C), Content Shutdown Crashes (S), and Plugin Crashes (P).

In my opinion, the most reliable indicator of Firefox’s stability and quality is M + C – S. In plain English, it is the sum of the events where the whole Browser goes poof or the Web Content inside the browser goes poof, ignoring the times when the Web Content goes poof after the user has decided to shut down the browser.

It doesn’t include Plugin crashes, as those are less obtrusive and more predicted by the plugin code, not Firefox code. It does include some events where Firefox became unresponsive (or “hangs” for short) and had to be terminated.

This, to my mind, most accurately encompasses a measure of Firefox quality. If the number of these crashes goes up, that means there are more times where more users are having less fun with Firefox. If the number of these crashes goes down, that means there are fewer times that fewer people are having less fun with Firefox.

It doesn’t tell the whole story. What good is a not-crashing browser if it doesn’t scroll when you ask it to? What good is a stable piece of web content if half of it is missing because we don’t support it? What good is a Firefox that is open all the time if it takes twice as long to load the web pages you care about?

But it gives us one very important part of the Firefox Quality story, and that’s good enough for me.


2 thoughts on “Data Science is Hard – Case Study: What is a Firefox Crash?

  1. Pingback: Data Science is Hard – Case Study: How Do We Normalize Firefox Crashes? – chuttenblog

  2. Pingback: Another Advantage of Decreasing Data Latency: Flatter Graphs – chuttenblog

Comments are closed.