Data Science is Hard: History, or It Seemed Like a Good Idea At the Time

This summer I’m mentoring a Summer of Code project about redesigning the “about:telemetry” interface that ships with each and every version of Firefox.

The minute the first student (:flyingrub) asked me “What is a parent payload and child payload?” I knew I was going to be asked a lot of questions.

To least-effectively answer these questions, I’ll blog the answers as narratives. And to start with this question, here’s how the history of a project makes it difficult to collect data from it.

In the Beginning — or, rather, in the middle of October 2015 when I was hired at Mozilla (so, at my Beginning) — there was single-process Firefox, and all was good. Users had many tabs, but one process. Users had many bookmarks, but one process. Users had many windows, but one process. All this and the web contents themselves were all sharing time within a single construct of electrons and bits and code and pixels: vying with each other for control of the filesystem, the addressable space of RAM, the network resources, and CPU scheduling.

Not satisfied with things being just “good”, we took a page from the book penned by Google Chrome and decided the time was ripe to split the browser into many processes so that a critical failure in one would not trouble the others. To begin with, because our code is venerable, we decided that we would try two processes. One of these twins would be in charge of the browser and the other would be in charge of the web contents.

This project was called Electrolysis after the mechanism by which one might split water into hydrogen and oxygen using electricity.

Suddenly the browser became responsive, even in the face of the worst JavaScript written by the least experienced dev at the most privileged startup in Silicon Valley. And out-of-memory errors decreased in frequency because the browser’s memory and the web contents’ memory were able to grow without interfering with each other.

Remember, our code is venerable. Remember, our code hearkens from its single-process past.

Our data-collection code was written in that single-process past. But now we had two processes with input events that needed to be timed to find problems. We had two processes with memory allocations that needed to be examined for regressions.

So the data collection code was made aware that there could be two types of process: parent and child.

Alas, not just one child. There could be many child processes in a row if some webpage were naughty and brought down the child in its anger. So the data collection code was made aware that there could be many batches of data from child processes, and one batch of data from the parent process.

The parent data was left looking like single-process data, out in the root of the data collection payload. Child processes’ data were placed in an array of childPayloads where each payload echoed the structure of the parent.
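To make that concrete, here is a minimal Python sketch of reading one histogram out of that older layout. The payload is a hypothetical, heavily trimmed stand-in (the name EXAMPLE_HISTOGRAM, its values, and the helper functions are invented for illustration); real Telemetry pings carry far more fields.

```python
# Hypothetical, heavily trimmed payload in the older layout: the parent's
# data sits at the root, and each child batch in "childPayloads" echoes it.
# EXAMPLE_HISTOGRAM and its numbers are invented for illustration.
payload = {
    "histograms": {"EXAMPLE_HISTOGRAM": {"sum": 120}},
    "childPayloads": [
        {"histograms": {"EXAMPLE_HISTOGRAM": {"sum": 45}}},
        {"histograms": {"EXAMPLE_HISTOGRAM": {"sum": 30}}},
    ],
}


def parent_histogram(payload, name):
    """Return the parent's copy of histogram `name`, or None if absent."""
    return payload.get("histograms", {}).get(name)


def child_histograms(payload, name):
    """Return one entry per child batch, in order (None where it is absent)."""
    return [
        child.get("histograms", {}).get(name)
        for child in payload.get("childPayloads", [])
    ]


print(parent_histogram(payload, "EXAMPLE_HISTOGRAM"))
print(child_histograms(payload, "EXAMPLE_HISTOGRAM"))
```

The asymmetry is the point: the parent is read from one place and the children from a list, so any tool that shows "the" value has to ask which one you mean.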

Then, not content with “good”, I had to come along in bug 1218576, a bug whose number I still have locked in my memory, for good or ill.

Firefox needs to run multiple child processes of different types simultaneously, and several processes of some of those types at once, too. What was going to be a quick way to ensure that childPayloads was always of length 1 turned into a months-long exercise to put data exactly where we wanted it to be.

And so now we have childPayloads where the “weird” content child data that resists aggregation remains, and we also have payload.processes.<process type>.* where the cool and hip data lives: histograms, scalars, and keyed variants of both.
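A similar sketch for the newer layout, again with an invented, trimmed payload. The process types and the four kinds of "cool" data follow the description above, but the exact key spellings (histograms, keyedHistograms, scalars, keyedScalars), the example values, and the helper process_data are assumptions made for illustration.

```python
# Hypothetical, trimmed payload in the newer layout. "childPayloads" keeps the
# hard-to-aggregate content-child data; histograms, scalars, and their keyed
# variants live under payload["processes"][<process type>]. All names and
# values below are invented for illustration.
payload = {
    "childPayloads": [{"exampleContentOnlyData": [1, 2, 3]}],
    "processes": {
        "parent": {"scalars": {"example.counter": 3}},
        "content": {
            "scalars": {"example.counter": 5},
            "keyedScalars": {"example.keyed": {"a": 1, "b": 2}},
        },
        "gpu": {"scalars": {"example.counter": 1}},
    },
}

# The four kinds of "cool" data, assumed to use these key names.
DATA_KINDS = ("histograms", "keyedHistograms", "scalars", "keyedScalars")


def process_data(payload, process_type, kind):
    """Return the `kind` data recorded by `process_type`, or {} if none."""
    if kind not in DATA_KINDS:
        raise ValueError(f"unknown data kind: {kind!r}")
    return payload.get("processes", {}).get(process_type, {}).get(kind, {})


# Which process types reported anything in this ping?
print(sorted(payload.get("processes", {})))
print(process_data(payload, "content", "scalars"))
```

In this shape, a new process type only adds a new key under processes; nothing else has to move, which is why a gpu process or extra content processes can slot in without much fuss.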

This approach is already paying dividends, as some proportion of Nightly users are getting a gpu process and others are getting some number of content processes. The data files neatly into place with minimal intervention required.

But it means about:telemetry needs to know whether you want the parent’s “weird” data or the child’s. And which child was that, again?

And about:telemetry also needs to know whether you want the parent’s “cool” data, or the content child’s, or the gpu child’s.

So this means that within about:telemetry there are now five places where you can select what process you want. One for “weird” data, and one for each of the four kinds of “cool” data.

Sadly, we have now reached the present day, which brings my storytelling to a close. Hopefully, after this Summer’s Code is done, this story will have a happier, more maintainable, and responsively designed ending.

But until then, remember that “accident of history” is the answer to most questions. As such, it behooves you to learn history well.

:chutten

SSE2 Support in Firefox Users

Let me tell you a story.

Intel invented the x86 instruction set back in the Dark Ages of the Late 1970s. It worked, and many CPUs implemented it, consolidating a fragmented landscape into a more cohesive and compatible whole. Unfortunately, x86 had limitations, so in time it would have to go.

Lo, the time came in the Middle Ages of the Mid 1980s when x86 had to be replaced with something that could handle 32-bit widths for numbers and addresses. And more registers. And yet more addressing modes.

But x86 was popular, so Intel didn’t replace it. Instead they extended it with something called IA-32. And it was popular as well, not least because it was backwards-compatible with basic x86: all of the previous x86 programs would work on x86 + IA-32.

By now, personal and business computing was well into the mainstream. This meant Intel finally had some data on what, at the lowest level, programmers wanted to run on their chips.

It turns out that most of the heaviest computations people wanted to do on computers were really simple to express: multiply this list of numbers by a number, add these two lists of numbers together… spreadsheet sorts of things. Finance sorts of things.

But also video-game sorts of things. Windows 95, and DirectX shortly after it, unleashed a flood of computer gaming. To the list we can now add: move every point and pixel of this 3D model forward by one step, transform all of this geometry and these textures from this camera POV to that one, recolour this sprite’s pixels to be darker to account for shade.

The structure all of these (and a lot of other) tasks had in common was that they all wanted to do one thing (multiply, add, move, transform, recolour) over multiple pieces of data (one list of numbers, multiple lists of numbers, points and pixels, geometry and textures, sprite colours).

SIMD stands for Single Instruction Multiple Data and is how computer engineers describe these sorts of “do one action over and over again to every individual element in this list of data” operations.

So, for the new “Pentium MMX” processors Intel released in 1997, they introduced a new extension: MMX (which doesn’t stand for anything; they apparently chose the letters because they looked cool). MMX lets you do some of those SIMD things directly at the lowest level of the computer, with the limitation that, because it reuses the floating-point registers, you can’t also be doing floating-point math at the same time.

AMD was competing with Intel. Not happy with the limitations of the MMX extension, they developed their own x86 extension, “3DNow!”, which brought the same kind of SIMD operations to floating-point numbers, not just integers.

Intel retaliated with SSE: Streaming SIMD Extensions. They shipped it on their Pentium III processors starting in ’99. It wasn’t a full replacement for MMX, though, so they had to quickly follow it up in the Pentium 4.

Which finally brings us to SSE2. First released in 2001 in the Pentium 4 line of processors (also implemented by AMD two years later in their Opteron line), it reimplemented MMX’s capabilities without its shortcomings (and added some other capabilities at the same time).

So why am I talking ancient history? 2001 was fifteen years ago. What use do we have for this lesson on SSE2 when even SSE4 has been around since ’07, and AVX-512 will ship on real silicon within months?

Well, it turns out that Firefox doesn’t assume you have SSE2 on your computer. It can run on fifteen-year-old hardware, if you have it.

There are some code paths that benefit strongly from the ability to run the SIMD instructions present in SSE2. If Mozilla can’t assume that everyone running Firefox has a computer capable of running SSE2, Firefox has to detect, at runtime, whether the user’s computer is capable of using that fast path.
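The general shape of runtime dispatch can be sketched in Python. This is not Firefox's code: reading the flags line of /proc/cpuinfo is a Linux-specific assumption made purely for illustration, and add_lists_fast is a plain-Python stand-in for a real SSE2 fast path. The function names are hypothetical.

```python
import operator
from typing import Callable, List, Optional

Vector = List[float]


def cpu_has_sse2(cpuinfo_text: Optional[str] = None) -> bool:
    """Best-effort SSE2 check via the "flags" line of Linux's /proc/cpuinfo.

    Assumption for illustration only: this is Linux/x86-specific and is not
    how Firefox performs its own detection. Anything unreadable or missing is
    treated as "no SSE2", so we fall back to the safe path.
    """
    if cpuinfo_text is None:
        try:
            with open("/proc/cpuinfo") as f:
                cpuinfo_text = f.read()
        except OSError:
            return False
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            _, _, value = line.partition(":")
            return "sse2" in value.split()
    return False


def add_lists_portable(a: Vector, b: Vector) -> Vector:
    """Slow path that works on any CPU: one addition at a time."""
    return [x + y for x, y in zip(a, b)]


def add_lists_fast(a: Vector, b: Vector) -> Vector:
    """Stand-in for an SSE2 fast path (plain Python here, not real SIMD)."""
    return list(map(operator.add, a, b))


def choose_add_lists(has_sse2: bool) -> Callable[[Vector, Vector], Vector]:
    """Pick the implementation once, instead of re-checking on every call."""
    return add_lists_fast if has_sse2 else add_lists_portable


# Decide once at startup; callers just use `add_lists`.
add_lists = choose_add_lists(cpu_has_sse2())
print(add_lists([1.0, 2.0], [3.0, 4.0]))
```

Even in this toy form there are two implementations that must give identical results, and both need their own tests.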

This makes Firefox bigger, slower, and harder to test and maintain.

A question came up on the dev-platform mailing list about how many Firefox users are actually running computers that lack SSE2. I live in a very rich country and have a very privileged job. Any assumption I make about who does and does not have a computer newer than fifteen years old is going to be clouded by biases I cannot completely account for.

So we turn to the data. Which means Telemetry. Which means I get to practice writing custom analyses. (Yay!)

It turns out that, if a Firefox user has Telemetry enabled, we ask that user’s computer about a lot of environmental information. What is your operating system? What version? How much RAM do you have installed? What graphics card do you have? What version is its driver?

And, yes: What extensions does your CPU support?

We collect this information to determine, from real users’ machines, whether a particular environmental variable makes Firefox behave poorly. In the not-too-distant past there was a version of Firefox that would just appear black. No reason, no recourse, no explanation. By examining environmental data we were able to track down which combinations of graphics cards and driver versions were susceptible to this and develop a fix within days.

(If you want to see an application of this data yourself, here is a dashboard showing the population breakdown of Firefox users. You can use it to see how much of the Firefox user base is like you. For me, less than 1% of the user base was running a computer like mine with a Firefox like mine, reinforcing that what I might think makes sense may not exactly be representative of reality for Firefox users.)

So I asked of the data: of all the users reporting Telemetry on Thursday, January 21, 2016, how many have SSE2 capability on their CPUs?

And the answer was: about 99.5%.

This would suggest that at most 0.5% of the Firefox release population are running CPUs that do not have SSE2. This is not strictly correct (there are a variety of data science reasons why we cannot prove anything about the population that doesn’t report SSE2 capability), but it’s a decent approximation so let’s go with it.
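The core of that question can be sketched in a few lines of Python. This assumes each record is one user's Telemetry environment (deduplicating by client is left out) and that the CPU extension tokens live at system.cpu.extensions, as about:telemetry displays them; the helper names and sample records are invented.

```python
from typing import Any, Dict, Iterable, Optional

Environment = Dict[str, Any]


def has_sse2(env: Environment) -> bool:
    """True if the CPU extension list in this environment contains "hasSSE2"."""
    extensions = env.get("system", {}).get("cpu", {}).get("extensions") or []
    return "hasSSE2" in extensions


def sse2_share(environments: Iterable[Environment]) -> Optional[float]:
    """Fraction of environments reporting SSE2, or None if there are none.

    Assumes one environment per user; deduplicating by client is left out.
    """
    total = 0
    with_sse2 = 0
    for env in environments:
        total += 1
        if has_sse2(env):
            with_sse2 += 1
    return with_sse2 / total if total else None


# Invented sample records; the last one has no extension list at all.
sample = [
    {"system": {"cpu": {"extensions": ["hasMMX", "hasSSE", "hasSSE2"]}}},
    {"system": {"cpu": {"extensions": ["hasSSE2"]}}},
    {"system": {"cpu": {"extensions": ["hasMMX", "hasSSE"]}}},
    {"system": {"cpu": {}}},
]
print(sse2_share(sample))
```

Records without any extension list are counted here as lacking SSE2, which is one reason a figure like 0.5% is best read as an upper bound rather than an exact count.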

From there, as with most Telemetry queries, there were more questions. The first was: “Are the users not reporting SSE2 support keeping themselves on older versions of Firefox?” This is a good question because, if those users are keeping themselves on old versions, we can require SSE2 in new versions without worrying about them being unable to upgrade, because they have already chosen not to.

Turns out, no. They’re not.

With such a small population to subdivide (0.5%), it’s hard to say anything for certain, but it appears that they are mostly running up-to-date versions of Firefox and, thus, would be affected by any code changes we release. Ah, well.

The next questions were: “We know SSE2 is required to run Windows 8. Are these users stuck on Windows XP? Are there many Linux users without SSE2?”

Turns out: yes and no. Yes, they almost all are on Windows XP. No, basically none of them are running Linux.
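Both follow-up questions are the same kind of query: take the users without "hasSSE2" and group them by something else. A minimal sketch, assuming the operating system appears at system.os (name and version) and the Firefox version at build.version in each environment record, and assuming Windows XP reports as name "Windows_NT", version "5.1". The helper names and sample records are made up.

```python
from collections import Counter
from typing import Any, Callable, Dict, Hashable, Iterable

Env = Dict[str, Any]


def has_sse2(env: Env) -> bool:
    """True if the environment's CPU extension list includes "hasSSE2"."""
    extensions = env.get("system", {}).get("cpu", {}).get("extensions") or []
    return "hasSSE2" in extensions


def breakdown_without_sse2(
    environments: Iterable[Env], key: Callable[[Env], Hashable]
) -> Counter:
    """Count the environments lacking "hasSSE2", grouped by key(env)."""
    return Counter(key(env) for env in environments if not has_sse2(env))


def os_key(env: Env) -> Hashable:
    """Group by (OS name, OS version); field locations are assumptions."""
    os_info = env.get("system", {}).get("os", {})
    return (os_info.get("name", "unknown"), os_info.get("version", "unknown"))


def firefox_version_key(env: Env) -> Hashable:
    """Group by Firefox version; field location is an assumption."""
    return env.get("build", {}).get("version", "unknown")


# Invented sample records.
sample = [
    {"system": {"cpu": {"extensions": ["hasSSE2"]},
                "os": {"name": "Windows_NT", "version": "10.0"}},
     "build": {"version": "43.0.4"}},
    {"system": {"cpu": {"extensions": ["hasMMX"]},
                "os": {"name": "Windows_NT", "version": "5.1"}},
     "build": {"version": "43.0.4"}},
    {"system": {"cpu": {"extensions": ["hasMMX", "hasSSE"]},
                "os": {"name": "Windows_NT", "version": "5.1"}},
     "build": {"version": "38.5.0"}},
    {"system": {"cpu": {"extensions": ["hasSSE2"]},
                "os": {"name": "Linux", "version": "4.2"}},
     "build": {"version": "43.0.4"}},
]
print(breakdown_without_sse2(sample, os_key))
print(breakdown_without_sse2(sample, firefox_version_key))
```

Passing the grouping in as a function keeps one query shape for every follow-up question, which suits the way one Telemetry answer tends to prompt the next.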

Support and security updates for Windows XP stopped on April 8, 2014. It probably falls under Mozilla’s mission to try and convince users still running XP to upgrade themselves if possible (as they did on Data Privacy Day), to improve the security of the Internet and to improve those users’ access to the Web.

If you are running Windows XP, or administer a family member’s computer who is, you should probably consider upgrading your operating system as soon as you are able.

If you are running an older computer and want to know if you might not have SSE2, you can open a Firefox tab to about:telemetry and check the Environment section. Under system should be a field “cpu.extensions” that will contain the token “hasSSE2” if Firefox has detected that you have SSE2.

(If about:telemetry is mostly blank, try clicking the ‘Change’ links at the top labelled “FHR data upload is disabled” and “Extended Telemetry recording is disabled”, then restart Firefox.)

SSE2 will probably be coming soon as a system requirement for Firefox. I hope all of our users are ready for when that happens.

:chutten