Data Science is Hard: Client Delays

Delays suck, but unmeasured delays suck more. So let’s measure them.

I’ve previously talked about delays as they relate to crash pings. This time we’re looking at the core of Firefox Telemetry data collection: the “main” ping. We’ll be looking at a 10% sample of all “main” pings submitted on Tuesday, January 10th[1].

In my previous post on delays, I defined five types of delay: recording, submission, aggregation, migration, and query scheduling. This post is about delays on the client side of the equation, so we’ll be focusing on the first two: recording, and submission.

Recording Delay

How long does it take from something happening, to having a record of it happening? We count HTTP response codes (as one does), so how much time passes from that first HTTP response to the time when that response’s code is packaged into a ping to be sent to our servers?

[Figure: CDF of “main” ping recording delay (0 – 96 hours), by channel]

This is a Cumulative Distribution Function, or CDF. The ones in this post show what proportion (0% – 100%) of the “main” pings we’re looking at arrived with data that falls within a certain timeframe (0 – 96 hours). So in this case, look at the red, “aurora”-branch line. It crosses the 0.9 y-axis line at about 8 on the x-axis. This means 90% of aurora pings had a recording delay of 8 hours or less.

Which is fantastic news, especially since every other channel (release and beta vying for fastest speeds) gets more of its pings in even faster. 90% of release pings have a recording delay of at most 4 hours.

And notice that shelf at 24 hours, where every curve basically jumps to 100%? If users leave their browsers open for longer than a day, we cut a fresh ping at midnight. Glad to see evidence that it’s working.

All in all, we can expect recording delays of under 30 minutes for the majority of pings across all channels. This is not a huge source of delay.
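(If you’d like to build CDFs like these yourself, here is a minimal Python sketch. The variable name and the stand-in data are mine, not the actual analysis’:)

import numpy as np
import matplotlib.pyplot as plt

# One recording delay per ping, in hours (stand-in data for illustration)
delays_hours = np.array([0.2, 0.4, 1.5, 3.0, 7.9, 26.0])

xs = np.sort(delays_hours)
ys = np.arange(1, len(xs) + 1) / len(xs)  # proportion of pings at or below each delay

plt.step(xs, ys, where="post")
plt.xlim(0, 96)
plt.ylim(0, 1)
plt.xlabel("Delay (hours)")
plt.ylabel("Proportion of pings")
plt.show()

# Reading off the "90% of pings" point:
print(np.percentile(delays_hours, 90), "hours")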

Submission Delay

With all that data finally part of a “main” ping, how long before the servers are told? For now, Telemetry has to wait for the user to restart their Firefox before it is able to send its pings. How long can that take?

[Figure: CDF of “main” ping submission delay (0 – 96 hours), by channel]

Ouch.

Now we see aurora is no longer the slowest: its submission delays are very similar to release’s. The laggard is now beta… and I really can’t figure out why. If Beta users were leaving their browsers open longer, we’d expect to see them on the slower side of the “Recording Delay CDF” plot (they aren’t). If Beta users were leaving their browsers closed longer, we’d expect them to show up lower on Engagement Ratio plots (they don’t).

A mystery.

Not a mystery is that nightly has the fastest submission times. It receives updates every day so users have an incentive to restart their browsers often.

Comparing Submission Delay to Recording Delay, you can see why this is where we’re focusing most of our “Get More Data, Faster” attention. If we wait for 90% of “main” pings to arrive, then we have to wait at least 17 hours for nightly data, 28 hours for release and aurora… and over 96 hours for beta.

And that’s just Submission Delay. What if we measured the full client -> server delay for data?

Combined Client Delay

[Figure: CDF of combined “main” ping client delay (0 – 96 hours), by channel]

With things the way they were on 2017-01-10, to get 90% of “main” pings we need to wait a minimum of 22 hours (nightly) and a maximum of… you know what, I don’t even know. I can’t tell where beta might cross the 0.9 line, but it certainly isn’t within 96 hours.

If we limit ourselves to 80% we’re back to a much more palatable 11 hours (nightly) to 27 hours (beta). But that’s still pretty horrendous.

I’m afraid things are actually even worse than I’m making them out to be. We rely on getting counts out of “main” pings. To count something, you need to count every single individual something. This means we need 100% of these pings, or as near as we can get. Even nightly pings take longer than 96 hours to get us more than 95% of the way there.

What do we use “main” pings to count? Amongst other things, “usage hours”: how long Firefox has been open. This is essential for normalizing crash information properly so we can determine the health and stability of a release.

As you can imagine, we’re interested in knowing this as fast as possible. And as things stood a couple of Tuesdays ago, we have a lot of room for improvement.

For now, expect more analyses like this one (and more blog posts like this one) examining how slowly or quickly we can possibly get our data from the users who generate it to the Mozillians who use it to improve Firefox.

:chutten

[1]: Why did I look at pings from 2017-01-10? It was a recent Tuesday (less weekend effect) well after Gregorian New Year’s Day, well before Chinese New Year’s Day, and even a decent distance from Epiphany. Also, 01-10 is a mirror of itself, which I thought was neat.

Firefox’s Windows XP Users’ Upgrade Path

We’re still trying to figure out what to do with Firefox users on Windows XP.

One option I’ve heard is: Can we just send a Mozillian to each of these users’ houses with a fresh laptop and training in how to migrate apps and data?

( No, we can’t. For one, we can’t uniquely identify who and where these users are (this is by design). For two, even if we could, the Firefox Windows XP userbase is too geographically diverse (as I explained in earlier posts) for “meatspace” activities like these to be effective or efficient. For three, this could be kinda expensive… though, so is supporting extra Operating Systems in our products. )

We don’t have the advertising spend to reach all of these users in the real world, but we do have access to their computers in their houses… so maybe we can inform them that way?

Well, we know we can inform people through their browsers. We have plenty of data from our fundraising drives to that effect… but what do we say?

Can we tell them that their computer is unsafe? Would they believe us if we did?

Can we tell them that their Firefox will stop updating? Will they understand what we mean if we did?

Do these users have the basic level of technical literacy necessary to understand what we have to tell them? And if we somehow manage to get the message across about what is wrong and why, what actions can we recommend they take to fix this?

This last part is the first thing I’m thinking about, as it’s the most engineer-like question: what is the optimal upgrade strategy for these users? Much more concrete to me than trying to figure out wording, appearance, and legality across dozens of languages and cultures.

Well, we could instruct them to upgrade to Linux. Except that it wouldn’t be an upgrade, it’d be a clean wipe and reinstall from scratch: all the applications would be gone and all of their settings would reset to default. All the data on their machines would be gone unless they could save it somewhere else, and if you imagine a user who is running Windows XP, you can easily imagine that they might not have access to a “somewhere else”. Also, given the average level of technical expertise, I don’t think we can make a Linux migration simple enough for most of these users to follow. These users have already bought into Windows, so switching them away adds complexity no matter how simple we could make things once the switch was over.

We could instruct them to upgrade to Windows 7. There is a clear upgrade path from XP to 7, and the system requirements of the two OSes are actually very similar. (Which is, in a sincere hat-tip to Microsoft, an amazing feat of engineering and commitment to users with lower-powered computers.) Once there, if the user is eligible for the Windows 10 upgrade, they can take that upgrade if they desire (the system requirements for Windows 10 are only _slightly_ higher than Windows 7’s: 10 needs some CPU extensions that 7 doesn’t, which is another amazing feat). And from there, the users are in Microsoft’s upgrade path, and out of the clutches of the easiest of exploits, forever. There are a lot of benefits to using Windows 7 as an upgrade path.

There are a few problems with this:

  1. Finding copies of Windows 7: Microsoft stopped selling copies of Windows 7 years ago, and these days the most reliable way to find a copy is to buy a computer with it already installed. Mozilla likely isn’t above buying computers for everyone who wants them (if it has or can find the money to do so), but software is much easier to deliver than hardware, and is something we already know how to do.
  2. Paying for copies of Windows 7: Are we really going to encourage our users to spend money they may not have on upgrading a machine that still mostly-works? Or is Mozilla going to spend hard-earned dollarbucks purchasing licenses of out-of-date software for everyone who didn’t or couldn’t upgrade?
  3. Windows 7 has passed its mainstream support lifetime (extended support’s still good until 2020). Aren’t we just replacing one problem with another?
  4. Windows 7 System Requirements: Windows XP only needed a 233MHz processor, 64MB of RAM, and 1.5GB of HDD. Windows 7 needs 1GHz, 1GB, and 16GB.

All of these points are problematic, but that last point is at least one I can get some hard numbers for.

We don’t bother asking users how big their disk drives are, so I can’t detect how many users cannot meet Windows 7’s HDD requirement. However, we do measure users’ CPU speeds and RAM sizes (these are important for sectioning performance-related metrics: if we want to see whether a particular perf improvement is even better on lower-spec hardware, we need to be able to divvy users up by their computers’ specifications).

So, at first this seems like a breeze: the question is simply stated and is about two variables that we measure. “How many Windows XP Firefox users are Stuck because they have CPUs slower than 1GHz or RAM smaller than 1GB?”

But if you thought that for more than a moment, you should probably go back and read my posts about how Data Science is hard. It turns out that getting the CPU speed on Windows involves asking the registry for data, which can fail. So we have a certain amount of uncertainty.
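(For the curious: the speed lives in the registry under HKEY_LOCAL_MACHINE\HARDWARE\DESCRIPTION\System\CentralProcessor\0 as the “~MHz” value. Here is a minimal Python sketch of the kind of read that can fail; Firefox’s actual code is C++, so this is only an illustration:)

import winreg  # Windows-only standard library module

def cpu_speed_mhz():
    try:
        key = winreg.OpenKey(
            winreg.HKEY_LOCAL_MACHINE,
            r"HARDWARE\DESCRIPTION\System\CentralProcessor\0")
        mhz, _type = winreg.QueryValueEx(key, "~MHz")
        return mhz
    except OSError:
        return None  # the read failed; this client's CPU speed is unknown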

[Figure: bounds on the proportion of Windows XP Firefox users who are Stuck]

So, after crunching the data and making some simplifying assumptions (like how I don’t expect the amount of RAM or the speed of a user’s CPU to ever decrease over time), we have the following:

Between 40% and 53% of Firefox users running Windows XP are Stuck (which is to say, they can’t be upgraded past Windows XP because they fail at least one of the requirements).
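Where does that range come from? The clients whose CPU speed we couldn’t read might all be Stuck, or none of them might be, so we compute both extremes. A sketch of the arithmetic, with made-up counts for illustration only:

# Hypothetical counts, for illustration only
total_xp = 1_000_000    # Windows XP clients in the sample
stuck_known = 400_000   # definitely fail the CPU or RAM requirement
unknown = 130_000       # CPU speed read failed, so we can't tell

lower_bound = stuck_known / total_xp                # all unknowns can upgrade: 40%
upper_bound = (stuck_known + unknown) / total_xp    # all unknowns are Stuck: 53%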

That’s some millions of users who are Stuck no matter what we do about education, advocacy, and software.

Maybe we should revisit the “Mozillians with free laptops” idea, after all?

:chutten

 

Firefox User Engagement

I now know, which is to say that I can state with some degree of certainty, that Windows XP Firefox users are only a little less engaged with their Firefox installations than the Firefox user base as a whole.

To get here I needed to sanitize dates with more than four digits in their years (paging the Long Now Foundation: someone’s reporting Telemetry from the year 29634), timezones not of this Earth (Wikipedia seems to think UTC+14:00 is the highest time zone, but I have seen data from +14:30), and clones reporting the same session over and over again from different locations (this one might be because a user’s client_id is stored in their profile: if that profile is reused, then we get data reported from all the locations that use it).
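Those checks look something like this sketch (the record layout and field names are my own, not the dataset’s):

# Stand-in records, for illustration
records = [
    {"year": 2016, "tz_offset_minutes": -300},
    {"year": 29634, "tz_offset_minutes": 0},            # hello, Long Now Foundation
    {"year": 2016, "tz_offset_minutes": 14 * 60 + 30},  # UTC+14:30 is not of this Earth
]

def plausible(record):
    # Years should have at most four digits
    if record["year"] > 9999:
        return False
    # Earthly time zones run from UTC-12:00 to UTC+14:00
    if not (-12 * 60 <= record["tz_offset_minutes"] <= 14 * 60):
        return False
    return True

sane = [r for r in records if plausible(r)]
# De-duplicating cloned sessions additionally needs client_id and session ids,
# so repeated reports of the same session are only counted once.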

I also needed a rigorous definition of what it means for a user population to be “engaged”.

We chose to define an Engagement Ratio for a given day which is basically the number of people who ran Firefox that day divided by the number of people who ran Firefox the previous month. In other words: what proportion of users who could possibly be active actually were active on that day?

Of course, if you read my previous post, you know that the day of the week you choose is going to change your result dramatically. So instead of counting each user who used Firefox that exact day, we average it out over the previous week: if a user was active each day, they’re a full Daily Active User (DAU). If they’re active only one day, they’re 1/7 of a Daily Active User.

To see this in action, I chose to measure Engagement on March the 10th of this year, which was a Thursday. The number of users who reported data to the 1% Longitudinal Dataset and were active that day was 1,119,335. If we instead use the average of daily active users for the week ending on March the 10th, we get 1,051,011.86, which is lower by over 6%. This is consistent with data we’ve collected in other studies that showed only 20% of Firefox users use it 7 days a week. Another 25% use it only on weekdays. So it makes sense that a weekday’s DAU count would be higher than a weekend day’s.

If you’ve ever complained about having to remember how many days there are in a month, you know that the choice of “month” is going to change things as well. So in this case, we just chose 28 days: four weeks. That there is no month that is always 28 days long (lookin’ at you, Leap Years) is irrelevant because we’re selecting values to make things easier for ourselves. So if a user was active on any of the previous 28 days, they are a Monthly Active User (MAU).

So you count your DAU, divide it by your MAU count, and you get your Engagement Ratio (ER), whose units are… unitless. Divide users by users and you get… something that’s almost a percent, in that it’s a value from 0 to 1 that represents a proportion of a population.
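Put together, the definition looks something like this sketch (assuming we have a set of active client ids per day; the names are mine, not the pipeline’s):

from datetime import date, timedelta

def engagement_ratio(active_by_day, day):
    """active_by_day: dict mapping date -> set of client ids active on that date."""
    last_week = [day - timedelta(days=i) for i in range(7)]
    last_month = [day - timedelta(days=i) for i in range(28)]

    # Smoothed DAU: a client active on k of the last 7 days counts as k/7 of a user
    dau = sum(len(active_by_day.get(d, set())) for d in last_week) / 7

    # MAU: anyone active on any of the last 28 days
    mau = len(set().union(*(active_by_day.get(d, set()) for d in last_month)))

    return dau / mau if mau else 0.0

# Example: one client active every day, one active only on the day itself
activity = {date(2016, 3, 10) - timedelta(days=i): {"a"} for i in range(28)}
activity[date(2016, 3, 10)] |= {"b"}
print(engagement_ratio(activity, date(2016, 3, 10)))  # (1 + 1/7) / 2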

This number we can track over time. We expect it to trend downwards when we ship something that negatively impacts users. We expect it to trend upwards when we ship something that users like.

So long as we can get this data quickly enough and reliably enough, we can start determining from the numbers (the silent majority), not noisy users (the vocal minority), what issues actually matter to the user base at large.

:chutten

 

Data Science is Hard – Part 1: Data

You’d think that categorizing and measuring populations would be pretty simple. You count them all up, divide them into groups… simple, arithmetic-sounding stuff.

To a certain extent, that’s all it is. You want to know how many people contribute to Firefox in a day? Add ’em up. Want to know what fraction of them are from Europe? Compare the subset from Europe against the entire population.

But that’s where it gets squishy:

  • “in a day?” Which day? Did you choose a weekend day? A statutory holiday? A religious holiday? That’ll change the data. Which 24 hours are you counting? From midnight-to-midnight, sure, but which timezone?
  • “from Europe?” What is Europe? Just the EU? How do you tell if a contributor is from Europe? Are you running a geolocation query against their IP? What if their IP changes over the day, are we going to double-count that user? Are we asking contributors where they are from? What if they lie?

So that leads us to Part 1 of “Data Science is Hard”: Data is Hard.

In a recent 1-on-1, my manager :bsmedberg and I thought that it could be interesting to look into Firefox users whose Telemetry reports come from different parts of the world at different times. Maybe we could identify users who travel (Firefox Users Who Travel: Where do they travel to/from?). Maybe they can help us understand the differing needs of Firefox users who are on vacation as opposed to being at home. Maybe they’ll show us Tor Browser users, or users using other anonymizing techniques and technologies: and maybe we should see if there’s some special handling we could provide for them and their data.

I used this topic as a way to learn how to use our new re:dash dashboard on top of the prestodb instance of the Longitudinal Dataset (which lets me run SQL queries against a 1% random sample of Firefox users’ Telemetry data from the past 180 days).

Immediately I ran into problems. First, with remembering all the SQL I had forgotten in the *mumblesomething* years since I last had to write interesting queries.

But then I quickly ran into problems with the data. I ran a query to boil down how many (and which) unique countries each client had reported Telemetry from:

SELECT
    cardinality(array_distinct(geo_country)) AS country_count
    , array_distinct(geo_country) AS countries
FROM longitudinal_v20160314
ORDER BY country_count DESC
LIMIT 5
country_count  countries
35  ["CN","MX","GB","HU","JP","US","RU","IN","HK","??","CA","KR","TW","CM","DK","CH","ZA","PH","DE","VN","NL","CO","KZ","MA","TR","FR","AU","GR","IE","AR","BY","AT","TN","BR","AM"]
34  ["DE","RU","LT","UA","MA","GB","GI","AE","FR","CN","AM","NG","NL","PT","TH","PL","ES","NO","CH","IL","ZA","BY","US","UZ","HK","TW","JP","PK","LU","SG","FI","EU","IN","ID"]
34  ["US","BR","KR","NZ","RO","JP","ES","GB","TW","CN","UA","AU","NL","FR","FI","??","NO","CA","ZA","CL","IT","SE","SG","CH","RU","DE","MY","IN","ID","VN","PL","PH","KE","EG"]
34  ["GB","CN","??","DE","US","RU","AL","ES","NL","FR","KR","FI","IR","CA","JP","HK","AU","CH","RO","CO","IE","BR","SE","GR","IN","MX","RS","AR","TW","IT","SA","ID","VN","TN"]
34  ["US","GI","??","GB","DE","SA","KR","AR","ZA","CN","IN","AT","CA","KE","IQ","VN","TR","KZ","JP","BR","FR","TW","IT","ID","SG","RU","CL","BA","NL","AU","BE","LT","PT","ES"]

35 unique countries visited? Wow.

The “Countries” column is in order of when they first appeared in the data, so we know that the first user was reporting from China then Mexico then Great Britain then Hungary then Japan then the US then Russia…

Either this is a globetrotting super spy, or we’re looking at some sort of VPN/Tor/anonymizing framework at play here.

( Either way I think it best to say, “Thank you for using Firefox, Ms. Super Spy!” )

Or maybe this is a sign that the geolocation service is unreliable, or that the data intake services are buggy, or something else that would be less than awesome.

Regardless: this data is hugely messy. But, 35 countries over 180 days? That’s just about doable in real life… except that it wasn’t over 180 days, but 2:

SELECT
    cardinality(array_distinct(geo_country)) AS country_count
    , cardinality(geo_country) AS subsession_count
    , cardinality(geo_country) / (date_diff('DAY', from_iso8601_timestamp(array_min(subsession_start_date)), from_iso8601_timestamp(array_max(subsession_start_date))) + 1) AS subsessions_per_day
    , date_diff('DAY', from_iso8601_timestamp(array_min(subsession_start_date)), from_iso8601_timestamp(array_max(subsession_start_date))) + 1 AS duration
FROM longitudinal_v20160314
ORDER BY country_count DESC
LIMIT 1
country_count  subsession_count  subsessions_per_day  duration
35  169  84  2

This client reported from 35 countries over 2 days. That’s an average of more than 17 distinct countries per day (and we’re skipping duplicates).

Also of note to Telemetry devs, this client was reporting 84 subsessions per day.

(Subsessions happen at a user’s local midnight and whenever some aspect of the Environment block of Telemetry changes (your locale, your multiprocess setting, how many addons you have installed). If your Firefox is registering that many subsession edges per day, there might be something wrong with your install. Or there might be something wrong with our data intake or aggregation.)
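In rough pseudocode (my simplification, not the actual scheduling logic), the client-side rule is:

def should_cut_subsession(now, last_cut, environment, last_environment):
    # A new subsession starts at the user's local midnight...
    crossed_local_midnight = now.date() != last_cut.date()
    # ...or whenever the Environment block changes (locale, e10s, addon count, ...)
    environment_changed = environment != last_environment
    return crossed_local_midnight or environment_changed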

I still plan on poking around this idea of Firefox Users Who Travel. As I do so I need to remember that the data we collect is only useful for looking at Populations. Knowing that there’s one user visiting 35 countries in 2 days doesn’t help us decide whether or not we should release a special Globetrotter Edition of Firefox… since that’s just 1 of 4 million clients of a dataset representing only 1% of Firefox users.

Knowing that about a dozen users reported days with over 250 subsessions might result in some evaluation of that code, but without something linking these high-subsession-rate users together into a Population (maybe they’re machines running automated testing?), there’s nothing much we can do about it.

Instead I should focus on how, in a 4M user dataset, 112k (2.7%) users report from exactly 2 countries over the duration of the dataset. There are only 44k that report from more than 2, and the other 3.9M or so report exactly 1.

2.7% is a sliver of 1% of the Firefox population, but it is a Population. A Population is something we can analyse and speak meaningfully about, as the noise and mess of individual points of data has been smoothed out by the sheer weight of the Firefox user base.

It’s nice having a user base large enough to speak meaningfully about.

:chutten

Windows XP Firefox Users

Speaking of fourteen-year-old technology: Windows XP.


Ah, Windows XP. The most-used Operating System until August, 2012. Released in October of 2001, it was an incredibly-popular upgrade to the previous champion, Windows 98 Second Edition.

It was so popular that Microsoft kept sending it security updates and assorted bugfixes up to April 8, 2014. That’s nearly twelve and a half years. That’s four and a half years after XP’s true successor, Windows 7, was released. (We do not speak of Vista.)

Windows 7 will end-of-life after only 10 years, in 2020. Windows 8 lost mainstream support after only 3 (if you include 8.1, the line will only last 9).

Why am I talking about it, almost two full years after even Microsoft stopped supporting Windows XP, the most long-supported of OSes from Redmond? Because people are still using it. And not just any people: Firefox users.

Roughly 12% of Firefox users on release and beta channels are running Windows XP. That’s almost as many as are using Windows 10 (15%) and almost double how many users we have on Mac (6.7%).

(For the record, Linux users on these two channels make up less than 1%)

[edit – oh, and please remember that the usual rules of Data Science apply: I’m only able to analyse what is being provided. So if a disproportionate number of Firefox users on, say, Linux aren’t reporting Telemetry on release or beta channels, they will be undercounted in the presented numbers]

“But… Who? I don’t know anyone who is using an operating system two years past its end-of-life.” you might be thinking. Well, this is called an “inherent bias”: just because it doesn’t happen to you doesn’t mean it isn’t happening.

Intrigued by this result (as I too know no one still running XP) I tried to find out what was different about the XP population as compared to the wider Firefox user base, and what was the same. I used a longitudinal dataset that we’re experimenting with internally, which is why all of my figures aren’t linked to analyses.

It might interest you to know that Windows XP users are more likely to have configured their Firefox to run in the Russian, Polish, or Spanish locales. They are less likely to have configured it to use the English or German locales.

Windows XP users are less centrally-located than other users. According to geo-ip lookups of users’ IP addresses when they submit Firefox Telemetry reports, nearly 18% of the Firefox user base is in the United States. That is a great degree of centralization and means Mozilla can get great “bang for its buck” with outreach programs that operate in the United States. A Windows XP Firefox user is far less likely to be in the United States or Germany, and is slightly less likely to be in Great Britain, France, or Japan. Instead, a Windows XP Firefox user is somewhat more likely to be in Russia, Poland, India, Ukraine, Egypt, Spain, or Italy.

How engaged are Windows XP Firefox users? Maybe they only have it installed and running, but don’t actually use it much day-to-day.

A good predictor of engagement is having Firefox set as the default browser on a system, so a lower proportion of “default browser: Yes” for Windows XP users might signal lower engagement. However, the data shows that Windows XP Firefox users are more likely to have Firefox set as their system’s default browser (even accounting for how difficult it now is in Windows 10 to set non-Edge browsers to be the system default).

Another good predictor of engagement is having addons installed. Addons might also be a good signifier that a user chose Firefox because it offers, with its addons, a compelling feature that not just any other browser can match. Data says: Windows XP Firefox users are much more likely to have 0 addons installed than Firefox users in general, so maybe the choice of browser was made for them? Maybe they don’t know about addons at all.

All these differences might make it easy to divide the user base into “us” and “them”. “They” are running an outdated OS. “They” make hardware acceleration development harder in Firefox.

But we are all similar in more respects than you realize. We are all just as likely to live in Canada, Mexico, Indonesia, Belgium, Portugal, South Africa, or any of the other dozens and dozens of countries where Firefox users live. We are all just as likely to be running an up-to-date version of release Firefox. We are all as likely to have a truly-ridiculous number of addons installed (13 or more).

And, most importantly, we all use Firefox to access the global resource that is the Web.

:chutten

Edit – I don’t have data to back up my assertion that “A good predictor of engagement is having Firefox set as the default browser on a system” and :dolske told me about an experiment Firefox ran where we just didn’t ask users to set us as their default browser. In the experiment, user retention and active hours did not decrease. Desktop users’ engagement is apparently unaffected by what browser is opened if you click on a link in another program.

SSE2 Support in Firefox Users

Let me tell you a story.

Intel invented the x86 assembly language back in the Dark Ages of the Late 1970s. It worked, and many CPUs implemented it, consolidating a fragmented landscape into a more cohesive and compatible whole. Unfortunately, x86 had limitations, so in time it would have to go.

Lo, the time came in the Middle Ages of the Mid 1980s when x86 had to be replaced with something that could handle 32-bit widths for numbers and addresses. And more registers. And yet more addressing modes.

But x86 was popular, so Intel didn’t replace it. Instead they extended it with something called IA-32. And it was popular as well, not least because it was backwards-compatible with basic x86: all of the previous x86 programs would work on x86 + IA-32.

By now, personal and business computing was well in the mainstream. This means Intel finally had some data on what, at the lowest level, programmers were wanting to run on their chips.

It turns out that most of the heaviest computations people wanted to do on computers were really simple to express: multiply this list of numbers by a number, add these two lists of numbers together… spreadsheet sorts of things. Finance sorts of things.

But also video games sorts of things. Windows 95 released with DirectX and unleashed a flood of computer gaming. To the list we can now add: move every point and pixel of this 3D model forward by one step, transform all of this geometry and these textures from this camera POV to that one, recolour this sprite’s pixels to be darker to account for shade.

The structure all of these (and a lot of other) tasks had in common was that they all wanted to do one thing (multiply, add, move, transform, recolour) over multiple pieces of data (one list of numbers, multiple lists of numbers, points and pixels, geometry and textures, sprite colours).

SIMD stands for Single Instruction Multiple Data and is how computer engineers describe these sorts of “do one action over and over again to every individual element in this list of data” operations.
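You can see the shape of SIMD thinking in any array library. A toy example in Python with numpy (whose compiled loops can use SIMD instructions such as SSE2 where the hardware offers them):

import numpy as np

prices = np.array([10.0, 12.5, 99.99, 3.2])

# One operation applied to every element: "multiply this list of numbers by a number"
discounted = prices * 0.9

# "Add these two lists of numbers together", element by element
taxes = np.array([1.3, 1.6, 13.0, 0.4])
totals = prices + taxes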

So, for the new flagship “Pentium” processor Intel was releasing in 1997, they introduced a new extension: MMX (which doesn’t stand for anything; they apparently chose the letters because they looked cool). MMX lets you do some of those SIMD things directly at the lowest level of the computer, with the limitation that you can’t also be performing high-precision math at the same time.

AMD was competing with Intel. Not happy with the limitations of the MMX extension, they developed their own x86 extension “3DNow!” which performed the same operations, but without the limitations and with higher precision.

Intel retaliated with SSE: Streaming SIMD Extensions. They shipped it on their Pentium III processors starting in ’99. It wasn’t a full replacement for MMX, though, so they had to quickly follow it up in the Pentium 4.

Which finally brings us to SSE2. First released in 2001 in the Pentium 4 line of processors (also implemented by AMD two years later in their Opteron line), it reimplemented MMX’s capabilities without its shortcomings (and added some other capabilities at the same time).

So why am I talking ancient history? 2001 was fifteen years ago. What use do we have for this lesson on SSE2 when even SSE4 has been around since ’07, and AVX-512 will ship on real silicon within months?

Well, it turns out that Firefox doesn’t assume you have SSE2 on your computer. It can run on fifteen-year-old hardware, if you have it.

There are some code paths that benefit strongly from the ability to run the SIMD instructions present in SSE2. If Mozilla can’t assume that everyone running Firefox has a computer capable of running SSE2, Firefox has to detect, at runtime, whether the user’s computer is capable of using that fast path.
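The detect-then-branch pattern looks something like this sketch (Python standing in for Firefox’s C++; on Windows, kernel32’s IsProcessorFeaturePresent with PF_XMMI64_INSTRUCTIONS_AVAILABLE reports SSE2, and on Linux it appears as a flag in /proc/cpuinfo):

import ctypes
import sys

def has_sse2():
    if sys.platform == "win32":
        PF_XMMI64_INSTRUCTIONS_AVAILABLE = 10  # the SSE2 feature constant
        return bool(ctypes.windll.kernel32.IsProcessorFeaturePresent(
            PF_XMMI64_INSTRUCTIONS_AVAILABLE))
    try:
        with open("/proc/cpuinfo") as f:
            return "sse2" in f.read().split()
    except OSError:
        return False  # can't tell, so assume the slow path

if has_sse2():
    result = "fast path"  # use the SIMD-accelerated code
else:
    result = "slow path"  # fall back to plain, element-at-a-time code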

This makes Firefox bigger, slower, and harder to test and maintain.

A question came up on the dev-platform mailing list about how many Firefox users are actually running computers that lack SSE2. I live in a very rich country and have a very privileged job. Any assumption I make about who does and does not have the ability to run computers that are newer than fifteen years old is going to be clouded by biases I cannot completely account for.

So we turn to the data. Which means Telemetry. Which means I get to practice writing custom analyses. (Yay!)

It turns out that, if a Firefox user has Telemetry enabled, we ask that user’s computer about a lot of environmental information. What is your operating system? What version? How much RAM do you have installed? What graphics card do you have? What version is its driver?

And, yes: What extensions does your CPU support?

We collect this information to determine from real users machines whether a particular environmental variable makes Firefox behave poorly. In the not-too-distant past there was a version of Firefox that would just appear black. No reason, no recourse, no explanation. By examining environmental data we were able to track down what combination of graphics cards and driver versions were susceptible to this and develop a fix within days.

(If you want to see an application of this data yourself, here is a dashboard showing the population breakdown of Firefox users. You can use it to see how much of the Firefox user base is like you. For me, less than 1% of the user base was running a computer like mine with a Firefox like mine, reinforcing that what I might think makes sense may not exactly be representative of reality for Firefox users.)

So I asked of the data: of all the users reporting Telemetry on Thursday January 21, 2016, how many have SSE2 capability on their CPUs?

And the answer was: about 99.5%

This would suggest that at most 0.5% of the Firefox release population are running CPUs that do not have SSE2. This is not strictly correct (there are a variety of data science reasons why we cannot prove anything about the population that doesn’t report SSE2 capability), but it’s a decent approximation so let’s go with it.

From there, as with most Telemetry queries, there were more questions. The first was: “Are the users not reporting SSE2 support keeping themselves on older versions of Firefox?” This is a good question because, if the users are keeping themselves on old versions, we can enable SSE2 support in new versions and not worry about the users being unable to upgrade because they already chose not to.

Turns out, no. They’re not.

With such a small population we’re subdividing (0.5%) it’s hard to say anything for certain, but it appears as though they are mostly running up-to-date versions of Firefox and, thus, would be impacted by any code changes we release. Ah, well.

The next questions were: “We know SSE2 is required to run Windows 8. Are these users stuck on Windows XP? Are there many Linux users without SSE2?”

Turns out: yes and no. Yes, they almost all are on Windows XP. No, basically none of them are running Linux.

Support and security updates for Windows XP stopped on April 8, 2014. It probably falls under Mozilla’s mission to try and convince users still running XP to upgrade themselves if possible (as they did on Data Privacy Day), to improve the security of the Internet and to improve those users’ access to the Web.

If you are running Windows XP, or administer a family member’s computer who is, you should probably consider upgrading your operating system as soon as you are able.

If you are running an older computer and want to know if you might not have SSE2, you can open a Firefox tab to about:telemetry and check the Environment section. Under “system” there should be a field “cpu.extensions” that will contain the token “hasSSE2” if Firefox has detected that you have SSE2.

(If about:telemetry is mostly blank, try clicking on the ‘Change’ links at the top labelled “FHR data upload is disabled” and “Extended Telemetry recording is disabled” and then restarting your Firefox)

SSE2 will probably be coming soon as a system requirement for Firefox. I hope all of our users are ready for when that happens.

:chutten

To-Order Telemetry Dashboards: dashboard-generator

Say you’ve been glued to my posts about Firefox Telemetry. You became intrigued by the questions you could answer and ask using actual data from actual users, and considered writing your own website using the single-API telemetry-wrapper.

However, you aren’t a web developer. You don’t like JavaScript. Or you’re busy. Or you don’t like reading READMEs on GitHub.

This is where dashboard-generator can step in to help out. Simply visit the website and build-your-own dash to your exacting specifications:

[Screenshot: the dashboard-generator form]

Choose your channel, version, and metric. “-Latest-” will ensure that the generated dashboard will always use the latest version in the selected channel when you reload that page. Otherwise, you might find yourself always looking at GC_MS values from beta 39.

If you are only interested in clients reporting from a particular application, operating system, or with a certain E10s setting then make your choices in Filters.

If you want a histogram like telemetry.mozilla.org’s “Histogram Dashboard” then make sure you select Histogram and then choose if you want the ends of the histogram trimmed, whether (and how sensibly) you want to compare clients across particular settings, and whether to sanitize the results so you only use data that is valid and has a lot of samples.

If you want an evolution plot like telemetry.mozilla.org’s “Evolution Dashboard” then select Evolution. From there, choose whether to use the build date or submission date of samples, how many versions back from the selected one you would like to graph the values over, and whether to sanitize the results so you only use data that is valid and has a lot of samples.

Your choices made, click “Add to Dashboard”. Then choose again! And again!

Make a mistake? Don’t worry, you can remove rows using the ‘-’ buttons.

Not sure what it’ll look like when you’re done? Hit ‘Generate Dashboard’ and you’ll get a preview in CodePen showing what it will look like and giving you an opportunity to fiddle with the HTML, CSS, and JS.

[Screenshot: CodePen preview of a generated dashboard]

When you see something you like in the CodePen, hit ‘Save’ and it’ll give you a URL you can use to collaborate with others, and an option to ‘Export’ the whole site for when you want to self-host.

If you find any bugs or have any requests, please file an issue ticket here. I’ll be using it to write an E10s dashboard in the near term, and hope you’ll use it, too!

:chutten