Firefox User Engagement

I now know, which is to say that I can state with some degree of certainty, that Windows XP Firefox users are only a little less engaged with their Firefox installations as the Firefox user base as a whole.

To get here I needed to sanitize dates with more than four digits in their years (paging the Long Now Foundation: someone’s reporting Telemetry from the year 29634), with timezones not of this Earth (Wikipedia seems to think UTC+14:00 is the highest time zone, but I have seen data from +14:30), and with clones reporting the same session over and over again from different locations (this one might be because a user’s client_id is stored in their profile. If that profile is reused, then we will get data reported from all the locations that use that profile).

I also needed a rigorous definition of what it means for a user population to be “engaged”.

We chose to define an Engagement Ratio for a given day which is basically the number of people who ran Firefox that day divided by the number of people who ran Firefox the previous month. In other words: what proportion of users who could possibly be active actually were active on that day?

Of course, if you read my previous post, you know that the day of the week you choose is going to change your result dramatically. So instead of counting each user who used Firefox that exact day, we average it out over the previous week: if a user was active each day, they’re a full Daily Active User (DAU). If they’re active only one day, they’re 1/7 of a Daily Active User.

To see this in action, I chose to measure Engagement on March the 10th of this year, which was a Thursday. The number of users who reported data to the 1% Longitudinal Dataset who were active that day was 1,119,335. If we use instead the average of daily active users for the week ending on March the 10, we get 1,051,011.86 which is lower by over 6%. This is consistent with data we’ve collected in other studies that showed only 20% of Firefox users use it 7 days a week. Another 25% use it only on weekdays. So it makes sense that a weekday’s DAU count would be higher than a weekend day’s.

If you’ve ever complained about having to remember how many days there are in a month, you know that the choice of “month” is going to change things as well. So in this case, we just chose 28 days: four weeks. That there is no month that is always 28 days long (lookin’ at you, Leap Years) is irrelevant because we’re selecting values to make things easier for ourselves. So if a user was active on any of the previous 28 days, they are a Monthly Active User (MAU).

So you count your DAU and you divide it by your MAU count and you get your Engagement Ratio (ER) whose units are… unitless. Divide users by users and you get… something that’s almost a percent, in that it’s a value from 0 to 1 that represents a proportion of  a population.

This number we can track over time. We expect it to trend downwards when we ship something that negatively impacts users. We expect it to trend upwards when we ship something that users like.

So long as we can get this data quickly enough and reliably enough, we can start determining from the numbers (the silent majority), not noisy users (the vocal minority), what issues actually matter to the user base at large.

:chutten

 

Data Science is Hard – Part 1: Data

You’d think that categorizing and measuring populations would be pretty simple. You count them all up, divide them into groups… simple, arithmetic-sounding stuff.

To a certain extent, that’s all it is. You want to know how many people contribute to Firefox in a day? Add ’em up. Want to know what fraction of them are from Europe? Compare the subset from Europe against the entire population.

But that’s where it gets squishy:

  • “in a day?” Which day? Did you choose a weekend day? A statutory holiday? A religious holiday? That’ll change the data. Which 24 hours are you counting? From midnight-to-midnight, sure, but which timezone?
  • “from Europe?” What is Europe? Just the EU? How do you tell if a contributor is from Europe? Are you running a geolocation query against their IP? What if their IP changes over the day, are we going to double-count that user? Are we asking contributors where they are from? What if they lie?

So that leads us to Part 1 of “Data Science is Hard”: Data is Hard.

In a recent 1-on-1, my manager :bsmedberg and I thought that it could be interesting to look into Firefox users whose Telemetry reports come from different parts of the world at different times. Maybe we could identify users who travel (Firefox Users Who Travel: Where do they travel to/from?). Maybe they can help us understand the differing needs of Firefox users who are on vacation as opposed to being at home. Maybe they’ll show us Tor Browser users, or users using other anonymizing techniques and technologies: and maybe we should see if there’s some special handling we could provide for them and their data.

I used this topic as a way to learn how to use our new re:dash dashboard onto the prestodb instance of the Longitudinal Dataset. (which lets me run SQL queries against a 1% random sample of Firefox users’ Telemetry data from the past 180 days)

Immediately I ran into problems. First, with remembering all the SQL I had forgotten in the *mumblesomething* years since I last had to write interesting queries.

But then I quickly ran into problems with the data. I ran a query to boil down how many (and which) unique countries each client had reported Telemetry from:

SELECT
    cardinality(array_distinct(geo_country)) AS country_count
    , array_distinct(geo_country) AS countries
FROM longitudinal_v20160314
ORDER BY country_count DESC
LIMIT 5
Country_count Countries
35 [“CN”,”MX”,”GB”,”HU”,”JP”,”US”,”RU”,”IN”,”HK”,”??”,”CA”,”KR”,”TW”,”CM”,”DK”,”CH”,”ZA”,”PH”,”DE”,”VN”,”NL”,”CO”,”KZ”,”MA”,”TR”,”FR”,”AU”,”GR”,”IE”,”AR”,”BY”,”AT”,”TN”,”BR”,”AM”]
34 [“DE”,”RU”,”LT”,”UA”,”MA”,”GB”,”GI”,”AE”,”FR”,”CN”,”AM”,”NG”,”NL”,”PT”,”TH”,”PL”,”ES”,”NO”,”CH”,”IL”,”ZA”,”BY”,”US”,”UZ”,”HK”,”TW”,”JP”,”PK”,”LU”,”SG”,”FI”,”EU”,”IN”,”ID”]
34 [“US”,”BR”,”KR”,”NZ”,”RO”,”JP”,”ES”,”GB”,”TW”,”CN”,”UA”,”AU”,”NL”,”FR”,”FI”,”??”,”NO”,”CA”,”ZA”,”CL”,”IT”,”SE”,”SG”,”CH”,”RU”,”DE”,”MY”,”IN”,”ID”,”VN”,”PL”,”PH”,”KE”,”EG”]
34 [“GB”,”CN”,”??”,”DE”,”US”,”RU”,”AL”,”ES”,”NL”,”FR”,”KR”,”FI”,”IR”,”CA”,”JP”,”HK”,”AU”,”CH”,”RO”,”CO”,”IE”,”BR”,”SE”,”GR”,”IN”,”MX”,”RS”,”AR”,”TW”,”IT”,”SA”,”ID”,”VN”,”TN”]
34 [“US”,”GI”,”??”,”GB”,”DE”,”SA”,”KR”,”AR”,”ZA”,”CN”,”IN”,”AT”,”CA”,”KE”,”IQ”,”VN”,”TR”,”KZ”,”JP”,”BR”,”FR”,”TW”,”IT”,”ID”,”SG”,”RU”,”CL”,”BA”,”NL”,”AU”,”BE”,”LT”,”PT”,”ES”]

35 unique countries visited? Wow.

The “Countries” column is in order of when they first appeared in the data, so we know that the first user was reporting from China then Mexico then Great Britain then Hungary then Japan then the US then Russia…

Either this is a globetrotting super spy, or we’re looking at some sort of VPN/Tor/anonymizing framework at play here.

( Either way I think it best to say, “Thank you for using Firefox, Ms. Super Spy!” )

Or maybe this is a sign that the geolocation service is unreliable, or that the data intake services are buggy, or something else that would be less than awesome.

Regardless: this data is hugely messy. But, 35 countries over 180 days? That’s just about doable in real life… except that it wasn’t over 180 days, but 2:

SELECT
    cardinality(array_distinct(geo_country)) AS country_count
    , cardinality(geo_country) AS subsession_count
    , cardinality(geo_country) / (date_diff('DAY', from_iso8601_timestamp(array_min(subsession_start_date)), from_iso8601_timestamp(array_max(subsession_start_date))) + 1) AS subsessions_per_day
    , date_diff('DAY', from_iso8601_timestamp(array_min(subsession_start_date)), from_iso8601_timestamp(array_max(subsession_start_date)) + 1) AS duration
FROM longitudinal_v20160314
ORDER BY country_count DESC
LIMIT 1
Country_count Subsession_count Subsessions_per_day Duration
35 169 84 2

This client reported from 35 countries over 2 days. At least 17 countries per day (we’re skipping duplicates).

Also of note to Telemetry devs, this client was reporting 84 subsessions per day.

(Subsessions happen at a user’s local midnight and whenever some aspect of the Environment block of Telemetry changes (your locale, your multiprocess setting, how many addons you have installed). If your Firefox is registering that many subsession edges per day, there might be something wrong with your install. Or there might be something wrong with our data intake or aggregation.)

I still plan on poking around this idea of Firefox Users Who Travel. As I do so I need to remember that the data we collect is only useful for looking at Populations. Knowing that there’s one user visiting 35 countries in 2 days doesn’t help us decide whether or not we should release a special Globetrotter Edition of Firefox… since that’s just 1 of 4 million clients of a dataset representing only 1% of Firefox users.

Knowing that about a dozen users reported days with over 250 subsessions might result in some evaluation of that code, but without something linking these high-subsession-rate users together into a Population (maybe they’re machines running automated testing?), there’s nothing much we can do about it.

Instead I should focus on how, in a 4M user dataset, 112k (2.7%) users report from exactly 2 countries over the duration of the dataset. There are only 44k that report from more than 2, and the other 3.9M or so report exactly 1.

2.7% is a sliver of 1% of the Firefox population, but it is a Population. A Population is something we can analyse and speak meaningfully about, as the noise and mess of individual points of data has been smoothed out by the sheer weight of the Firefox user base.

It’s nice having a user base large enough to speak meaningfully about.

:chutten

Mozilla, Firefox, and Windows XP Support

windowsXPStartButton
Used with permission from Microsoft.

Last time I focused on what the Firefox Windows XP user population appeared to be. This time, I’m going to look into what such a large population means to Firefox and Mozilla.

Windows XP users of Firefox are geographically and linguistically diverse, and make up more than one tenth of the entire Firefox user population. Which is great, right? A large, diverse population of users… other open source projects only wish they had the luck.

Except Windows XP is past its end-of-life. Nearly two years past. This means it hasn’t been updated, maintained, serviced, or secured in longer than it takes Mars to make it around the Sun.

The Internet can be a scary place. There are people who want to steal your banking passwords, post your private pictures online, or otherwise crack open your computer and take and do what they want with what’s inside of it.

Generally, this is an arms race. Every time someone discovers a software vulnerability, software vendors rush to fix it before the Bad Guys can use it to exploit people’s computers.

The reason we feel safe enough to continue our modern life using computers for our banking, shopping, and communicating is because software vendors are typically better at this than the Bad Guys.

But what if you’re using Windows XP? Microsoft, the only software vendor who is permitted to fix vulnerabilities in Windows XP, has stopped fixing them.

This means each Windows XP vulnerability that is found remains exploitable. Forever.

These are just a few vulnerabilities that we know about.

And Windows XP isn’t just bad for Windows XP users.

There are a variety of crimes that can be committed only using large networks of “robot” machines (called “botnets“) under the control of a single Bad Guy. Machines can be recruited into botnets against their users’ will through security vulnerabilities in the software they are running. Windows XP’s popularity and lengthening list of known vulnerabilities might make it an excellent source of recruits.

With enough members, a botnet can then send spam emails in sufficient volume to overload mail servers, attack financial institutions, steal information from governmental agencies, and otherwise make the Internet a less nice place to be.

So Firefox has a large, diverse population of users whose very presence connected to the Internet is damaging the Web for us all.

And so does Google! At least for now. Google has announced that it will end Windows XP support for its Chrome browser in April 2016. (It previously announced end-of-life dates for April 2015, and then December 2015.)

So, as of April, Windows XP users will have only one choice for updated, serviced, maintained, and secured web browsing: Firefox.

Which puts Mozilla in a bit of a bind. The Internet is a global, public resource that Mozilla is committed to defend and improve.

Does improving the Internet mean dropping support for Windows XP so that users have no choice but to upgrade to be able to browse safely?

Or does improving the Internet mean continuing to support Windows XP so that those can at least still have a safe browser to access the Web?

Windows XP users might not have a choice in what Operating System their computers run. They might only be using it because they don’t know of an alternative or because they can’t afford to, aren’t allowed to, or are afraid to change.

Firefox is their best hope for security on the Web. And, after April, their only hope.

As of this writing, Firefox supports versions of Windows from XP SP2 on upwards. And this is likely to continue: the latest public discussion about Windows XP support was from last December, reacting to the latest of Google’s Windows XP support blog posts.

I can reiterate confidently: Firefox will continue to support Windows XP.

For now.

Mozilla will have to find a way to reconcile this with its mission. And Firefox will have to help.

Maybe Mozillians from around the world can seek out Windows XP users and put them in contact with local operations that can donate hardware or software or even just their time to help these users connect to the Internet safely and securely.

Maybe Firefox will start testing nastygrams to pop up at our Windows XP user base when they start their next browsing session: “Did you know that your operating system is older than many Mozillians?”, “It appears as though you are accessing the Internet using a close relative of the abacus”, “We have determined that this is the email address of your closest Linux User Group who can help you secure your computer”

…and you. Yeah, you. Do you know someone who might be running Windows XP? Maybe it’s your brother, or your Mother, or your Babbi. If you see a computer with a button that says “Start” at the bottom-left corner of the screen: you can fix that. There are resources in your community that can help.

Talk to your librarian, talk to the High School computer teacher, talk to a Mozillian! We’re friendly!

Together, we can save the Web.

:chutten

Windows XP Firefox Users

Speaking of fourteen-year-old technology: Windows XP.

88195-lojmr

Ah, Windows XP. The most-used Operating System until August, 2012. Released in October of 2001, it was an incredibly-popular upgrade to the previous champion, Windows 98 Second Edition.

It was so popular that Microsoft kept sending it security updates and assorted bugfixes up to April 8, 2014. That’s nearly twelve and a half years. That’s four and a half years after XPs true successor, Windows 7, was released. (We do not speak of Vista)

Windows 7 will end-of-life after only 10 years, in 2020. Windows 8 lost it after 3 (if you include 8.1, it will only last 9).

Why am I talking about it, almost two full years after even Microsoft stopped supporting Windows XP: the most long-supported of OSes from Redmond? Because people are still using it. And not just any people: Firefox users.

Roughly 12% of Firefox users on release and beta channels are running Windows XP. That’s almost as many as are using Windows 10 (15%) and almost double how many users we have on Mac (6.7%).

(For the record, Linux users on these two channels make up less than 1%)

[edit – oh, and please remember that the usual rules of Data Science apply: I’m only able to analyse what is being provided. So if a disproportionate number of Firefox users on, say, Linux aren’t reporting Telemetry on release or beta channels, they will be undercounted in the presented numbers]

“But… Who? I don’t know anyone who is using an operating system two years past its end-of-life.” you might be thinking. Well, this is called an “inherent bias”: just because it doesn’t happen to you doesn’t mean it isn’t happening.

Intrigued by this result (as I too know no one still running XP) I tried to find out what was different about the XP population as compared to the wider Firefox user base, and what was the same. I used a longitudinal dataset that we’re experimenting with internally, which is why all of my figures aren’t linked to analyses.

It might interest you to know that Windows XP users are more likely to have configured their Firefox to run in the Russian, Polish, or Spanish locales. They are less likely to have configured it to use the English or German locales.

Windows XP users are less centrally-located than other users. According to geo-ip lookups of users’ IP addresses when they submit Firefox Telemetry reports, nearly 18% of the Firefox user base is in the United States. That is a great degree of centralization and means Mozilla can get great “bang for its buck” with outreach programs that operate in the United States. A Windows XP Firefox user is far less likely to be in the United States or Germany, and is slightly less likely to be in Great Britain, France, or Japan. Instead, a Windows XP Firefox user is somewhat more likely to be in Russia, Poland, India, the Ukraine, Egypt, Spain, or Italy.

How engaged are Windows XP Firefox users? Maybe they only have it installed and running, but don’t actually use it much day-to-day.

A good predictor of engagement is having Firefox set as the default browser on a system, so a lower proportion of “default browser: Yes” for Windows XP users might signal lower engagement. However, the data shows that Windows XP Firefox users are more likely to have Firefox set as their system’s default browser (even accounting for how difficult it now is in Windows 10 to set non-Edge browsers to be the system default).

Another good predictor of engagement is having addons installed. Addons might also be a good signifier that a user chose Firefox because it offers, with its addons, a compelling feature that not just any other browser can match. Data says: Windows XP Firefox users are much more likely to have 0 addons installed than the Firefox users in general, so maybe the choice of browser was made for them? Maybe they don’t know about addons at all.

All these differences might make it easy to divide the user base into “us” and “them”. “They” are running an outdated OS. “They” make hardware acceleration development harder in Firefox.

But we are all similar in more respects than you realize. We are all just as likely to live in Canada, Mexico, Indonesia, Belgium, Portugal, South Africa, or any of the other dozens and dozens of countries where Firefox users live. We are all just as likely to be running an up-to-date version of release Firefox. We are all as likely to have a truly-ridiculous number of addons installed (13 or more).

And, most importantly, we all use Firefox to access the global resource that is the Web.

:chutten

Edit – I don’t have data to back up my assertion that “A good predictor of engagement is having Firefox set as the default browser on a system” and :dolske told me about an experiment Firefox ran where we just didn’t ask users to set us as their default browser. In the experiment, user retention and active hours did not decrease. Desktop users’ engagement is apparently unaffected by what browser is opened if you click on a link in another program.