Data Science is Hard: Counting Users

Screenshot_2018-08-29 User Activity Firefox Public Data Report

Counting is harder than you think. No, really!

Intuitively, as you look around you, you think this can’t be true. If you see a parking lot you can count the cars, right?

But do cars that have left the parking lot count? What about cars driving through it without stopping? What about cars driving through looking for a space? (And can you tell the difference between those two kinds from a distance?)

These cars all count if you’re interested in usage. It’s all well and good to know the number of cars using your parking lot right now… but is it lower on weekends? Holidays? Are you measuring on a rainy day when fewer people take bicycles, or in the Summer when more people are on vacation? Do you need better signs or more amenities to get more drivers to stop? Are you going to have expand capacity this year, or next?

Yesterday we released the Firefox Public Data Report. Go take a look! It is the culmination of months of work of many mozillians (not me, I only contributed some early bug reports). In it you can find out how many users Firefox has, the most popular addons, and how quickly Firefox users update to the latest version. And you can choose whether to look at how these plots look for the worldwide user base or for one of the top ten (by number of Firefox users) countries individually.

It’s really cool.

The first two plots are a little strange, though. They count the number of Firefox users over time… and they don’t agree. They don’t even come close!

For the week including August 17, 2018 the Yearly Active User (YAU) count is 861884770 (or about 862M)… but the Monthly Active User (MAU) count is 256092920 (or about 256M)!

That’s over 600M difference! Which one is right?

Well, they both are.

Returning to our parking lot analogy, MAU is about counting how many cars use the parking lot over a 28-day period. So, starting Feb 1, count cars. If someone you saw earlier returns the next day or after a week, don’t count them again: we only want unique cars. Then, at the end of the 28-day period, that was the MAU for Feb 28. The MAU for Mar 1 (on non-leap-years) is the same thing, but you start counting on Feb 2.

Similarly for YAU, but you count over the past 365 days.

It stands to reason that you’ll see more unique cars over the year than you will over the month: you’ll see visitors, tourists, people using the lot just once, and people who have changed jobs and haven’t been back in four months.

So how many of these 600M who are in the YAU but not in the MAU are gone forever? How many are coming back? We don’t know.

Well, we don’t know _precisely_.

We’ve been at the browser game for long enough to see patterns in the data. We’re in the Summer slump for MAU numbers, and we have a model for how much higher the numbers are likely to be come October. We have surveyed people of varied backgrounds and have some ideas of why people change browsers to or away from Firefox.

We have the no-longer users, the lapsed users, the lost-and-regained users, the tried-us-once users, the non-human users, … we have categories and rough proportions on what we think we know about our population, and how that influences how we can better make the internet better for them.

Ultimately, to me, it doesn’t matter too much. I work on Firefox, a product that hundreds of millions of people use. How many hundreds of millions doesn’t matter: we’re above the threshold that makes me feel like I’m making the world better.

(( Well… I say that, but it is actually my job to understand the mechanisms behind these  numbers and why they can’t be exact, so I do have a bit of a vested interest. And there are a myriad of technological and behavioural considerations to account for in code and in documentation and in analysis which makes it an interesting job. But, you know. Hundreds of millions is precise enough for my job satisfaction index. ))

But once again we reach the inescapable return to the central thesis. Counting is harder than you think: one of the leading candidates for the Data Team’s motto. (Others include “Well, it depends.” and “¯\_(ツ)_/¯”). And now we’re counting in the open, so you get to experience its difficulty firsthand. Go have another look.