When do All Firefox Users Update?

Last time we talked about updates I wrote about all of what goes into an individual Firefox user’s ability to update to a new release. We looked into how often Firefox checks for updates, and how we sometimes lie and say that there isn’t an update even after release day.

But how does that translate to a population?

Well, let’s look at some pictures. First, the number of “update” pings we received from users during the recent Firefox 61 release:update_ping_volume_release61

This is a real-time look at how many updates were received or installed by Firefox users each minute. There is a plateau’s edge on June 26th shortly after the update went live (around 10am PDT), and then a drop almost exactly 24 hours later when we turned updates off. This plot isn’t the best for looking into the mechanics of how this works since it shows volume of all types of “update” pings from all versions, so it includes users finally installing Firefox Quantum 57 from last November as well as users being granted the fresh update for Firefox 61.

Now that it’s been a week we can look at our derived dataset of “update” pings and get a more nuanced view (actually, latency on this dataset is much lower than a week, but it’s been at least a week). First, here’s the same graph, but filtered to look at only clients who are letting us know they have the Firefox 61 update (“update” ping, reason: “ready”) or they have already updated and are running Firefox 61 for the first time after update (“update” ping, reason: “success”):update_ping_volume_61only_wide

First thing to notice is how closely the two graphs line up. This shows how, during an update, the volume of “update” pings is dominated by those users who are updating to the most recent version.

And it’s also nice validation that we’re looking at the same data historically that we were in real time.

To step into the mechanics, let’s break the graph into its two constituent parts: the users reporting that they’ve received the update (reason: “ready”) and the users reporting that they’re now running the updated Firefox (reason: “success”).

update_ping_volume_stackedupdate_ping_volume_unstacked

The first graph shows the two lines stacked for maximum similarity to the graphs above. The second unstacks the two so we can examine them individually.

It is now much clearer to see how and when we turned updates on and off during this release. We turned them on June 26, off June 27, then on again June 28. The blue line also shows us some other features: the Canada Day weekend of lower activity June 30 and July 1, and even time-of-day effects where our sharpest peaks are (EDT) 6-8am, 12-2pm, and a noticeable hook at 1am.

(( That first peak is mostly made up of countries in the Central European Timezone UTC+1 (e.g. Germany, France, Poland). Central Europe’s peak is so sharp because Firefox users, like the populations of European countries, are concentrated mostly in that one timezone. The second peak is North and South America (e.g. United States, Brazil). It is broader because of how many timezones the Americas span (6) and how populations are dense in the Eastern Timezone and Pacific Timezone which are 3 hours apart (UTC-5 to UTC-8). The noticeable hook is China and Indonesia. China has one timezone for its entire population (UTC+8), and Indonesia has three. ))

This blue line shows us how far we got in the last post: delivering the updates to the user.

The red line shows why that’s only part of the story, and why we need to look at populations in addition to just an individual user.

Ultimately we want to know how quickly we can reach our users with updated code. We want, on release day, to get our fancy new bells and whistles out to a population of users small enough that if something goes wrong we’re impacting the fewest number of them, but big enough that we can consider their experience representative of what the whole Firefox user population would experience were we to release to all of them.

To do this we have a two levers: we can change how frequently Firefox asks for updates, and we can change how many Firefox installs that ask for updates actually get it right away. That’s about it.

So what happens if we change Firefox to check for updates every 6 hours instead of every 12? Well, that would ensure more users will check for updates during periods they’re offered. It would also increase the likelihood of a given user being offered the update when their Firefox asks for one. It would raise the blue line a bit in those first hours.

What if we change the %ge of update requests that are offers? We could tune up or down the number of users who are offered the updates. That would also raise the blue line in those first hours. We could offer the update to more users faster.

But neither of these things would necessarily increase the speed at which we hit that Goldilocks number of users that is both big enough to be representative, and small enough to be prudent. Why? The red line is why. There is a delay between a Firefox having an update and a user restarting it to take advantage of it.

Users who have an update won’t install it immediately (see how the red line is lower than the blue except when we turn updates off completely), and even if we turn updates off it doesn’t stop users who have the update from installing it (see how the red line continues even when the blue line is floored).

Even with our current practices of serving updates, more users receive the update than install it for at least the first week after release.

If we want to accelerate users getting update code, we need to control the red line. Which we don’t. And likely can’t.

I mean, we can try. When an update has been pending for long enough, you get a little arrow on your Firefox menu: update_arrow.png

If you leave it even longer, we provide a larger piece of UI: a doorhanger.

restartDoorhanger

We can tune how long it takes to show these. We can show the doorhanger immediately when the update is ready, asking the user to stop what they’re doing and–

windows_update_bluewindows_updatewindows_update_blue_lg

…maybe we should just wait until users update their own selves, and just offer some mild encouragement if they’ve put it off for, say, four days? We’ll give the user eight days before we show them anything as invasive as the doorhanger. And if they dismiss that doorhanger, we’ll just not show it again and trust they’ll (eventually) restart their browser.

…if only because Windows restarted their whole machines when they weren’t looking.

(( If you want your updates faster and more frequent, may I suggest installing Firefox Beta (updates every week) or Firefox Nightly (updates twice a day)? ))

This means that if the question is “When do all Firefox users update?” the answer is, essentially, “When they want to.” We can try and serve the updates faster, or try to encourage them to restart their browsers sooner… but when all is said and done our users will restart their browsers when they want to restart their browsers, and not a moment before.

Maybe in the future we’ll be able to smoothly update Firefox when we detect the user isn’t using it and when all the information they have entered can be restored. Maybe in the future we’ll be able to update seamlessly by porting the running state from instance to instance. But for now, given our current capabilities, we don’t want to dictate when a user has to restart their browser.

…but what if we could ship new features more quickly to an even more controlled segment of the release population… and do it without restarting the user’s browser? Wouldn’t that be even better than updates?

Well, we might have answers for that… but that’s a subject for another time.

:chutten

Advertisements

Faster Event Telemetry with “event” Pings

Screenshot_2018-07-04 New Query(1).pngEvent Telemetry is the means by which we can send ordered interaction data from Firefox users back to Mozilla where we can use it to make product decisions.

For example, we know from a histogram that the most popular way of opening the Developer Tools in Firefox Beta 62 is by the shortcut key (Ctrl+Shift+I). And it’s nice to see that the number of times the Javascript Debugger was opened was roughly 1/10th of the number of times the shortcut key was used.

…but are these connected? If so, how?

And the Javascript Profiler is opened only half as often as the Debugger. Why? Isn’t it easy to find that panel from the Debugger? Are users going there directly from the DOM view or is it easier to find from the Debugger?

To determine what parts of Firefox our users are having trouble finding or using, we often need to know the order things happen. That’s where Event Telemetry comes into play: we timestamp things that happen all over the browser so we can see what happens and in what order (and a little bit of how long it took to happen).

Event Telemetry isn’t new: it’s been around for about 2 years now. And for those two years it has been piggy-backing on the workhorse of the Firefox Telemetry system: the “main” ping.

The “main” ping carries a lot of information and is usually sent once per time you close your Firefox (or once per day, whichever is shorter). As such, Event Telemetry was constrained in how it was able to report this ordered data. It takes two whole days to get 95% of it (because that’s how long it takes us to get “main” pings), and it isn’t allowed to send more than one thousand events per process (lest it balloon the size of the “main” ping, causing problems).

This makes the data slow, and possibly incomplete.

With the landing of bug 1460595 in Firefox Nightly 63 last week, Event Telemetry now has its own ping: the “event” ping.

The “event” ping maintains the same 1000-events-per-process-per-ping limit as the “main” ping, but can send pings as frequently as one ping every ten minutes. Typically, though, it waits the full hour before sending as there isn’t any rush. A maximum delay of an hour still makes for low-latency data, and a minimum delay of ten minutes is unlikely to be overrun by event recordings which means we should get all of the events.

This means it takes less time to receive data that is more likely to be complete. This in turn means we can use less of it to get our answers. And it means more efficiency in our decision-making process, which is important when you’re competing against giants.

If you use Event Telemetry to answer your questions with data, now you can look forward to being able to do so faster and with less worry about losing data along the way.

And if you don’t use Event Telemetry to answer your questions, maybe now would be a good time to start.

The “event” ping landed in Firefox Nightly 63 (build id 20180627100027) and I hope to have it uplifted to Firefox Beta 62 in the coming days.

Thanks to :sunahsuh for her excellent work reviewing the proposal and in getting the data into the derived datasets so they can be easily queried, and further thanks to the Data Team for their support.

:chutten