Latency Improvements, or, Yet Another Satisfying Graph

This is the third in my ongoing series of posts containing satisfying graphs.

Today’s feature: a plot of the mean and 95th percentile submission delays of “main” pings received by Firefox Telemetry from users running Firefox Beta.

Plot (screenshot from 2017-07-12): Beta “main” ping submission delay in hours (mean and 95th percentile)

We went from receiving 95% of pings within about, say, 130 hours (or roughly 5.5 days) down to getting them within about 55 hours (2 days and change). And the numbers will continue to fall as more beta users get the modern beta builds, which send pings with lower latency thanks to pingsender.
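If you want to see how a mean and a 95th percentile fall out of a pile of delays, here is a minimal sketch. The delays are made up, the function names are mine, and the nearest-rank percentile is just one reasonable definition, not necessarily the one behind the plot.

```python
import math
from statistics import mean


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value that has at least
    pct percent of all values at or below it."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize_delays(delays_hours):
    """Return the mean and 95th percentile of submission delays (hours)."""
    return {
        "mean": mean(delays_hours),
        "p95": percentile(delays_hours, 95),
    }


# Made-up submission delays in hours, NOT real Telemetry data.
sample_delays = [1, 2, 2, 3, 4, 5, 8, 12, 20, 24,
                 30, 36, 48, 55, 60, 72, 90, 110, 130, 200]

summary = summarize_delays(sample_delays)
print(f"mean: {summary['mean']:.1f} h")
print(f"95th percentile: {summary['p95']} h ({summary['p95'] / 24:.1f} days)")
```

Note how a handful of very late pings drags the 95th percentile far above the mean, which is why both lines are worth plotting.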

What does this mean? This means that you should no longer have to wait a week to get a decently-rigorous count of data that comes in via “main” pings (which is most of our data). Instead, you only have to wait a couple of days.

Some teams were using the rule-of-thumb of ten (10) days before counting anything that came in from “main” pings. We should be able to reduce that significantly.
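One way to reason about how long to wait is to ask what fraction of pings has arrived after a given number of hours. The sketch below does that with made-up delays; `fraction_received_within` is a hypothetical helper, not part of any Telemetry tooling.

```python
def fraction_received_within(delays_hours, wait_hours):
    """Fraction of pings whose submission delay is at most wait_hours."""
    if not delays_hours:
        raise ValueError("need at least one delay")
    arrived = sum(1 for delay in delays_hours if delay <= wait_hours)
    return arrived / len(delays_hours)


# Made-up submission delays in hours, NOT real Telemetry data.
sample_delays = [1, 2, 2, 3, 4, 5, 8, 12, 20, 24,
                 30, 36, 48, 55, 60, 72, 90, 110, 130, 200]

for wait_days in (2, 7, 10):
    fraction = fraction_received_within(sample_delays, wait_days * 24)
    print(f"after {wait_days} days: {fraction:.0%} of pings received")
```

Run against real delay data, a table like this would show how much of the count you give up by waiting two days instead of ten.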

How significantly? Time, and data, will tell. This quarter I’m looking into what guarantees we might be able to extend about our data quality, which includes timeliness… so stay tuned.

For a more rigorous take on this, partake in any of dexter’s recent reports on RTMO. He’s been tracking the latency improvements and possible increases in duplicate ping rates as these changes have ridden the trains towards release. He’s also blogged about it, if you want all the rigour but none of the Python.

:chutten

FINE PRINT: Yes, due to how these graphs work they will always look better towards the end because the really delayed stuff hasn’t reached us yet. However, even by the standards of the pre-pingsender mean and 95th percentiles we are far enough after the massive improvement for it to be exceedingly unlikely to change much as more data is received. By the post-pingsender standards, it is almost impossible. So there.

FINER PRINT: These figures include adjustments for client clocks having skewed relative to server clocks. Time is a really hard problem even on a single computer, and trying to reconcile it between many computers separated by oceans both literal and metaphorical is the subject of several dissertations and, likely, therapy sessions. As I mentioned above, for rigour and detail about this and other aspects, see RTMO.
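For the curious, here is one common way to correct for a skewed client clock. It is a sketch under assumptions, not necessarily the adjustment used for these figures: it assumes each ping carries a client-side creation time and a client-side send time, that the server records when it received the ping, and that network transit time is negligible. All names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def adjusted_delay_hours(created_client, sent_client, received_server):
    """Submission delay in hours, corrected for client clock skew.

    Assumption: the gap between the client's send time and the server's
    receive time is all clock skew (network transit is ignored).
    """
    skew = received_server - sent_client
    created_adjusted = created_client + skew
    delay = received_server - created_adjusted
    # A negative delay makes no sense, so clamp it to zero.
    return max(delay, timedelta(0)).total_seconds() / 3600


# Made-up example: the client clock runs three hours behind the server.
created = datetime(2017, 7, 10, 8, 0, tzinfo=timezone.utc)
sent = datetime(2017, 7, 10, 10, 0, tzinfo=timezone.utc)
received = datetime(2017, 7, 10, 13, 0, tzinfo=timezone.utc)

print(adjusted_delay_hours(created, sent, received))
```

Under these assumptions the skew cancels out, and the delay reduces to the time between creation and sending as measured on the client's own clock.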

A Firefox Telemetry Introduction

Telemetry is Firefox’s way of sending anonymous analytics data back to Mozilla to help improve Firefox. If you run Firefox and are up-to-date as of this blogpost, you probably send at least bare-minimum telemetry to Mozilla fairly regularly.

Thank you!

This data is important to figure out the size and shape of the userbase, and what sorts of issues might be happening. You can see all the telemetry from your Firefox by visiting about:telemetry.

Since this is Mozilla we’re talking about, the information collected this way is available for you (yes, you!) to run analyses on. For instance, here is a histogram showing distributions of Firefox Desktop 42 “first paint” time compared by Operating System for Windows, Mac, and Linux.

plot showing Firefox first paint measures, compared by operating system. Windows and Mac have similar distributions, but Linux trends to longer durations before first paint.

We can see that maybe there’s something we could be doing to make Linux startup speed faster on that release, as a largish part of its histogram is shifted right into the higher values.

But how has startup speed been trending? Here is an evolution plot showing median startup time over the past four Firefox Desktop beta versions on Windows.

plot showing the evolution of Firefox for Windows first paint times over four beta releases. There is a noticeable downward trend.

The general trend has been downward (faster startup? Excellent.) However, it might be a bit slower in the latest rev (Grr. We need to watch this.) This evolution plot shows progress is fairly flat through beta releases, which is what we’d expect based on how stable the builds are that reach that step.

Now, if we graph the same plot by the date the browser submitted the telemetry, instead of the date the browser was built, we see something very interesting indeed.

an evolution plot showing Firefox for Windows first paint times. There are peaks every Saturday where first paint is slower.

What are these peaks? They happen every week on the same day… how often is Beta updated again? (Correlation is not causation, so maybe there’s another reason for the reliable frequency of those distributions.)
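A quick way to check whether peaks like these really line up with a day of the week is to average the daily values by weekday. This is a minimal sketch with made-up numbers; `mean_by_weekday` is a hypothetical helper and the values are not real first paint measurements.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def mean_by_weekday(daily_values):
    """Average a {date: value} mapping by day of the week."""
    buckets = defaultdict(list)
    for day, value in daily_values.items():
        buckets[WEEKDAYS[day.weekday()]].append(value)
    return {name: mean(values) for name, values in buckets.items()}


# Made-up daily median first paint times in milliseconds over two weeks,
# with an artificial bump on Saturdays.
start = date(2015, 11, 2)  # a Monday
daily = {}
for offset in range(14):
    day = start + timedelta(days=offset)
    daily[day] = 1400 if day.weekday() == 5 else 1000

by_weekday = mean_by_weekday(daily)
slowest_day = max(by_weekday, key=by_weekday.get)
print(slowest_day, by_weekday[slowest_day])
```

This only tells you when the peaks happen, not why. For the why, you would still need to line the dates up against something like the Beta update schedule.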

For now, I will be working on increasing the relevance and usefulness of the telemetry data you have so kindly provided. My next task will be to determine whether the new multi-process Firefox feature (“Electrolysis” or “e10s”) causes Firefox to crash more often if someone has Firefox’s accessibility features turned on. This will be an important measure in deciding when the feature can ship in a later Firefox release.

If you’re interested, there is a lot more material you can read about how to use the various telemetry tools, how Mozilla uses the data that comes out of it, and how you can help to make it better.

:chutten