This Week in Glean: Data Reviews are Important, Glean Parser makes them Easy

(“This Week in Glean” is a series of blog posts that the Glean Team at Mozilla is using to try to communicate better about our work. They could be release notes, documentation, hopes, dreams, or whatever: so long as it is inspired by Glean.) All “This Week in Glean” blog posts are listed in the TWiG index.

At Mozilla we put a lot of stock in Openness. Source? Open. Bug tracker? Open. Discussion Forums (Fora?)? Open (synchronous and asynchronous).

We also have an open process for determining if a new or expanded data collection in a Mozilla project is in line with our Privacy Principles and Policies: Data Review.

Basically, when a new piece of instrumentation is put up for code review (or before, or after), the instrumentor fills out a form and asks a volunteer Data Steward to review it. If the instrumentation (as explained in the filled-in form) is obviously in line with our privacy commitments to our users, the Data Steward gives it the go-ahead to ship.

(If it isn’t _obviously_ okay then we kick it up to our Trust Team to make the decision. They sit next to Legal, in case you need to find them.)

The Data Review Process and its forms are very generic. They’re designed to work for any instrumentation (tab count, bytes transferred, theme colour) being added to any project (Firefox Desktop, mozilla.org, Focus) and being collected by any data collection system (Firefox Telemetry, Crash Reporter, Glean). This is great for the process as it means we can use it and rely on it anywhere.

It isn’t so great for users _of_ the process. If you only ever write Data Reviews for one system, you’ll find yourself answering the same questions with the same answers every time.

And Glean makes this worse (better?) by including in its metrics definitions almost every piece of information you need in order to answer the review. So now you get to write the answers first in YAML and then in English during Data Review.

But no more! Introducing `glean_parser data-review` and `mach data-review`: command-line tools that will generate a Data Review Request skeleton for you, with all the easy parts filled in. It works like this:

  1. Write your instrumentation, providing full information in the metrics definition.
  2. Call `python -m glean_parser data-review <bug_number> <list of metrics.yaml files>` (or `mach data-review <bug_number>` if you’re adding the instrumentation to Firefox Desktop).
  3. `glean_parser` will parse the metrics definitions files, pull out only the definitions that were added or changed in `<bug_number>`, and then output a partially-filled-out form for you.

Here’s an example. Say I’m working on bug 1664461 and add a new piece of instrumentation to Firefox Desktop:

fog.ipc:
  replay_failures:
    type: counter
    description: |
      The number of times the ipc buffer failed to be replayed in the
      parent process.
    bugs:
      - https://bugzilla.mozilla.org/show_bug.cgi?id=1664461
    data_reviews:
      - https://bugzilla.mozilla.org/show_bug.cgi?id=1664461
    data_sensitivity:
      - technical
    notification_emails:
      - chutten@mozilla.com
      - glean-team@mozilla.com
    expires: never

I’m sure to fill in the `bugs` field correctly (because that’s important on its own _and_ it’s what `glean_parser data-review` uses to find which data I added), and have categorized the `data_sensitivity`. I also included a helpful `description`. (The `data_reviews` field currently points at the bug I’ll attach the Data Review Request for. I’d better remember to come back before I land this code and update it to point at the specific comment…)

Then I can simply use `mach data-review 1664461` and it spits out:

!! Reminder: it is your responsibility to complete and check the correctness of
!! this automatically-generated request skeleton before requesting Data
!! Collection Review. See https://wiki.mozilla.org/Data_Collection for details.

DATA REVIEW REQUEST
1. What questions will you answer with this data?

TODO: Fill this in.

2. Why does Mozilla need to answer these questions? Are there benefits for users?
   Do we need this information to address product or business requirements?

TODO: Fill this in.

3. What alternative methods did you consider to answer these questions?
   Why were they not sufficient?

TODO: Fill this in.

4. Can current instrumentation answer these questions?

TODO: Fill this in.

5. List all proposed measurements and indicate the category of data collection for each
   measurement, using the Firefox data collection categories found on the Mozilla wiki.

Measurement Name | Measurement Description | Data Collection Category | Tracking Bug
---------------- | ----------------------- | ------------------------ | ------------
fog_ipc.replay_failures | The number of times the ipc buffer failed to be replayed in the parent process.  | technical | https://bugzilla.mozilla.org/show_bug.cgi?id=1664461


6. Please provide a link to the documentation for this data collection which
   describes the ultimate data set in a public, complete, and accurate way.

This collection is Glean so is documented
[in the Glean Dictionary](https://dictionary.telemetry.mozilla.org).

7. How long will this data be collected?

This collection will be collected permanently.
**TODO: identify at least one individual here** will be responsible for the permanent collections.

8. What populations will you measure?

All channels, countries, and locales. No filters.

9. If this data collection is default on, what is the opt-out mechanism for users?

These collections are Glean. The opt-out can be found in the product's preferences.

10. Please provide a general description of how you will analyze this data.

TODO: Fill this in.

11. Where do you intend to share the results of your analysis?

TODO: Fill this in.

12. Is there a third-party tool (i.e. not Telemetry) that you
    are proposing to use for this data collection?

No.

As you can see, this Data Review Request skeleton comes partially filled out. Everything you previously had to fill out mechanically has been done for you, leaving you more time to focus on the interesting questions like “Why do we need this?” and “How are you going to use it?”.

Also, this saves you from having to remember the URL to the Data Review Request Form Template each time you need it. We’ve got you covered.

And since this is part of Glean, it’s already available to every project that uses Glean. This isn’t just a Firefox Desktop thing.

Hope this saves you some time! If you can think of other time-saving improvements we could add to Glean once so that every Mozilla project can take advantage of them, please tell us on Matrix.

If you’re interested in how this is implemented, glean_parser’s part of this is over here, while the mach command part is here.

:chutten

This Week in Glean: Firefox Telemetry is to Glean as C++ is to Rust

(“This Week in Glean” is a series of blog posts that the Glean Team at Mozilla is using to try to communicate better about our work. They could be release notes, documentation, hopes, dreams, or whatever: so long as it is inspired by Glean. You can find an index of all TWiG posts online.)

I had this goofy idea that, like Rust, the Glean SDKs (and Ecosystem) aim to bring safety and higher-level thought to their domain. This is in comparison to how, like C++, Firefox Telemetry is built out of flexible primitives that assume you very much know what you’re doing, and whose design cannot (will not?) provide any clues as to how to do things properly.

I have these goofy thoughts a lot. I’m a goofy guy. But the more I thought about it, the more the comparison seemed apt.

In Glean, wherever we can, we intentionally forbid behaviour we cannot guarantee is safe (e.g. we forbid non-commutative operations in FOG IPC, and we forbid decrementing counters). And in situations where we need to permit perhaps-unsafe data practices, we do it in tightly-scoped areas that are clearly identified as unsafe (e.g. if a `timing_distribution` uses `accumulate_raw_samples_nanos`, you know to look at its data with more skepticism).
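
To make that concrete, here’s a minimal sketch in the shape of the Glean Python SDK’s `load_metrics` API. The metric names are made up, and exact method names can differ a little between SDKs and versions:

```python
from glean import load_metrics

# Hypothetical metrics.yaml with made-up metric names; real use also
# requires Glean.initialize() first.
metrics = load_metrics("metrics.yaml")

# A counter only exposes add(). There is no subtract(), so instrumentation
# code can never decrement it.
metrics.example.replay_failures.add(1)

# Timing distributions are normally fed through paired start/stop calls
# that Glean controls, so the samples can be trusted.
timer_id = metrics.example.page_load.start()
# ... the operation being timed ...
metrics.example.page_load.stop_and_accumulate(timer_id)

# The raw-sample escape hatch, by contrast, has a deliberately loud name
# (accumulate_raw_samples_nanos), so anyone reading the code or the data
# knows to treat those samples with more skepticism.
```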

In Glean we encourage instrumentors to think at a higher level (e.g. a `memory_distribution` instead of a Histogram of unknown buckets and samples), which permits Glean to identify errors early (e.g. you can’t start a timespan twice) and allows Glean to do clever things with the data (e.g. in our tooling we know `counter` metrics are interesting when summed, but `quantity` metrics are not). Speaking of those errors, we are able to forbid error-prone behaviour through design and the use of language features (e.g. in languages with type systems we can prevent you from collecting the wrong type of data), and when an error is only detectable at runtime we can report it with a high degree of specificity to make it easier to diagnose.
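
Here’s a similarly hypothetical sketch of that higher-level thinking: you hand Glean a measurement in the units you care about and let it worry about buckets, and misuse becomes an error Glean can report instead of silently-wrong data:

```python
from glean import load_metrics

metrics = load_metrics("metrics.yaml")  # hypothetical definitions file

# A memory_distribution takes "an amount of memory" (the unit comes from
# the metric's definition in metrics.yaml); picking histogram buckets is
# Glean's job, not the instrumentor's.
metrics.example.heap_allocated.accumulate(4096)

# A timespan knows it can't be started twice. The second start() doesn't
# silently clobber the first: Glean records an invalid_state error against
# the metric, and that error shows up in the data and the tooling.
metrics.example.session_setup.start()
metrics.example.session_setup.start()  # recorded as an error, not as data
metrics.example.session_setup.stop()
```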

There are more analogues, but the metaphor gets strained. (( I mean, I guess a timing_distribution’s `TimerId` is kinda the closest thing to a borrow checker we have? Maybe? )) So I should probably stop here.

Now, those of you paying attention might have already seen this relationship. After all, as we all know, glean-core (which underpins most of the Glean SDKs regardless of language) is actually written in Rust whereas Firefox Telemetry’s core of Histograms, Scalars, and Events is written in C++. Maybe we shouldn’t be too surprised when the language the system is written in happens to be reflected in the top-level design.

But! glean-core was (for a long time) written in Kotlin from stem to stern. So maybe it’s not due to language determinism and is more to do with thoughtful design, careful change processes, and a list of principles we hold to firmly as the number of supported languages and metric types continues to grow.

I certainly don’t know. I’m just goofing around.

:chutten

Responsible Data Collection is Good, Actually (Ubisoft Data Summit 2021)

In June I was invited to talk at Ubisoft’s Data Summit about how Mozilla does data. I’ve given a short talk on this subject before, but this was an opportunity to update the material, cover more ground, and include more stories. The talk, including questions, comes in at just under an hour and is probably best summarized by the synopsis:

Learn how responsible data collection as practiced at Mozilla makes cataloguing easy, stops instrumentation mistakes before they ship, and allows you to build self-serve analysis tooling that gets everyone invested in data quality. Oh, and it’s cheaper, too.

If you want to skip to the best bits, I included shameless advertising for Mozilla VPN at 3:20 and becoming a Mozilla contributor at 14:04, and I lose my place in my notes at about 29:30.

Many thanks to Mathieu Nayrolles, Sebastien Hinse and the Data Summit committee at Ubisoft for guiding me through the process and organizing a wonderful event.

:chutten

Distributed Teams: Regional Peculiarities Like Oktoberfest and Bagged Milk

It’s Oktoberfest! You know, that German holiday about beer and lederhosen?

No. As many Germans will tell you, it’s not a German thing as much as it is a Bavarian thing. It’s like saying kilts are a British thing (they’re a Scottish thing). Or that milk in bags is a Canadian thing (in Canada it’s an Eastern Canada thing).

In researching what the heck I was talking about when I was making this comparison at a recent team meeting, Alessio found a lovely study on the efficiency of milk bags as milk packaging in Ontario published by The Environment and Plastics Industry Council in 1997.

I highly recommend you skim it for its graphs and the study conclusions. The best parts for me are how it highlights that the consumption of milk (by volume) increased 22% from 1968 to 1995 while at the same time the amount (by mass) of solid waste produced by milk packaging decreased by almost 20%.

I also liked Table 8 which showed the recycling rates of the various packaging types that we’d need to reach in order to match the small amount (by mass) of solid waste generation of the (100% unrecycled) milk bags. (Interestingly, in my region you can recycle milk bags if you first rinse and dry them).

I guess what I’m trying to say about this is three-fold:

  1. Don’t assume regional characteristics are national in your distributed team. Berliners might not look forward to Oktoberfest the way Münchner do, and it’s possible no one in the Vancouver office owns a milk jug or bag cutter.
  2. Milk Bags are kinda neat, and now I feel a little proud about living in a part of the world where they’re common. I’d be a little more confident about this if the data wasn’t presented by the plastics industry, but I’ll take what I can get (and I’ll start recycling my milk bags).
  3. Geez, my team can find data for _any topic_. What differences we have by being distributed around the world are eclipsed by how we’re universally a bunch of nerds.

:chutten

My StarCon 2019 Talk: Collecting Data Responsibly and at Scale

Back in January I was privileged to speak at StarCon 2019 at the University of Waterloo about responsible data collection. It was a bitterly-cold weekend with beautiful sun dogs ringing the morning sun. I spent it inside talking about good ways to collect data and how Mozilla serves as a concrete example. It’s 15 minutes short and aimed at a general audience. I hope you like it.

I encourage you to also sample some of the other talks. Two I remember fondly are Aaron Levin’s “Conjure ye File System, transmorgifier” about video games that look like file systems and Cory Dominguez’s lovely analysis of Moby Dick editions in “or, the whale“. Since I missed a whole day, I now get to look forward to fondly discovering new ones from the full list.

:chutten

Blast from the Past: I filed a bug against Firefox 3.6.6

A screenshot of the old Bugzilla duplicate finder UI with the text inside table cells not rendering at all.

On June 30, 2010 I was:

  • Sleepy. My daughter had just been born a few months prior and was spending her time pooping, crying, and not sleeping (as babies do).
  • Busy. I was working at Research in Motion (it would be three years before it would be renamed BlackBerry) on the BlackBerry Browser for BlackBerry 6.0. It was a big shift for us since that release was the first one using WebKit instead of the in-house “mango” rendering engine written in BlackBerry-mobile-dialect Java.
  • Keen. Apparently I was filing a bug against Firefox 3.6.6?!

Yeah. I had completely forgotten about this. Apparently, while reading my RSS feeds in Google Reader (that doesn’t make me old, does it?), taking in news from Dragonmount about the Wheel of Time (so I guess I’ve always been a nerd, then), the text would sometimes just fail to render. I even caught it happening on the old Bugzilla “possible duplicate finder” UI (see above).

The only reason I was reminded this bug exists is that I received bugmail on my personal email address when someone accidentally added and removed themselves from the Cc list.

Pretty sure this bug, being no longer reproducible, still in UNCONFIRMED state, and filed against a pre-rapid-release version of Firefox, is something I should close. Yeah, I’ll just go and do that.

:chutten

The End of Firefox Windows XP Support

Firefox 62 has been released. Go give it a try!

At the same time, on the Extended Support Release channel, we released Firefox ESR 60.2 and stopped supporting Firefox ESR 52: the final version of Firefox with Windows XP support.

Now, we don’t publish all-channel user proportions grouped by operating system, but as part of the Firefox Public Data Report we do have data from the release channel back before we switched our XP users to the ESR channel. At the end of February 2016, XP users made up 12% of release Firefox. By the end of February 2017, XP users made up 8% of release Firefox.

If this trend continued without much change after we switched XP users to ESR, XP Firefox users would presently amount to about 2% of release users.
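
Roughly, that’s what you get if you extend the decline above in a straight line (about four percentage points per year) from February 2017 to the Firefox 62 release in September 2018:

```python
# Back-of-envelope: the release-channel XP share fell from 12% (end of
# February 2016) to 8% (end of February 2017), about four percentage
# points per year. Extend that line to September 2018:
share_feb_2017 = 8.0                 # percent of release users on XP
points_lost_per_year = 12.0 - 8.0    # decline observed 2016 -> 2017
years_elapsed = 19 / 12              # Feb 2017 -> Sept 2018 (Firefox 62)
estimate = share_feb_2017 - points_lost_per_year * years_elapsed
print(f"{estimate:.1f}%")            # ~1.7%, i.e. "about 2%"
```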

That’s millions of users we kept safe on the Internet even though they were running a nearly-17-year-old operating system whose last patch was over 4 years ago. That’s a year and a half of extra support for users who probably don’t feel they have much ability to protect themselves online.

It required effort, and it required devoting resources to supporting XP well after Microsoft stopped doing so. It meant we couldn’t do other things, since we were busy with XP.

I think we did a good thing for these users. I think we did the right thing for these users. And now we’re wishing these users the very best of luck.

…and that they please oh please upgrade so we can go on protecting them into the future.

:chutten

Data Science is Hard: Counting Users

A screenshot of the User Activity plot from the Firefox Public Data Report.

Counting is harder than you think. No, really!

Intuitively, as you look around you, you think this can’t be true. If you see a parking lot you can count the cars, right?

But do cars that have left the parking lot count? What about cars driving through it without stopping? What about cars driving through looking for a space? (And can you tell the difference between those two kinds from a distance?)

These cars all count if you’re interested in usage. It’s all well and good to know the number of cars using your parking lot right now… but is it lower on weekends? Holidays? Are you measuring on a rainy day when fewer people take bicycles, or in the Summer when more people are on vacation? Do you need better signs or more amenities to get more drivers to stop? Are you going to have to expand capacity this year, or next?

Yesterday we released the Firefox Public Data Report. Go take a look! It is the culmination of months of work by many Mozillians (not me, I only contributed some early bug reports). In it you can find out how many users Firefox has, the most popular addons, and how quickly Firefox users update to the latest version. And you can choose to look at these plots for the worldwide user base or for one of the top ten countries (by number of Firefox users) individually.

It’s really cool.

The first two plots are a little strange, though. They count the number of Firefox users over time… and they don’t agree. They don’t even come close!

For the week including August 17, 2018, the Yearly Active User (YAU) count is 861,884,770 (or about 862M)… but the Monthly Active User (MAU) count is 256,092,920 (or about 256M)!

That’s over 600M difference! Which one is right?

Well, they both are.

Returning to our parking lot analogy, MAU is about counting how many cars use the parking lot over a 28-day period. So, starting Feb 1, count cars. If someone you saw earlier returns the next day or after a week, don’t count them again: we only want unique cars. Then, at the end of the 28-day period, that was the MAU for Feb 28. The MAU for Mar 1 (on non-leap-years) is the same thing, but you start counting on Feb 2.

Similarly for YAU, but you count over the past 365 days.
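
If it’s easier to see the counting rule as code, here’s a toy sketch over made-up daily logs (not the real pipeline, which counts clients from telemetry pings, but the rule is the same):

```python
from datetime import date, timedelta

# Toy daily activity log: day -> set of unique client ids seen that day.
# (Made-up data; the real pipeline counts clients from telemetry pings,
# but the rule is the same.)
activity = {
    date(2018, 8, 15): {"car_a", "car_b", "car_c"},
    date(2018, 8, 16): {"car_b", "car_c", "car_d"},
    date(2018, 8, 17): {"car_c", "car_e"},
}

def active_users(log, as_of, window_days):
    """Count unique clients seen in the window_days-day window ending on as_of."""
    seen = set()
    for offset in range(window_days):
        seen |= log.get(as_of - timedelta(days=offset), set())
    return len(seen)

mau = active_users(activity, date(2018, 8, 17), 28)   # 28-day window
yau = active_users(activity, date(2018, 8, 17), 365)  # 365-day window
print(mau, yau)  # 5 5 on this tiny log; on real data YAU is always >= MAU
```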

It stands to reason that you’ll see more unique cars over the year than you will over the month: you’ll see visitors, tourists, people using the lot just once, and people who have changed jobs and haven’t been back in four months.

So how many of these 600M who are in the YAU but not in the MAU are gone forever? How many are coming back? We don’t know.

Well, we don’t know _precisely_.

We’ve been at the browser game for long enough to see patterns in the data. We’re in the Summer slump for MAU numbers, and we have a model for how much higher the numbers are likely to be come October. We have surveyed people of varied backgrounds and have some ideas of why people change browsers to or away from Firefox.

We have the no-longer users, the lapsed users, the lost-and-regained users, the tried-us-once users, the non-human users, … we have categories and rough proportions on what we think we know about our population, and how that influences how we can make the internet better for them.

Ultimately, to me, it doesn’t matter too much. I work on Firefox, a product that hundreds of millions of people use. How many hundreds of millions doesn’t matter: we’re above the threshold that makes me feel like I’m making the world better.

(( Well… I say that, but it is actually my job to understand the mechanisms behind these numbers and why they can’t be exact, so I do have a bit of a vested interest. And there are a myriad of technological and behavioural considerations to account for in code and in documentation and in analysis, which makes it an interesting job. But, you know. Hundreds of millions is precise enough for my job satisfaction index. ))

But once again we return, inescapably, to the central thesis. Counting is harder than you think: one of the leading candidates for the Data Team’s motto. (Others include “Well, it depends.” and “¯\_(ツ)_/¯”). And now we’re counting in the open, so you get to experience its difficulty firsthand. Go have another look.

:chutten

Faster Event Telemetry with “event” Pings

Event Telemetry is the means by which we can send ordered interaction data from Firefox users back to Mozilla, where we can use it to make product decisions.

For example, we know from a histogram that the most popular way of opening the Developer Tools in Firefox Beta 62 is by the shortcut key (Ctrl+Shift+I). And it’s nice to see that the number of times the Javascript Debugger was opened was roughly 1/10th of the number of times the shortcut key was used.

…but are these connected? If so, how?

And the Javascript Profiler is opened only half as often as the Debugger. Why? Isn’t it easy to find that panel from the Debugger? Are users going there directly from the DOM view or is it easier to find from the Debugger?

To determine what parts of Firefox our users are having trouble finding or using, we often need to know the order things happen. That’s where Event Telemetry comes into play: we timestamp things that happen all over the browser so we can see what happens and in what order (and a little bit of how long it took to happen).

Event Telemetry isn’t new: it’s been around for about 2 years now. And for those two years it has been piggy-backing on the workhorse of the Firefox Telemetry system: the “main” ping.

The “main” ping carries a lot of information and is usually sent once each time you close your Firefox (or once per day, whichever comes first). As such, Event Telemetry was constrained in how it was able to report this ordered data. It takes two whole days to get 95% of it (because that’s how long it takes us to get “main” pings), and it isn’t allowed to send more than one thousand events per process (lest it balloon the size of the “main” ping, causing problems).

This makes the data slow, and possibly incomplete.

With the landing of bug 1460595 in Firefox Nightly 63 last week, Event Telemetry now has its own ping: the “event” ping.

The “event” ping maintains the same 1000-events-per-process-per-ping limit as the “main” ping, but can send pings as frequently as one ping every ten minutes. Typically, though, it waits the full hour before sending as there isn’t any rush. A maximum delay of an hour still makes for low-latency data, and a minimum delay of ten minutes is unlikely to be overrun by event recordings, which means we should get all of the events.
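
Sketched as a toy decision function (not the actual Firefox Telemetry code, just the rules described above):

```python
# Toy model of the schedule: send when the hour is up, or earlier if the
# 1000-event buffer fills, but never more often than every ten minutes.
MAX_EVENTS_PER_PING = 1000
MIN_INTERVAL_MINUTES = 10
MAX_INTERVAL_MINUTES = 60

def should_send(events_buffered: int, minutes_since_last_send: float) -> bool:
    if minutes_since_last_send < MIN_INTERVAL_MINUTES:
        return False  # never more than one ping every ten minutes
    return (events_buffered >= MAX_EVENTS_PER_PING
            or minutes_since_last_send >= MAX_INTERVAL_MINUTES)

assert not should_send(50, 30)    # not full, hour not up: keep waiting
assert should_send(1000, 12)      # buffer full: send early
assert should_send(3, 60)         # hour is up: send whatever we have
assert not should_send(1000, 5)   # too soon, even with a full buffer
```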

This means it takes less time to receive data that is more likely to be complete. This in turn means we can use less of it to get our answers. And it means more efficiency in our decision-making process, which is important when you’re competing against giants.

If you use Event Telemetry to answer your questions with data, now you can look forward to being able to do so faster and with less worry about losing data along the way.

And if you don’t use Event Telemetry to answer your questions, maybe now would be a good time to start.

The “event” ping landed in Firefox Nightly 63 (build id 20180627100027) and I hope to have it uplifted to Firefox Beta 62 in the coming days.

Thanks to :sunahsuh for her excellent work reviewing the proposal and in getting the data into the derived datasets so they can be easily queried, and further thanks to the Data Team for their support.

:chutten

Mozilla All-Hands Tips

All Hands Austin, December 2017, Mitchell Baker presenting. (Photo used under CC BY-NC-SA 2.0)

Twice a year, Mozilla gathers employees, volunteers, and assorted hangers-on in a single place to have a week of planning, working, and socializing. Being as distributed an organization as we are, it’s a bit rare to get enough of us in a single place to generate the kind of cross-talk and beneficial synergistic happenstances that help us work smarter and move in more-or-less the same direction. These are our All Hands events.

They’re a Pretty Big Deal(tm).

So here you are, individual contributor or manager, staff or volunteer, veteran or first-timer. With all these Big Plans, what are we littler folk to do to not become overwhelmed?

I have some tips.

Before You Go

Set up a mail folder/label for relevant email: You’ll be getting some email with details about where you should be, what you should be doing, and when. Organizing these into one place is helpful for reference, so come up with a label (maybe “201807-sanfran” or “mozsf2” or “fogzilla” or something) and organize those emails as they come in.

Act on those emails immediately: If they contain instructions or an announcement that bookings or registration is now open… then do that thing right then. Do not file the email and forget. Do the thing while you are looking at that email. Only then should you file that email and get back to where you were in your brain. If you absolutely can’t just then (have to synchronize with family or what-have-you), put a calendar reminder in that repeats every weekday until you handle it.

Do not upgrade Nightly: You’re running Nightly, right? You’ll be travelling through a land of uncertain connectivity, and the last thing you want is to spend scarce bandwidth downloading a multi-MB Nightly update that might have accidentally disabled Captive Portal Detection. If it works, keep your Nightly build until you’re certain you have the bandwidth to download a new one. If all else fails, keep it until you get back.

Make sure your laptop is in shape: My laptop is often neglected in favour of its Desktop comrade: updates may be pending, credentials may have expired, the source code checkouts might be weeks old, and there may have even been a new version of Mozilla Build released since the last time I tried to compile Firefox. With luck, while at an All Hands you won’t have to compile Gecko on a laptop in your hotel… but we make our own luck, we who are prepared. Prepare your laptop.

Prepare your family: If you don’t live alone, you’ll have non-Mozilla prepwork to do. Spouse and kids or roommates and pets, there are lifeforms who normally expect to see you that won’t. Clear the family schedule for the week you’re gone, and do as much preparation ahead of time as you can. Laundry, meal planning, groceries, sitters, dog walkers, even lawn services are things you can arrange to lighten the load that your absence will place on those around you. Even if you’re bringing them with you.

While You’re There

Do not fear missing out: You will not be able to attend both Boardgame Night and your team dinner. There will be karaoke parties you won’t get to, or be invited to. This is fine. This is expected. This is unavoidable when you have so many people disorganizing so many things simultaneously. So don’t fret about it. Prioritize.

Say no: Speaking of prioritizing: prioritize for yourself. You may very well be operating as a Level 100 You for hours at a time. So many people to talk to, so many talks and social events to organize, deliver, and attend… No. You don’t have to stay the entire length of the party. You don’t even have to go. If you feel yourself fading, get out while you have the strength. Regroup. Find a quiet corner or go to sleep early… At my first All Hands, I napped on both Wednesday and Thursday. And I wasn’t even in a different timezone. It really helped.

Wash your hands: Lots. Before meals. After meals. You’ll be talking, working, eating, and otherwise hanging out with a thousand of your closest coworkers. It’s probably your best bet for not catching mozflu, and it’s definitely your best bet to not transmit it.

After You’re Back

Consider taking a day: Generally speaking you’ll be flying back on Saturday and returning to work on Monday. Depending on distance to travel, available flight times, and cancellations, this may result in only a few hours between stumbling through your door and stumbling back to work. Consider booking that Monday off (or, honestly, if your trip back was heinous, don’t even book it off. Just take it. Get some sleep. Work can wait until Tuesday.)

Check in: If you live with family, you haven’t seen them for a week. Even if you brought them with you, you’ve been in meetings and talks and stuff most hours. Check in with them. Get up to speed on what’s been happening in their lives while you’ve been away.

Get excited for the next one: Even immediately back from an All Hands, it’s still only six months to the next one. Take stock of what you liked and what you didn’t like about this one. Rest up, and try not to get impatient :)

:chutten

(( Great minds think alike, because Seburo recently wrote a Wiki article covering even more excellent tips for All Hands events. Check that out, too! ))