This Week in Glean: How Long Must I Wait Before I Can See My Data?

(“This Week in Glean” is a series of blog posts that the Glean Team at Mozilla is using to try to communicate better about our work. They could be release notes, documentation, hopes, dreams, or whatever: so long as it is inspired by Glean. All “This Week in Glean” blog posts are listed in the TWiG index).

You’ve heard about this cool Firefox on Glean thing and wish to instrument a part of Firefox Desktop. So you add a metrics.yaml definition for a new Glean metric, commit a piece of instrumentation to mozilla-central, and then Lando lands it for you.

When can you expect the data collected when users’ Firefoxes trigger that instrumentation to show up in a queryable place like BigQuery?

The answer is one of the more common phrases I say when it comes to data: Well, it depends.

In the broadest sense, we’re looking at two days:

1) A new Nightly build will be generated within 12h of your commit.

2) Users will pick up the new Nightly build fairly quickly after that, and start triggering the instrumentation.

3) The following 4am, a “metrics” ping containing data from your instrumentation will be submitted (or some time later, if Firefox isn’t running at 4am).

4) A new schema, generated to include your new metric definition, will have been deployed overnight.

5) The following 12am UTC, a new partition of our user-facing stable views will have the previous day’s submissions available.

And then you commence querying! Easy as that.

Any questions?

The Questions:

What if I added a new metrics.yaml file?

That file needs to land in gecko-dev (the github mirror of mozilla-central) first. Only then can we (and by “we” I here mean the Data Team, by means of a bug you file) update the data pipeline. Then you get to wait until the next weekday’s midnight UTC (ish) for a schema deploy as per Step 4.

Generally this doesn’t add too much delay, but if landing the file happens after the pipeline folks have gone home, we get to wait until the next weekday’s midnight UTC.

The Nightly population is small and weird. How long until we get data from release?

Uptake of code to release takes a while longer. Code lands in mozilla-central, and gets into the next Nightly within 12h. Getting to Beta from Nightly means waiting until the next merge day (these days that’s on the first Monday of the month, or thereabouts). Getting to Release from Beta means waiting until the merge day after that.

If you’re unlucky, you’ll be waiting over two months for your instrumentation to be in a Release build that users can pull down.

And then you get to wait for enough Release users to update that you’re getting a representative sample. (This could take a week or so.)

So… nine weeks?

That sounds really bad! Is there anything we can do?

Why yes.

The first thing we can do is adjust our expectations. There’s a four-week swing from the worst case to the best case on this slow path. It isn’t likely that you’ll always land instrumentation immediately after a merge day and have to wait the whole month until Nightly merges to Beta.

Your average wait for that is only two weeks. And the best case is a matter of a day or two.

So cross your fingers, and hope your luck is with you.

Secondly, instrumentation is (by itself) very low-risk, so you can “uplift” the instrumentation change directly to Beta without waiting for merge day.

This can cut your route to release down to _two weeks_, by (e.g.) landing in Nightly on Monday Nov 22, verifying that it works on Tuesday, requesting uplift on Wednesday, getting uplifted in the last Beta on Thursday Nov 25, then making the merge from Beta to Release on Dec 6.

(You do still get to wait a third week for the release population to update to the latest version.)

Thirdly, what are the chances that your instrumentation is measuring a feature you just built or just turned on? You want that feature to benefit from the slow-roll exposure to the more tolerant audiences of Nightly and Beta before it reaches Release, right? Automated testing is great, but nothing can simulate the wild variety of use cases and hardware combinations your feature will experience in the Real World.

So what point is there in getting your instrumentation into Release before the feature under instrumentation reaches it? Instead of measuring the interval between landing instrumentation and beginning analysis, perhaps measure the interval between the release of the feature you wish to instrument and beginning analysis?

That interval is only a day: gotta wait for that partition in the stable view. Sounds much better, doesn’t it?

Still, can I get data any faster?

The fastest time from Point A) Landing a metric, to Point B) Performing preliminary analysis on a metric, is about 12h:

1) Land your code just before a new Nightly is cut.

2) Hope that the number of Nightly users that update to the latest build over the next twelve hours is enough for your purposes.

If you didn’t luck out and have a schema deploy, you’ll need to dig your data out of the additional_properties JSON column. If you are lucky, you can use the friendly columns instead.

To get to the data before the nightly copy-deduplicate to stable views, you’ll be querying the live tables instead. You need to fully-qualify that table name. You need to realize that we haven’t deduped anything in here. And you need to take narrow slices, because we can’t cluster the data effectively here, so querying can get expensive, fast.
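To make that concrete, here’s a minimal sketch of what such a narrow live-table query might look like from Python using the google-cloud-bigquery client. The project, table, and column names here are assumptions based on common naming patterns, not authoritative: check the Glean Dictionary and the data docs for the real ones before trusting any numbers.

from google.cloud import bigquery

# Sketch only: the project and table names below are assumptions.
client = bigquery.Client(project="your-analysis-project")  # hypothetical project

sql = """
SELECT
  client_info.app_build,
  COUNT(DISTINCT document_id) AS pings  -- dedupe by hand; live tables aren't deduped
FROM
  -- A fully-qualified live table (assumed name; check the actual dataset).
  `moz-fx-data-shared-prod.firefox_desktop_live.metrics_v1`
WHERE
  -- Narrow slice: only the last 12 hours, to keep the scan (and the cost) small.
  submission_timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 12 HOUR)
GROUP BY
  client_info.app_build
ORDER BY
  pings DESC
"""

for row in client.query(sql).result():
    print(row.app_build, row.pings)

And if the schema deploy hasn’t happened yet, your metric’s friendly column won’t exist: you’d be reaching into additional_properties with BigQuery’s JSON functions instead.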

Can I get data that quickly from release?

Not yet.

I’ve seen a proposal internally for dynamically-defined metrics that get pushed to running Firefox instances (talk to :esmyth if you’re interested). Its present form proposes the process and the possibility rather than the technology, but there’s a version of this I can see that would (for a subset of data collection) take the time from “I wish to instrument this” to “I am performing analysis on data received from (a subset of) the release Firefox population” down to within a business day.

Which is neat! But that speed brings risk, so it’ll take a while to design a system that doesn’t expose our users to that risk.

Don’t expect this for Christmas, is I guess what I mean : )

:chutten

Adventures in Water Softening

I took some time off recently and, as I’m too foolish to allow myself to spend my time on worthwhile rest activities like reading, watching TV, or playing video games, I worked my way through a self-assembled list of “Things I Never Have Enough Time To Deal With”.

One was replacing the Moen cartridge in the upstairs bath since it appears to be the cause of a (excruciatingly-)slow drip. A visit to Lowe’s later, I had a no-charge replacement cartridge in-hand. Lifetime warranties plus customer service: nice.

As with most baths whose plumbing I’ve had the misfortune to learn about, there are no fixture-side shutoff valves, so I waited until I had the house to myself and turned off the water to the whole house. (Thank goodness the most recent visit by a plumber included replacing the ancient screw valve with a 90deg valve. So much nicer to work with.)

Alas, the Moen cart was the wrong size (I think I need the 1225, the helpful person at Lowe’s presumed it was a 1222b), so no joy there. Thus I turned the water back on. This was about a quarter past four in the afternoon.

Next morning around 9am my wife and I detect an intermittent beeping. Never a good sign.

We check the freezer, fridge, garage, laundry room, dehumidifier… nothing. But it’s coming from the basement.

Good news! The emergency “there’s water on my basement floor” alarm works.

Bad news! There’s water on my basement floor.

A slow (but quickening) leak had developed on my water softener, involving the parts illustrated below. We have the large assembly, which I call the bypass assembly (it connects to the softener via the two horizontal tubes), and two identical adaptors which adapt from the house’s copper (at least in my case) piping to the top two sockets of the assembly. Inlet’s on the right, outlet’s on the left.

[Image: a large plastic bypass assembly, four retaining clips, and two adaptor tubes, all in plastic, for connecting a water softener to a house water system]

The leak was coming from the outlet socket between it and the adaptor. Oh no, I thought, there’s a crack in this large custom-made piece of plastic. And since the large piece of plastic is the bypass for the softener, the usual path for bypassing the fault for diagnosis and repair is no good. The leak doesn’t care whether the bypass assembly is set to Service or set to Bypass, so the bypass cannot bypass the leak.

Luckily, the previous softener didn’t have a single-valve bypass and so had a three-valve bypass in the inlet, outlet, and bridging copper. Open the bridge, close the outlet, close the inlet, good to go. (I’m not sure if that’s the correct order, but it seemed to work).

[Image: diagram of a 3-valve plumbing bypass system]

Unfortunately all these valves are screw valves and are decades old, likely not having been used in as long, so they leak when not fully closed or fully open and were stiff as heck to get moving. I’ll need to have those replaced at some point, but then we should probably also look into rearranging the whole utility room, because the plumbing (gas and water and coolant) is a mess. (Ah, the joys of home ownership. The only thing worse is anything else.)

(( I’d usually include a digression here about water softening and why it’s so dang important in my part of the world. I’ll just leave you with this Wikipedia link on water softening for the former, and this map of water hardness in the Region of Waterloo (plus this link to the USGS saying that anything over 180 mg/L (of CaCO3) is “Very Hard”, which translates to anything over 10.57gpg. Note the map starts at 17gpg and goes up from there.) for the latter. Conclusions are left to the reader. ))

Clearly I was going to have to take it apart to see what was going on.

Unfortunately, a fluid-filled closed system like that is subject to certain pressures that made absolutely everything to do with this job an absolute trial. Just getting the pieces apart involved 1) removing the retaining clips (easy), then 2) Separating the O-ring-having adaptor tubes from the bypass assembly (difficult). I _think_ I had to overcome the resistance to vacuum in the pipes to force the first one apart, which of course dislodged the second one and they both dumped their contents exactly adjacent to the bucket I had placed. Water alarm went off again. I put it on a shelf.

My luck seemed to turn, though, as a visual inspection of the assembly and adaptors showed no sign of splits, tears, wear (it’s only been in place for 3 years (installed March 2018)), or other damage. The inlet socket was lousy with rust, but not only was the outlet socket intact, it was clean.

So I put it all back together and reopened the valves: close the bridge, open the outlet, open the inlet. There was some backwash into the softener I didn’t like by doing it this way, and it introduced a lot of air that would make itself explosively known at every fixture throughout the house (almost blew the lid off the upstairs toilet. How?), but it all came together.

And then the inlet socket on the bypass valve began to leak.

Le sigh.

Turn it all back off again: outlet closed, bridge open, inlet closed.

This time I was prepared for the pressure differential and had the bucket in the right place when I pulled the inlet pipe out of the bypass. What I didn’t account for was the outlet pipe’s water backwashing through the bypass and bubbling out of the inlet socket. Note to self: if you leave the bypass on “Service”, the softener will resist the flow for you.

Again, no damage or wear on the inlet, but there was still a smear of rust. I cleaned that out and reseated the adaptor.

Checking a hunch, I noticed that the retaining clips were not bilaterally symmetric. They had an up and a down. So I replaced the clip with the up side up, and opened the inlet valve of the three-valve bypass.

Turns out you can create a pressure bomb if you allow mains pressure to push an air bubble against a closed valve all of a sudden. The outlet adaptor popped out of the outlet socket with a bang. Everything got wet (including the erstwhile plumber penning these words). It was only luck that I hadn’t seated the retaining clips sufficiently and so the pieces only came apart and didn’t actually break.

It was exciting in exactly the wrong sorts of way.

But it gave me an inkling that maybe my being indelicate about closing and opening the mains shutoff for the Moen cartridge replacement resulted in some water hammer that spread the softener’s outlet adaptor apart from the socket, allowing a slow leak to begin. It doesn’t really make sense, since the softener is in the way and would dampen such effects, but I’m at a loss for understanding the leak at all, let alone why it happened then.

Anyway, there’s full supply pressure pouring on your floor, you can think later. Switch the three-valve bypass back to bypass, reseat the pieces, ensure the retaining clips are the right way up, dry everything off so you can see leaks if they happen. Good? Good. Let’s try again. Close the bridge slowly, open the outlet slowly, open the inlet slowly, and run a downstream faucet to try and release the captured air.

And the leak mysteriously disappeared without anything having been repaired or replaced, just disassembled (one time forcibly) and reassembled.

Still had pops and booms from every fixture and faucet in the house as they were used the first time after the “fix”, but otherwise everything is (so far) okay. I put the emergency “there’s water on my basement floor” alarm back on the floor.

This serves as record of what happened and what I did. May it help you and future me should anything like this happen again.

This Week in Glean: The Three Roles of Data Engagements

(“This Week in Glean” is a series of blog posts that the Glean Team at Mozilla is using to try to communicate better about our work. They could be release notes, documentation, hopes, dreams, or whatever: so long as it is inspired by Glean. All “This Week in Glean” blog posts are listed in the TWiG index).

I’ve just recently started my sixth year working at Mozilla on data and data-adjacent things. In those years I’ve started to notice some patterns in how data is approached, so I thought I’d set them down in a TWiG because Glean’s got a role to play in them.

Data Engagements

A Data Engagement is when there’s a question that needs to engage with data to be answered. Something like “How many bookmarks are used by Firefox users?”.

(No one calls these Data Engagements but me, and I only do because I need to call them _something_.)

I’ve noticed three roles in Data Engagements at Mozilla:

  1. Data Consumer: The Question-Asker. The Temperature-Taker. This is the one who knows what questions are important, and is frustrated without an answer until and unless data can be collected and analysed to provide it. “We need to know how many bookmarks are used to see if we should invest more in bookmark R&D.”
  2. Data Analyst: The Answer-Maker. The Stats-Cruncher. This is the one who can use Data to answer a Consumer’s Question. “Bookmarks are used by Canadians more than Mexicans most of the time, but only amongst profiles that have at least one bookmark.”
  3. Data Instrumentor: The Data-Digger. The Code-Implementor. This one can sift through product code and find the correct place to collect the right piece of data. “The Places database holds many things, we’ll need to filter for just bookmarks to count them.”

(diagrams courtesy of :brizental)

It’s through these three working in concert — The Consumer having a question that the Instrumentor instruments to generate data the Analyst can analyse to return an answer back to the Consumer — that a Data Engagement succeeds.

At Mozilla, Data Engagements succeed very frequently in certain circumstances. The Graphics team answers many deeply-technical questions about Firefox running in the wild to determine how well WebRender is working. The Telemetry team examines the health of the data collection system as a whole. Mike Conley’s old Tab Switcher Dashboard helped find and solve performance regressions in (unsurprisingly) Tab Switching. These go well, and there’s a common thread here that I think is the secret of why: 

In these and the other high-success-rate Data Engagements, all three roles (Consumer, Analyst, and Instrumentor) are embodied by the same person.

It’s a common problem in the industry. It’s hard to build anything at all, but it’s least hard to build something for yourself. When you are in yourself the Question-Asker, Answer-Maker, and Data-Digger, you don’t often mistakenly dig the wrong data to create an answer that isn’t to the question you had in mind. And when you accidentally do make a mistake (because, remember, this is hard), you can go back in and change the instrumentation, update the analysis, or reword the question.

But when these three roles are in different parts of the org, or different parts of the planet, things get harder. Each role is now trying to speak the others’ languages and infer enough context to do their jobs independently.

In comes Mozilla’s Data Org, which has had great successes to date on the theme of “Making it easier for anyone to be their own Analyst”. Data Democratization. When you’re your own Analyst, there are fewer situations where the roles are disparate: Instrumentors who are their own Analysts know when data won’t be the right shape to answer their own questions, and Consumers who are their own Analysts know when their questions aren’t well-formed.

Unfortunately we haven’t had as much success in making the other roles more accessible. Everyone can theoretically be their own Consumer: curiosity in a data-rich environment is as common as lanyards at an industry conference[1]. Asking _good_ questions is hard, though. Possible, but hard. You could just about imagine someone in a mature data organization becoming able to tell the difference between questions that are important and questions that are just interesting through self-serve tooling and documentation.

As for being your own Instrumentor… that is something that only a small fraction of folks have the patience to do. I (and Mozilla’s Community Managers) welcome you to try: it is possible to download and build Firefox yourself. It’s possible to find out which part of the codebase controls which pieces of UI. It’s… well, it’s more than possible, it’s actually quite pleasant to add instrumentation using Glean… but on the whole, if you are someone who _can_ Instrument Firefox Desktop you probably already have a copy of the source code on your hard drive. If you check right now and it’s not there, then there’s precious little likelihood that will change.

(Unless you come and work for Mozilla, that is.)

So let’s assume for now that democratizing instrumentation is impossible. Why does it matter? Why should it matter that the Consumer is a separate person from the Instrumentor?

Communication

Each role communicates with each other role with a different language:

  • Consumers talk to Instrumentors and Analysts in units of Questions and Answers. “How many bookmarks are there? We need to know whether people are using bookmarks.”
  • Analysts speak Data, Metadata, and Stats. “The median number of bookmarks is, according to a representative sample of Firefox profiles, twelve (confidence interval 99.5%).”
  • Instrumentors speak Data and Code. “There’s a few ways we delete bookmarks, we should cover them all to make sure the count’s correct when the next ping’s sent”

Some more of the Data Org’s and Mozilla’s greatest successes involve supplying context at the points in a Data Engagement where it’s most needed. We’ve gotten exceedingly good at loading context about data (metadata) to facilitate communication between Instrumentors and Analysts with tools like the Glean Dictionary.

Ah, but once again the weak link appears to be the communication of Questions and Answers between Consumers and Instrumentors. Taking the above example, does the number of bookmarks include folders?

The Consumer knows, but the further away they sit from the Instrumentor, the less likely that the data coming from the product and fueling the analysis will be the “correct” one.

(Either including or excluding folders would be “correct” for different cases. Which one do you think was “more correct”?)

So how do we improve this?

Glean

Well, actually, Glean doesn’t have a solution for this. I don’t actually know what the solutions are. I have some ideas. Maybe we should share more context between Consumers and Instrumentors somehow. Maybe we should formalize the act of question-asking. Maybe we should build into the Glean SDK a high-enough level of metric abstraction that instead of asking questions, Consumers learn to speak a language of metrics.

The one thing I do know is that Glean is absolutely necessary to making any of these solutions possible. Without Glean, we have too many systems that are fractally complex for any context to be relevantly shared. How can we talk about sharing context about bookmark counts when we aren’t even counting things consistently[2]?

Glean brings that consistency. And from there we get to start solving these problems.

Expect me to come back to this realm of Engagements and the Three Roles in future posts. I’ve been thinking about:

  • how tooling affects the languages the roles speak amongst themselves and between each other,
  • how the roles are distributed on the org chart,
  • which teams support each role,
  • how Data Stewardship makes communication easier by adding context and formality,
  • how Telemetry and Glean handle the same situations in different ways, and
  • what roles Users play in all this. No model about data is complete without considering where the data comes from.

I’m not sure how many I’ll actually get to, but at least I have ideas.

:chutten

[1] Other rejected similes include “as common as”: maple syrup on Canadian breakfast tables, frustration in traffic, sense isn’t.

[2] Counting is harder than it looks.

Six-Year Moziversary

I’ve been working at Mozilla for six years today. Wow.

Okay, so what’s happened… I’ve been promoted to Staff Software Engineer. Georg and I had been working on that before he left, and then, well, *gestures at everything*. It doesn’t really _feel_ that different to be a Staff instead of a Senior, since I’ve been operating at the Staff level for over a year now, but it’s nice that the title caught up. Next stop: well, actually, I think Staff’s a good place for now.

Firefox On Glean did indeed take my entire 2020 at work, and did complete on time and on budget. Glean is now available to be used in Firefox Desktop.

My efforts towards getting folks to actually _use_ Glean instead of Firefox Telemetry in Firefox Desktop have been mixed. The Background Update Task work went exceedingly well… but when there’s 2k pieces of instrumentation, you need project management and I’m trying my best. Now to “just” get buy-in from the powers that be.

I delivered a talk to Ubisoft (yeah, the video game folks) earlier this year. That was a blast and I’m low-key looking for another opportunity like it. If you know anyone who’d like me to talk their ears off about Data and Responsibility, do let me know.

Blogging’s still low-frequency. I rely on the This Week in Glean rotation to give me the kick to actually write long-form ideas down from time-to-time… but it’s infrequent. Look forward to an upcoming blog post about the Three Roles in Data Engagements.

Predictions for the future time:

  • There will be at least one Work Week planned if not executed by this time next year. Vaccines work.
  • Firefox Desktop will have at least started migrating its instrumentation to Glean.
  • I will still be spending a good chunk of my time coding, though I expect this trend of spending ever more time writing proposals and helping folks on chat will continue.

And that’s it for me for now.

:chutten

This Week in Glean: Data Reviews are Important, Glean Parser makes them Easy

(“This Week in Glean” is a series of blog posts that the Glean Team at Mozilla is using to try to communicate better about our work. They could be release notes, documentation, hopes, dreams, or whatever: so long as it is inspired by Glean. All “This Week in Glean” blog posts are listed in the TWiG index).

At Mozilla we put a lot of stock in Openness. Source? Open. Bug tracker? Open. Discussion Forums (Fora?)? Open (synchronous and asynchronous).

We also have an open process for determining if a new or expanded data collection in a Mozilla project is in line with our Privacy Principles and Policies: Data Review.

Basically, when a new piece of instrumentation is put up for code review (or before, or after), the instrumentor fills out a form and asks a volunteer Data Steward to review it. If the instrumentation (as explained in the filled-in form) is obviously in line with our privacy commitments to our users, the Data Steward gives it the go-ahead to ship.

(If it isn’t _obviously_ okay then we kick it up to our Trust Team to make the decision. They sit next to Legal, in case you need to find them.)

The Data Review Process and its forms are very generic. They’re designed to work for any instrumentation (tab count, bytes transferred, theme colour) being added to any project (Firefox Desktop, mozilla.org, Focus) and being collected by any data collection system (Firefox Telemetry, Crash Reporter, Glean). This is great for the process as it means we can use it and rely on it anywhere.

It isn’t so great for users _of_ the process. If you only ever write Data Reviews for one system, you’ll find yourself answering the same questions with the same answers every time.

And Glean makes this worse (better?) by including in its metrics definitions almost every piece of information you need in order to answer the review. So now you get to write the answers first in YAML and then in English during Data Review.

But no more! Introducing glean_parser data-review and mach data-review: command-line tools that will generate for you a Data Review Request skeleton with all the easy parts filled in. It works like this:

  1. Write your instrumentation, providing full information in the metrics definition.
  2. Call python -m glean_parser data-review <bug_number> <list of metrics.yaml files> (or mach data-review <bug_number> if you’re adding the instrumentation to Firefox Desktop).
  3. glean_parser will parse the metrics definitions files, pull out only the definitions that were added or changed in <bug_number>, and then output a partially-filled-out form for you.

Here’s an example. Say I’m working on bug 1664461 and add a new piece of instrumentation to Firefox Desktop:

fog.ipc:
  replay_failures:
    type: counter
    description: |
      The number of times the ipc buffer failed to be replayed in the
      parent process.
    bugs:
      - https://bugzilla.mozilla.org/show_bug.cgi?id=1664461
    data_reviews:
      - https://bugzilla.mozilla.org/show_bug.cgi?id=1664461
    data_sensitivity:
      - technical
    notification_emails:
      - chutten@mozilla.com
      - glean-team@mozilla.com
    expires: never

I’m sure to fill in the `bugs` field correctly (because that’s important on its own _and_ it’s what glean_parser data-review uses to find which data I added), and have categorized the data_sensitivity. I also included a helpful description. (The data_reviews field currently points at the bug I’ll attach the Data Review Request for. I’d better remember to come back before I land this code and update it to point at the specific comment…)

Then I can simply use mach data-review 1664461 and it spits out:

!! Reminder: it is your responsibility to complete and check the correctness of
!! this automatically-generated request skeleton before requesting Data
!! Collection Review. See https://wiki.mozilla.org/Data_Collection for details.

DATA REVIEW REQUEST
1. What questions will you answer with this data?

TODO: Fill this in.

2. Why does Mozilla need to answer these questions? Are there benefits for users?
   Do we need this information to address product or business requirements?

TODO: Fill this in.

3. What alternative methods did you consider to answer these questions?
   Why were they not sufficient?

TODO: Fill this in.

4. Can current instrumentation answer these questions?

TODO: Fill this in.

5. List all proposed measurements and indicate the category of data collection for each
   measurement, using the Firefox data collection categories found on the Mozilla wiki.

Measurement Name | Measurement Description | Data Collection Category | Tracking Bug
---------------- | ----------------------- | ------------------------ | ------------
fog_ipc.replay_failures | The number of times the ipc buffer failed to be replayed in the parent process.  | technical | https://bugzilla.mozilla.org/show_bug.cgi?id=1664461


6. Please provide a link to the documentation for this data collection which
   describes the ultimate data set in a public, complete, and accurate way.

This collection is Glean so is documented
[in the Glean Dictionary](https://dictionary.telemetry.mozilla.org).

7. How long will this data be collected?

This collection will be collected permanently.
**TODO: identify at least one individual here** will be responsible for the permanent collections.

8. What populations will you measure?

All channels, countries, and locales. No filters.

9. If this data collection is default on, what is the opt-out mechanism for users?

These collections are Glean. The opt-out can be found in the product's preferences.

10. Please provide a general description of how you will analyze this data.

TODO: Fill this in.

11. Where do you intend to share the results of your analysis?

TODO: Fill this in.

12. Is there a third-party tool (i.e. not Telemetry) that you
    are proposing to use for this data collection?

No.

As you can see, this Data Review Request skeleton comes partially filled out. Everything you previously had to mechanically fill out has been done for you, leaving you more time to focus on only the interesting questions like “Why do we need this?” and “How are you going to use it?”.

Also, this saves you from having to remember the URL to the Data Review Request Form Template each time you need it. We’ve got you covered.

And since this is part of Glean, this means this is already available to every project you can see here. This isn’t just a Firefox Desktop thing. 

Hope this saves you some time! If you can think of other time-saving improvements we could add to Glean once so that every Mozilla project can take advantage of them, please tell us on Matrix.

If you’re interested in how this is implemented, glean_parser’s part of this is over here, while the mach command part is here.

:chutten

I Assembled A Home Audio Thingy

Or, how to use Volumio, an old Raspberry Pi B+ (from 2014!), and an even older Denon stereo receiver+amplifier to pipe my wife’s MP3 collection to wired speakers in my house.

We like ourselves some music in our house. We’re not Hi Fi snobs. We don’t follow bands, really. We just like to have tunes around to help make chores a little less dreary <small>and to fill the gaping void we all hide inside ourselves</small>. Up until getting this house of ours a half decade ago we accomplished this by turning our computer speakers or CD player up to Rather Loud and trying not to spend too much time too close to it.

This “new” house came with a set of speakers in the kitchen and a nest of speaker wires connecting various corners of the main floor to a central location via the drop ceiling in the basement. With a couple of shelf speakers I ripped the proprietary connectors off of, plus two more speakers and a receiver donated by a far-more Hi Fi snobbish (though not really. But he does rather care about the surround, and waxes poetic about Master and Commander and House of the Flying Daggers for their sound fields) friend of ours, I had six speakers in four rooms.

But I had nothing to play on it. No audio source.

For fun I hooked up the PS4 via toslink/spdif/that optical thingy so I could play Uncharted in surround… but it seems Sony’s dream of the PlayStation being the command center of your home entertainment centre never really got off the ground as it can’t even play one of our (many) audio CDs.

(For the youngins: An audio CD is like a Spotify Playlist that is at most an hour long, but doesn’t require an Internet connection to play).

The PS3 was closer to that vision and had the hardware to play CDs, so it got unmothballed and used as a CD Player? Disc Deck? An audio source that did nothing but play audio CDs. The receiver had a 5CH Stereo setting so we had left+right channels in the rooms that had multiple speakers (and the two that only had single speakers I threw on L because Mono)…

Suffice to say we had an “okay” setup, given I spent a grand total of zero dollabux on it.

But my wife and I? We have MP3 collections that far outstrip our CD collections.

(For the youngins: An MP3 is like a stream of audio that you don’t need the Internet to play.)

(I’m ignoring the cassette tape collection, which play only in the basement on the Hi Fi Enthusiast Hardware of the Late Eighties that the previous owners of the house didn’t deign to take with them. It’s delightful.) How was I going to hook those MP3s up so they could play through the house as easily as the Audio CDs?

For a while I tried to get it to work via the Home Theatre PC.

(For the youngins: A Home Theatre PC is a computer which you connect to a TV so you can do computer things on your TV. Like a Smart TV in two pieces, both of which I control. Or like a laptop, but with a much larger screen that has a remote control.)

Unfortunately the HTPC’s dock is acting up when it comes to audio, and even the headphone jack was giving me grief. Plus, the HTPC’s media software stack was based on Kodi which, though lovely and remotely controllable over the local network via both its web interface Chorus2 and its official app Kore, is far more interested in video than audio. (For example: playlists don’t exist in Kore, and can’t really be managed in Chorus2.)

But I learned a lot about what I wanted from such a system in trying to squish it into the HTPC which already had a job to do, so I decided to try making the audio player its own thing. Do one thing and do it well, jacks of all trades are masters of none. That sort of thing.

That’s when I remembered I had an old Raspberry Pi B+ in my closet. 700MHz CPU. 512MB RAM. Not the fastest machine in the park… but all it had to do was supply an interface in front of a largish (8k tracks) MP3 collection.

I found this project called Volumio which aimed to catalogue and provide a good, network-aware frontend on an audio collection (and do other stuff). It even had a plugin for playing Audio CDs so I could finally return the PS3 to game playing duty in the basement with the other previous generations of video gaming hardware.

It was a bit fiddly, though. Here’s the process:

  1. Install stock Volumio onto a microSD card which you then insert into the Raspberry Pi
    • This was very straightforward except for when I learned that the microsd card I wanted to use actually had bad-sector-ed itself to unusability. Luckily I had a spare.
  2. Adjust Volumio’s settings
    • Be sure to change playback to “Single” from “Continuous” or when you press play on a single track in a long list it’ll add every track in that list to the queue… which, on the B+’s anemic processor, takes a goodly while.
  3. Install the NanoSound CD Plugin
    • This is where it gets tricky. You could “just” pay for a subscription to Volumio and get first-party audio CD support including upsampling and other Hi Fi things. I’m using the B+’s headphone jack for output so Hi Fi is clearly none of my concern. And I’m too frugal for my own good, so I’m gonna do this myself.
    • Don’t install the plugin from the repository because it won’t work. Install the dependencies as described, then use the install script method. This will take a while as it compiles from source, and my B+ is not fast.
    • I’d like the CD to autoplay when inserted. There are instructions on the support page for how to script this: don’t use them. They have fancy quotation marks and emdashes which confuse both bash and curl when you try. Use instead the instructions on the source comment but don’t reset the volume.
  4. Install the Volumio App on your phone for remote control.
    • The “App” appears to be a webview that just loads http://volumio.local/ — for whatever reason my phone won’t resolve that host properly so I can’t just use the browsers I have already installed to access the UI.
  5. Move all the MP3s to a computer that is always on
    • You could use a USB drive attached to the Pi if you wanna, but I had space leftover on the Home Theatre PC, so I simply directed Volumio at the network share. Note that it demands credentials even for CIFS/Samba/Windows shares that don’t require credentials, so be prepared to add a straw account.

This was when we learned that our MP3 collection isn’t exactly nicely organized. Like Napster or eDonkey or Limewire or Kazaa, there were multiple slightly-different copies of some tracks or entire albums. Tracks weren’t really clear about what album, artist, and title they had… and the organization was a nightmare.

I’ve turned to Picard to help with the metadata challenges. So far it’s… fine? I dunno, AcoustID isn’t as foolproof as I was expecting it to be, and sometimes it decides to not group tracks into albums… it’s fine. So far.

Also, the gain levels of each track were different. Some were whisper-quiet and some were Cable TV Advertisement Loud. I’d hoped Volumio’s own volume normalization would help, but it seemed to silence already-quiet tracks and amplify high-gain recordings in the exact opposite of what I wanted. So I ran MP3Gain (yes, on sourceforge. Yes it hasn’t had a non-UI-language update since like 2008) for a few hours to get everyone singing at the same level, and turned off Volumio’s volume normalization.

And that’s where we are now. I’m not fully done with Picard (so many tracks to go). I haven’t added my own MP3 collection to the mix, with its additional duplicates and bad gain and whatnot…

…but it’s working. And it’s encouraging my wife and I to discover music we haven’t played in years. Which is wonderful.

If only because it annoys our preteen to learn that she kinda likes her parents’ tunes.

This Week in Glean: Firefox Telemetry is to Glean as C++ is to Rust

(“This Week in Glean” is a series of blog posts that the Glean Team at Mozilla is using to try to communicate better about our work. They could be release notes, documentation, hopes, dreams, or whatever: so long as it is inspired by Glean. You can find an index of all TWiG posts online.)

I had this goofy idea that, like Rust, the Glean SDKs (and Ecosystem) aim to bring safety and higher-level thought to their domain. This is in comparison to how, like C++, Firefox Telemetry is built out of flexible primitives that assume you very much know what you’re doing and cannot (will not?) provide any clues in its design as to how to do things properly.

I have these goofy thoughts a lot. I’m a goofy guy. But the more I thought about it, the more the comparison seemed apt.

In Glean wherever we can we intentionally forbid behaviour we cannot guarantee is safe (e.g. we forbid non-commutative operations in FOG IPC, we forbid decrementing counters). And in situations where we need to permit perhaps-unsafe data practices, we do it in tightly-scoped areas that are identified as unsafe (e.g. if a timing_distribution uses accumulate_raw_samples_nanos you know to look at its data with more skepticism).

In Glean we encourage instrumentors to think at a higher level (e.g. memory_distribution instead of a Histogram of unknown buckets and samples) thereby permitting Glean to identify errors early (e.g. you can’t start a timespan twice) and allowing Glean to do clever things about it (e.g. in our tooling we know counter metrics are interesting when summed, but quantity metrics are not). Speaking of those errors, we are able to forbid error-prone behaviour through design and use of language features (e.g. In languages with type systems we can prevent you from collecting the wrong type of data) and when the error is only detectable at runtime we can report it with a high degree of specificity to make it easier to diagnose.
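To give the flavour of that (and only the flavour: this is a toy sketch in Python, not the actual Glean SDK API), here’s what “forbid the unsafe operation by construction” can look like:

# A toy sketch of the design principle, NOT the real Glean API:
# a counter that can only go up, because a decrement simply isn't offered.
class ToyCounter:
    def __init__(self) -> None:
        self._value = 0

    def add(self, amount: int = 1) -> None:
        if amount <= 0:
            # The real SDK records an instrumentation error for this sort of
            # thing; the point is that bad input can't silently skew the count.
            raise ValueError("counters only go up")
        self._value += amount

    # Note what's missing: no subtract(), no set(). The unsafe operations
    # aren't merely discouraged in documentation; they don't exist.

tab_opens = ToyCounter()
tab_opens.add()      # fine
tab_opens.add(3)     # fine
# tab_opens.add(-1)  # refused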

There are more analogues, but the metaphor gets strained. (( I mean, I guess a timing_distribution’s `TimerId` is kinda the closest thing to a borrow checker we have? Maybe? )) So I should probably stop here.

Now, those of you paying attention might have already seen this relationship. After all, as we all know, glean-core (which underpins most of the Glean SDKs regardless of language) is actually written in Rust whereas Firefox Telemetry’s core of Histograms, Scalars, and Events is written in C++. Maybe we shouldn’t be too surprised when the language the system is written in happens to be reflected in the top-level design.

But! glean-core was (for a long time) written in Kotlin from stem to stern. So maybe it’s not due to language determinism and is more to do with thoughtful design, careful change processes, and a list of principles we hold to firmly as the number of supported languages and metric types continues to grow.

I certainly don’t know. I’m just goofing around.

:chutten

Responsible Data Collection is Good, Actually (Ubisoft Data Summit 2021)

In June I was invited to talk at Ubisoft’s Data Summit about how Mozilla does data. I’ve given a short talk on this subject before, but this was an opportunity to update the material, cover more ground, and include more stories. The talk, including questions, comes in at just under an hour and is probably best summarized by the synopsis:

Learn how responsible data collection as practiced at Mozilla makes cataloguing easy, stops instrumentation mistakes before they ship, and allows you to build self-serve analysis tooling that gets everyone invested in data quality. Oh, and it’s cheaper, too.

If you want to skip to the best bits, I included shameless advertising for Mozilla VPN at 3:20 and becoming a Mozilla contributor at 14:04, and I lose my place in my notes at about 29:30.

Many thanks to Mathieu Nayrolles, Sebastien Hinse and the Data Summit committee at Ubisoft for guiding me through the process and organizing a wonderful event.

:chutten

Data Science is Interesting: Why are there so many Canadians in India?

Any time India comes up in the context of Firefox and Data I know it’s going to be an interesting day.

They’re our largest Beta population:

[Pie chart: India is by far the largest Beta population, at 33.2%]

They’re our second-largest English user base (after the US):

[Pie chart: the US is the largest English user base at 37.8%, then India at 10.8%]

But this is the interesting stuff about India that you just take for granted in Firefox Data. You come across these factoids for the first time and your mind is all blown and you hear the perhaps-apocryphal stories about Indian ISPs distributing Firefox Beta on CDs to their customers back in the Firefox 4 days… and then you move on. But every so often something new comes up and you’re reminded that no matter how much you think you’re prepared, there’s always something new you learn and go “Huh? What? Wait, what?!”

Especially when it’s India.

One of the facts I like to trot out to catch folks’ interest is how, when we first released the Canadian English localization of Firefox, India had more Canadians than Canada. Even today India is, after Canada and the US, the third largest user base of Canadian English Firefox:

[Pie chart: en-CA Firefox clients by country. Canada 75.5%, US 8.35%, India 5.41%]

Back in September 2018 Mozilla released the official Canadian English-localized Firefox. You can try it yourself by selecting it from the drop down menu in Firefox’s Preferences/Options in the “Language” section. You may have to click ‘Search for More Languages’ to be able to add it to the list first, but a few clicks later and you’ll be good to go, eh?

(( Or, if you don’t already have Firefox installed, you can select which language and dialect of Firefox you want from this download page. ))

Anyhoo, the Canadian English locale quickly gained a chunk of our install base:

[Chart: en-CA uptake in Firefox, September 2018. A sharp rise, then a weekly seasonal pattern with weekends lower than weekdays]

…actually, it very quickly gained an overlarge chunk of our install base. Within a week we’d reached over three quarters of the entire Canadian user base?! Say we have one million Canadian users, that first peak in the chart was over 750k!

Now, we Canadian Mozillians suspected that there was some latent demand for the localized edition (they were just too polite to bring it up, y’know)… but not to this order of magnitude.

So back around that time a group of us including :flod, :mconnor, :catlee, :Aryx, :callek (and possibly others) fell down the rabbit hole trying to figure out where these Canadians were coming from. We ran down the obvious possibilities first: errors in data, errors in queries, errors in visualization… who knows, maybe I was counting some clients more than once a day? Maybe I was counting other Englishes (like South African and Great Britain) as well? Nothing panned out.

Then we guessed that maybe Canadians in Canada weren’t the only ones interested in the Canadian English localization. Originally I think we made a joke about how much Canadians love to travel, but then the query finished running and showed us just how many Canadians there must be in India.

We were expecting a fair number of Canadians in the US. It is, after all, home to Firefox’s largest user base. But India? Why would India have so many Canadians? Or, if it’s not Canadians, why would Indians have such a preference for the English spoken in ten provinces and three territories? What is it about one of two official languages spoken from sea to sea to sea that could draw their attention?

Another thing that was puzzling was the raw speed of the uptake. If users were choosing the new localization themselves, we’d have seen a shallow curve with spikes as various news media made announcements or as we started promoting it ourselves. But this was far sharper an incline. This spoke to some automated process.

And the final curiosity (or clue, depending on your point of view) was discovered when we overlaid British English (en-GB) on top of the Canadian English (en-CA) uptake and noticed that (after accounting for some seasonality at the time due to the start of the school year) this suddenly-large number of Canadian English Firefoxes was drawn almost entirely from the number previously using British English:

[Chart: use of British and Canadian English in Firefox, September 2018. The rise in Canadian English is matched by a fall in British English]

It was all of this, put together that day, that led us to our Best Guess. I’ll give you a little space to make your own guess. If you think yours is a better fit for the evidence, or simply want to help out with Firefox in Canadian English, drop by the Canadian English (en-CA) Localization matrix room and let us know! We’re a fairly quiet bunch who are always happy to have folks help us keep on top of the new strings added or changed in Mozilla projects or just chat about language stuff.

Okay, got your guess made? Here’s ours:

en-CA is alphabetically before en-GB.

Which is to say that the Canadian English Firefox, when put in a list with all the other Firefox builds (like this one which lists all the locales Firefox 88 comes in for Windows 64-bit), comes before the British English Firefox. We assume there is a population of Firefoxes, heavily represented in India (and somewhat in the US and elsewhere), that are installed automatically from a list like this one. This automatic installation is looking for the first English build in this list, and it doesn’t care which dialect. Starting September of 2018, instead of grabbing British English like it’s been doing for who knows how long, it had a new English higher in the list: Canadian English.
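If you want to see the guess in miniature, here’s a toy Python sketch (the locale list is abbreviated and illustrative, not the full set Firefox ships): an automated installer that takes the first English build from an alphabetized locale list would have silently switched dialects the day en-CA appeared.

# Toy illustration of the guess; the locale list here is abbreviated.
def first_english(locales):
    return next(l for l in sorted(locales) if l.startswith("en"))

before_sept_2018 = ["de", "en-GB", "en-US", "fr", "hi-IN", "pa-IN"]
after_sept_2018 = before_sept_2018 + ["en-CA"]

print(first_english(before_sept_2018))  # en-GB
print(first_english(after_sept_2018))   # en-CA, a brand-new "first English"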

But who can say! All I know is that any time India comes up in the data, it’s going to be an interesting day.

:chutten

Doubling the Speed of Windows Firefox Builds using sccache-dist

I’m one of the many users but few developers of Firefox on Windows. One of the biggest obstacles stopping me from doing more development on Windows instead of this beefy Linux desktop I have sitting under my table is how slow builds are.

Luckily, distributed compilation (and caching) using sccache is here to help. This post is a step-by-step version of the rather-more-scattered docs I found on the github repo and in Firefox’s documentation. Those guides are excellent and have all of the same information (though they forgot to remind me to put the ports on the url config variables), but they have to satisfy many audiences with many platforms and many use cases so I found myself having to switch between all three to get myself set up.

To synthesize what I learned all in one place, I’m writing my Home Office Version to be specific to “using a Linux machine to help your Windows machine compile Firefox on a local network”. Here’s how it goes:

  1. Ensure the Build Scheduler (Linux-only), Build Servers (Linux-only), and Build Clients (any of Linux, MacOS, Windows) all have sccache-dist.
    • If you have a Firefox Build present, ./mach bootstrap already gave you a copy at .mozbuild/sccache/bin
    • My Build Scheduler and solitary Build Server are both the same Linux machine.
  2. Configure how the pieces all talk together by configuring the Scheduler.
    • Make a file someplace (I put mine in ~/sccache-dist/scheduler.conf) and put in the public-facing IP address of the scheduler (better be static), the method and secret that Clients use to authenticate themselves, and the method and secret that Servers use to authenticate themselves.
    • Keep the tokens and secret keys, y’know, secret.
# Don't forget the port, and don't use an internal iface address like 127.0.0.1.
# This is where the Clients and Servers should find the Scheduler
public_addr = "192.168.1.1:10600"

[client_auth]
type = "token"
# You can use whatever source of random, long, hard-to-guess token you'd like.
# But chances are you have openssl anyway, and it's good enough unless you're in
# a VM or other restrained-entropy situation.
token = "<whatever the output of `openssl rand -hex 64` gives you>"

[server_auth]
type = "jwt_hs256"
secret_key = "<whatever the output of `sccache-dist auth generate-jwt-hs256-key` is>"
  3. Start the Scheduler to see if it complains about your configuration.
    • ~/.mozbuild/sccache/sccache-dist scheduler --config ~/sccache-dist/scheduler.conf
    • If it fails fatally, it’ll let you know. But you might also want to pass `--syslog trace` while we’re setting things up so you can follow the verbose logging with `tail -f /var/log/syslog`.
  4. Configure the Build Server.
    • Ensure you have bubblewrap >= 0.3.0 to sandbox your build jobs away from the rest of your computer
    • Make a file someplace (I put mine in ~/sccache-dist/server.conf) and put in the public-facing IP address of the server (better be static) and things like where and how big the toolchain cache should be, where the Scheduler is, and how you authenticate the Server with the Scheduler.
# Toolchains are how a Linux Server can build for a Windows Client.
# The Server needs a place to cache these so Clients don’t have to send them along each time.
cache_dir = "/tmp/toolchains"
# You can also config the cache size with toolchain_cache_size, but the default of 10GB is fine.

# This is where the Scheduler can find the Server. Don’t forget the port.
public_addr = "192.168.1.1:10501"

# This is where the Server can find the Scheduler. Don’t forget http. Don’t forget the port.
# Ideally you’d have an https server in front that’d add a layer of TLS and
# redirect to the port for you, but this is Home Office Edition.
scheduler_url = "http://192.168.1.1:10600"

[builder]
type = "overlay" # I don’t know what this means
build_dir = "/tmp/build" # Where on the fs you want that sandbox of build jobs to live
bwrap_path = "/usr/bin/bwrap" # Where the bubblewrap 0.3.0+ binary lives

[scheduler_auth]
type = "jwt_token"
token = "<what sccache-dist auth generate-jwt-hs256-server-token --secret-key <that key from scheduler.conf> --server <the value in public_addr including port>"
  5. Start the Build Server
    • `sudo` is necessary for this part to satisfy bubblewrap
    • sudo ~/.mozbuild/sccache/sccache-dist server --config ~/sccache-dist/server.conf
    • I’m not sure if it’s just me, but the build server runs in foreground without logs. Personally, I’d prefer a daemon.
    • If your scheduler’s tracelogging to syslog, you should see something in /var/log about the server authenticating successfully. If you aren’t, we can query the whole build network’s status in Step 7.
  6. Configure the Build Client.
    • This config file needs to have a specific name and location to be picked up by sccache. On Windows it’s `%APPDATA%\Mozilla\sccache\config\config`.
    • In it you need to write down how the Client can find and authenticate itself with the Scheduler. On not-Linux you also need to specify the toolchains you’ll be asking your Build Servers to use to compile your code.
[dist]
scheduler_url = "http://192.168.1.1:10600" # Don’t forget the protocol or port
toolchain_cache_size = 5368709120 # The default of 10GB is at least twice as big as you need.

# Gonna need two toolchains, one for C++ and one for Rust
# Remember to replace all <user> with your user name on disk
[[dist.toolchains]]
type = "path_override"
compiler_executable = "C:/Users/<user>/.mozbuild/clang/bin/clang-cl.exe"
archive = "C:/Users/<user>/.mozbuild/clang-dist-toolchain.tar.xz"
archive_compiler_executable = "/builds/worker/toolchains/clang/bin/clang"

[[dist.toolchains]]
type = "path_override"
compiler_executable = "C:/Users/<user>/.rustup/toolchains/stable-x86_64-pc-windows-msvc/bin/rustc.exe"
archive = "C:/Users/<user>/.mozbuild/rustc-dist-toolchain.tar.xz"
archive_compiler_executable = "/builds/worker/toolchains/rustc/bin/rustc"

# Near as I can tell, these dist.toolchains blocks tell sccache
# that if a job requires a tool at `compiler_executable` then it should instead
# distribute the job to be compiled using the tool present in `archive` at
# the path within the archive of `archive_compiler_executable`.
# You’ll notice that the `archive_compiler_executable` binaries do not end in `.exe`.

[dist.auth]
type = "token"
token = "<the value of scheduler.conf’s client_auth.token>"
  7. Perform a status check from the Client.
    • With the Scheduler and Server both running, go to the Client and run `.mozbuild/sccache/sccache.exe --dist-status`
    • It will start a sccache “client server” (ugh) in the background and try to connect. Ideally you’re looking for a non-0 “num_servers” and non-0 “num_cpus”
  8. Configure mach to use sccache
    • You need to tell it that it has a ccache and to configure clang to use `cl` driver mode (because when executing compiles on the Build Server it will see it’s called `clang` not `clang-cl` and thus forget to use `cl` mode unless you remind it to).
# Remember to replace all <user> with your user name on disk
ac_add_options CCACHE="C:/Users/<user>/.mozbuild/sccache/sccache.exe"

export CC="C:/Users/<user>/.mozbuild/clang/bin/clang-cl.exe --driver-mode=cl"
export CXX="C:/Users/<user>/.mozbuild/clang/bin/clang-cl.exe --driver-mode=cl"
export HOST_CC="C:/Users/<user>/.mozbuild/clang/bin/clang-cl.exe --driver-mode=cl"
export HOST_CXX="C:/Users/<user>/.mozbuild/clang/bin/clang-cl.exe --driver-mode=cl"
  9. Run a test build
    • Using the value of “num_cpus” from Step 7’s `--dist-status`, run `./mach build -j<num_cpus>`
    • To monitor if everything’s working, you have some choices
      • You can look at network traffic (expect your network to be swamped with jobs going out and artefacts coming back)
      • You can look at resource-using processes on the Build Server (you can use `top` to watch the number of `clang` processes)
      • If your Scheduler or Server is logging, you can `tail -f /var/log/syslog` to watch the requests and responses in real time

Oh, dang, I should manufacture a final step so it’s How To Speed Up Windows Firefox Builds In Ten Easy Steps (if you have a fast Linux machine and network). Oh well.

Anyhoo, I’m not sure if this is useful to anyone else, but I hope it is. No doubt your setup is less weird than mine somehow so you’ll be better off reading the general docs instead. Happy Firefox developing!

:chutten