This Week in Glean: Glean in Private

(“This Week in Glean” is a series of blog posts that the Glean Team at Mozilla is using to try to communicate better about our work. They could be release notes, documentation, hopes, dreams, or whatever: so long as it is inspired by Glean.)

In the Kotlin implementation of the Glean SDK we have a glean.private package. (Ideally anything actually private in the Glean SDK would _be_ private and inaccessible, but to support our SDK magic (okay, so that the SDK could work properly by generating the Specific Metrics API in subcomponents) we needed something public that we just didn’t want anyone to use.) For a little while this week it looked like the use of the Java keyword private in the package name was going to be problematic. Here are some of the alternatives we came up with:

Fortunately (or unfortunately) :mdboom (whom I might have to start calling Dr. Boom) came up with a way to make it work with the package private intact, so we’ll never know which one we would’ve gone with.

Alas.

I guess I’ll just have to console myself with the knowledge that we’ve deployed this fix to Fenix, Python bindings are becoming a reality, and the first code supporting the FOGotype might be landing in mozilla-central. (More to come on all of that, later)

:chutten

Distributed Teams: Why I Don’t Go to the Office More Often

I was invited to a team dinner as part of a work week the Data Platform team was having in Toronto. I love working with these folks, and I like food, so I set about planning my logistics.

The plan was solid, but unimpressive. It takes three hours or so to get from my home to the Toronto office by transit, so I’d be relying on the train’s WiFi to allow me to work on the way to Toronto, and I’d be arriving home about 20min before midnight.

Here’s how it went:

  1. 0800 Begin
  2. 0816 Take the GRT city bus to Kitchener train station
  3. 0845 Try to find a way to get to the station (the pedestrian situation around the station is awful)
  4. 0855 Learn that my 0918 train is running 40min late.
  5. 0856 Purchase a PRESTO card for my return journey, being careful to not touch any of the blood stains on the vending machine. (Seriously. Someone had a Bad Time at Kitchener Station recently)
  6. 0857 Learn that they had removed WiFi from the train station, so the work I’ll be able to do is limited to what I can manage on my phone’s LTE
  7. 0900 Begin my work day (Slack and IRC only), and eat the breakfast I packed because I didn’t have time at home.
  8. 0943 Train arrives only 35min late. Goodie.
  9. 0945 Learn from the family occupying my seat that I actually bought a ticket for the wrong day. Applying a discount code didn’t keep the date and time I selected, and I didn’t notice until it was too late. Sit in a different seat and wonder what the fare inspector will think.
  10. 0950 Start working from my laptop. Fear of authority can build on its own time, I have emails to answer and bugs to shuffle.
  11. 1030 Fare inspector finally gets around to me as my nervousness peaks. Says they’ll call it in and there might be an adjustment charge to reschedule it.
  12. 1115 Well into Toronto, the fare inspector just drops my ticket into my lap on their way to somewhere else. I… guess everything’s fine?
  13. 1127 Train arrives at Toronto Union Station. Disconnect WiFi, disembark and start walking to the office. (Public transit would be slower, and I’m saving my TTC token for tonight’s trip)
  14. 1145 Arrive at MoTo just in time for lunch.

Total time to get to Mozilla Toronto: 3h45min. Total distance traveled: 95km. Total cost: $26 for the Via Rail ticket, $2.86 for the GRT city bus.

The way back wasn’t very much better. I had to duck out of dinner at 8pm to have a hope of getting home before the day turned into tomorrow:

  1. 2000 Leave the team dinner, say goodnights. Start walking to the subway
  2. 2012 At the TTC subway stop learn that the turnstiles don’t take tokens any more. Luckily there’s someone in the booth to take my fare.
  3. 2018 Arrive at Union station and get lost in the construction. I thought the construction was done (the construction is never done).
  4. 2025 Ask at the PRESTO counter how to use PRESTO properly. I knew it was pay-by-distance but I was taking a train _and_ a bus, so I wasn’t sure if I needed to tap in between the two modes (I do. Tap before the train, after the train, on the bus when you get on, and on the bus when you get off. Seems fragile, but whatever).
  5. 2047 Learn that the train’s been rescheduled 6min later. Looks like I can still make my bus connection in Bramalea.
  6. 2053 Tap on the thingy, walk up the flights of stairs to the train, find a seat.
  7. 2102 “Due to platform restrictions, the doors on car 3107 will not open at Bramalea”… what car am I on? There’s no way to tell from where I’m sitting.
  8. 2127 Arrive at Bramalea. I’m not on car 3107.
  9. 2130 Learn that there’s one correct way to leave the platform and I took the other one that leads to the parking lot. Retrace my steps.
  10. 2132 Tap the PRESTO on the thingy outside the station building (closed)
  11. 2135 Tap the PRESTO on the thingy inside the bus. BEEP BEEP. Bus driver says insufficient funds. That can’t be, I left myself plenty of room. Tick tock.
  12. 2136 Cold air aching in my lungs from running, I load another $20 onto the PRESTO
  13. 2137 Completely out of breath, tap the PRESTO on the thingy inside the bus. Ding. Collapse in a seat. Bus pulls out just moments later.
  14. 2242 Arrive in Kitchener. Luckily the LRT, running at 30min headways, is 2min away. First good connection of the day.
  15. 2255 This is the closest the train can get me. There’s a 15min wait (5 of which I’ll have to walk in the cold to get to the stop) for a bus that’ll get me, in 7min, within a 10min walk from home. I decide to walk instead, as it’ll be faster.
  16. 2330 Arrive home.

Total time to get home: 3h30min. Total distance traveled: 103km. Total cost: $3.10 for the subway token, $46 PRESTO ($6 for the card, $20 for the fare, $20 for the surprise fare), $2.86 for the LRT.

At this point I’ve been awake for over 20 hours.

Is it worth it? Hard to say. Every time I plan one of these trips I look forward to it. Conversations with office folks, eating office lunch, absconding with office snacks… and this time I even got to go out to dinner with a bunch of data people I work with all the time!

But every time I do this, as I’m doing it, or as I’m recently back from doing it… I don’t feel great about it. It’s essentially a full work day (nearly eight full hours!) just in travel to spend 5 hours in the office, and (this time) a couple hours afterwards in a restaurant.

Ultimately this — the share of my brain I need to devote purely to logistics, the manifold ways things can go wrong, the sheer _time_ it all takes — is why I don’t go into the office more often.

And the people are the reason I do it at all.

:chutten

Four-Year Moziversary

Wowee what a year that was. And I’m pretty sure the year to come will be even more so.

We gained two new team members, Travis and Beatriz. And with Georg taking a short break, we’ve all had more to do than usual. Glean’s really been working out well, though I’ve only had the pleasure of working on it a little bit.

Instead I’ve been adding fun new features to Firefox Desktop like Origin Telemetry. I also gave a talk at a conference about Data and Responsibility. Last December’s All Hands returned us to Orlando, and June brought me to Whistler for the first time. We held a Virtual Work Week (or “vorkweek”) a couple of weeks ago when we couldn’t find a time and the budget to meet in person, and spent it planning out how we’ll bring Glean to Firefox Desktop with Project FOG. First with a Prototype (FOGotype) by end of year. And then 2020 will be the year of Glean on the Desktop.

Blogging-wise I’ve slowed down quite a lot. 12 posts so far this calendar year is much lower than previous years’ 25+. The velocity I’d kept up by keeping tabs on the Ontario Provincial Legislature and pontificating about video games I’d played died in the face of mounting work pressures. Instead of spending my off time writing non-mozilla things I spent a lot of it reading instead (as my goodreads account can attest).

But now that I’ve written this one, maybe I’ll write more here.

Resolution for the coming year? More blogging. Continued improvement. Put Glean on Firefox. That is all.

This Week in Glean: Glean on Desktop (Project FOG)

(“This Week in Glean” is a series of blog posts that the Glean Team at Mozilla is using to try to communicate better about our work. They could be release notes, documentation, hopes, dreams, or whatever: so long as it is inspired by Glean.)

The Glean SDK is doing well on mobile. It’s shipping in Firefox Preview and Firefox for Fire TV on Android, and our iOS port for Lockwise is shaping up wonderfully well. Data is flowing in, letting us know how the products are being used.

It’s time to set our sights on Desktop.

It’s going to be tricky, but to realize one of the core benefits of the Glean SDK (the one about not having to maintain more than one data collection client library across Mozilla’s products) we have to do this. Also, we’re seeing more than a little interest from our coworkers to get going with it already :)

One of the reasons it’s going to be tricky is that Desktop isn’t like Mobile. As an example, the Glean SDK “baseline” ping is sent whenever the product is sent to the background. This is predicated on the idea that the user isn’t using the application when it’s in the background. But on Desktop, there’s no similar application lifecycle paradigm we can use in that way. We could try sending a ping whenever focus leaves the browser (onblur), but that can happen very often and doesn’t have the same connotation of “user isn’t using it”. And what if the focus leaves one browser window to attach to another browser window? We need to have conversations with Data Science and Firefox Peers to figure out what lifecycle events most closely respect our desire to measure engagement.

And that’s just one reason. One reason that needs investigation, exploration, discussion, design, proposal, approval, implementation, validation, and documentation.

And this reason’s one that we actually know something about. Who knows what swarm of unknown quirks and possible failures lies in wait?

That’s why step one in this adventure is a prototype. We’ll integrate the Glean SDK into Firefox Desktop and turn some things on. We’ll try some things out. We’ll make mistakes, and write it all down.

And then we’ll tear it out and, using what we’ve learned, do it over again. For real.

This prototype won’t have an answer for the behaviour of the “baseline” ping… so it won’t have a “baseline” ping. It won’t know the most efficient way to build a JavaScript metrics API (webidl? JSM? JSContext?), so it won’t have one. It won’t know how best to collect data from the many different processes of many different types that Firefox now boasts, so it will live in just one.

This investigative work will be done by the end of the year with the ultimate purpose of answering all the questions we need in order to proceed next year with the full implementation.

That’s right. You heard it here first:

2020 will be the year of Glean on the Desktop.

:chutten

Distributed Teams: Regional Peculiarities Like Oktoberfest and Bagged Milk

It’s Oktoberfest! You know, that German holiday about beer and lederhosen?

No. As many Germans will tell you, it’s not a German thing as much as it is a Bavarian thing. It’s like saying kilts are a British thing (they’re a Scottish thing). Or that milk in bags is a Canadian thing (in Canada it’s an Eastern Canada thing).

In researching what the heck I was talking about when I was making this comparison at a recent team meeting, Alessio found a lovely study on the efficiency of milk bags as milk packaging in Ontario published by The Environment and Plastics Industry Council in 1997.

I highly recommend you skim it for its graphs and the study conclusions. The best parts for me are how it highlights that the consumption of milk (by volume) increased 22% from 1968 to 1995 while at the same time the amount (by mass) of solid waste produced by milk packaging decreased by almost 20%.

I also liked Table 8 which showed the recycling rates of the various packaging types that we’d need to reach in order to match the small amount (by mass) of solid waste generation of the (100% unrecycled) milk bags. (Interestingly, in my region you can recycle milk bags if you first rinse and dry them).

I guess what I’m trying to say about this is three-fold:

  1. Don’t assume regional characteristics are national in your distributed team. Berliners might not look forward to Oktoberfest the way Münchner do, and it’s possible no one in the Vancouver office owns a milk jug or bag cutter.
  2. Milk Bags are kinda neat, and now I feel a little proud about living in a part of the world where they’re common. I’d be a little more confident about this if the data wasn’t presented by the plastics industry, but I’ll take what I can get (and I’ll start recycling my milk bags).
  3. Geez, my team can find data for _any topic_. What differences we have by being distributed around the world are eclipsed by how we’re universally a bunch of nerds.

:chutten


My StarCon 2019 Talk: Collecting Data Responsibly and at Scale


Back in January I was privileged to speak at StarCon 2019 at the University of Waterloo about responsible data collection. It was a bitterly-cold weekend with beautiful sun dogs ringing the morning sun. I spent it inside talking about good ways to collect data and how Mozilla serves as a concrete example. It’s 15 minutes short and aimed at a general audience. I hope you like it.

I encourage you to also sample some of the other talks. Two I remember fondly are Aaron Levin’s “Conjure ye File System, transmorgifier” about video games that look like file systems and Cory Dominguez’s lovely analysis of Moby Dick editions in “or, the whale”. Since I missed a whole day, I now get to look forward to fondly discovering new ones from the full list.

:chutten

Data Science is Hard: Validating Data for Glean

Glean is a new library for collecting data in Mozilla products. It’s been shipping in Firefox Preview for a little while and I’d like to take a minute to talk about how I validated that it sends what we think it’s sending.

Validating new data collections in an existing system like Firefox Desktop Telemetry is a game of comparing against things we already know. We know that some percentage of data we receive is just garbage: bad dates, malformed records, attempts at buffer overflows and SQL injection. If the amount of garbage in the new collection is within the same overall amount of garbage we see normally, we count it as “good enough” and move on.

With new data collection from a new system like Glean coming from new endpoints like the reference browser and Firefox Preview, we’re given an opportunity to compare against the ideal. Maybe the correct number of failures is 0?

But what is a failure? What is acceptable behaviour?

We have an “events” ping in Glean: can the amount of time covered by the events’ timestamps ever exceed the amount of time covered by the ping? I didn’t think so, but apparently it’s an expected outcome when the events were restored from a previous session.
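
That timestamp check can be sketched in a few lines (the field names here are illustrative, not Glean’s actual ping schema):

```python
# Illustrative check: the span covered by an "events" ping's timestamps should
# not exceed the span the ping itself covers... except, as noted above, when
# the events were restored from a previous session. All field names made up.
ping = {
    "start_time_ms": 0,
    "end_time_ms": 60_000,                        # ping claims to cover one minute
    "event_timestamps_ms": [100, 5_000, 90_000],  # last event falls outside that
}

event_span = max(ping["event_timestamps_ms"]) - min(ping["event_timestamps_ms"])
ping_span = ping["end_time_ms"] - ping["start_time_ms"]
suspicious = event_span > ping_span
print(suspicious)  # True
```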

So how do you validate something that has unknown error states?

I started with a list of things any client-based network-transmitted data collection system had to have:

  • How many pings (data transmissions) are there?
  • How many measurements are in those pings?
  • How many clients are sending these pings?
  • How often?
  • How long do they take to get to our servers?
  • How many poorly-structured pings are sent? By how many clients? How often?
  • How many pings with bad values are sent? By how many clients? How often?
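
A sketch of what answering the first few of those questions might look like, over a handful of made-up ping records (none of these field names are Glean’s real schema):

```python
from collections import Counter
from datetime import datetime

# Illustrative ping records; real validation would query the ingested data set.
pings = [
    {"client_id": "a", "submission_ts": datetime(2019, 8, 1, 12, 0),
     "creation_ts": datetime(2019, 8, 1, 11, 59), "well_formed": True},
    {"client_id": "a", "submission_ts": datetime(2019, 8, 1, 18, 0),
     "creation_ts": datetime(2019, 8, 1, 17, 30), "well_formed": True},
    {"client_id": "b", "submission_ts": datetime(2019, 8, 1, 12, 5),
     "creation_ts": datetime(2019, 8, 1, 12, 4), "well_formed": False},
]

ping_count = len(pings)                                       # how many pings?
client_count = len({p["client_id"] for p in pings})           # how many clients?
pings_per_client = Counter(p["client_id"] for p in pings)     # how often?
# How long do they take to get to our servers?
latencies = [p["submission_ts"] - p["creation_ts"] for p in pings]
# How many poorly-structured pings?
malformed = [p for p in pings if not p["well_formed"]]

print(ping_count, client_count, max(latencies), len(malformed))
```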

From there we can dive into validating specifics about the data collections:

  • Do the events in the “events” ping have timestamps with reasonable separations? (What does reasonable mean here? Well, it could be anything, but if the span between two timestamps is measured in years, and the app has only been available for some number of weeks, it’s probably broken.)
  • Are the GUIDs in the pings actually globally unique? Are we seeing duplicates? (We are, but not many)
  • Are there other ping fields that should be unique, but aren’t? (For Glean no client should ever send the same ping type with the same sequence number. But that kind of duplicate appears, too)
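
Both uniqueness checks above reduce to counting keys; here is a minimal sketch over made-up records (Glean’s actual document ID and sequence-number fields may be named differently):

```python
from collections import Counter

# Illustrative ping metadata: the document ID should be globally unique, and no
# client should reuse a sequence number within a single ping type.
pings = [
    {"document_id": "guid-1", "client_id": "a", "ping_type": "events", "seq": 0},
    {"document_id": "guid-2", "client_id": "a", "ping_type": "events", "seq": 1},
    {"document_id": "guid-2", "client_id": "b", "ping_type": "events", "seq": 0},
    {"document_id": "guid-3", "client_id": "b", "ping_type": "events", "seq": 0},
]

# "Are the GUIDs actually globally unique?"
guid_counts = Counter(p["document_id"] for p in pings)
duplicate_guids = [g for g, n in guid_counts.items() if n > 1]

# "No client should ever send the same ping type with the same sequence number."
seq_counts = Counter((p["client_id"], p["ping_type"], p["seq"]) for p in pings)
duplicate_seqs = [k for k, n in seq_counts.items() if n > 1]

print(duplicate_guids, duplicate_seqs)
```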

Once we can establish confidence in the overall health of the data reporting mechanism we can start using it to report errors about itself:

  • Ping transmission should be quick (because they’re small). Assuming the ping transmission time is 0, how far away are the clients’ clocks from the server’s clock? (AKA “Clock skew”. Turns out that mobile clients’ clocks are more reliable than desktop clients’ clocks (at least in the prerelease population. We’ll see what happens when we start measuring non-beta users))
  • How many errors are reported by internal error-reporting metrics? How many send failures? How many times did the app try to record a string that was too long?
  • What measurements are in the ping? Are they only the ones we expect to see? Are they showing in reasonable proportions relative to each other and the number of clients and pings reporting them?
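
The clock-skew estimate from the first bullet can be sketched like this, under the post’s assumption that transmission time is approximately zero (the timestamps are invented for illustration):

```python
from datetime import datetime, timedelta

# (server receive time, client-reported creation time) pairs; invented values.
observations = [
    (datetime(2019, 8, 1, 12, 0, 0), datetime(2019, 8, 1, 12, 0, 2)),
    (datetime(2019, 8, 1, 12, 5, 0), datetime(2019, 8, 1, 12, 4, 30)),
    (datetime(2019, 8, 1, 12, 10, 0), datetime(2019, 8, 1, 11, 10, 0)),
]

# If transmission took ~0s, server_time - client_time estimates the client's
# clock skew: positive means the client's clock runs behind the server's.
skews = [server - client for server, client in observations]
median_skew = sorted(skews)[len(skews) // 2]
print(median_skew)
```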

All these attempts to determine what is reasonable and what is correct depend on a strong foundation of documentation. I can read the code that collects the data and sends the pings… but that tells me what is correct relative to what is implemented, not what is correct relative to what is intended.

By validating to the documentation, to what is intended, we can not only find bugs in the code, we can find bugs in the docs. And a data collection system lives and dies on its documentation: it is in many ways a more important part of the “product” than the code.

At this point, aside from the “metrics” ping which is awaiting validation after some fixes reach saturation in the population, Glean has passed all of these criteria acceptably. It still has a bit of a duplicate ping problem, but its clock skew and latency are much lower than Firefox Desktop’s. There are some outrageous clients sending dozens of pings over a period in which they should be sending a handful, but those might just be test clients whose values will disappear into the noise when the user population grows.

:chutten