This Week in Glean: A Distributed Team Echoes Distributed Workflow

(“This Week in Glean” is a series of blog posts that the Glean Team at Mozilla is using to try to communicate better about our work. They could be release notes, documentation, hopes, dreams, or whatever: so long as it is inspired by Glean. You can find an index of all TWiG posts online.)

Last Week: Extending Glean: build re-usable types for new use-cases by Alessio


I was recently struck by a realization that the position of our data org’s team members around the globe mimics the path that data flows through the Glean Ecosystem.

Glean Data takes this five-fold path (corresponding to five teams):

  1. Data is collected in a client using the Glean SDK (Glean Team)
  2. Data is transmitted to the Structured Ingestion pipeline (Data Platform)
  3. Data is stored and maintained in our infrastructure (Data Operations)
  4. Data is presented in our tools (Data Tools)
  5. Data is analyzed and reported on (Data Science)

The geographical midpoint of the Glean Team is about halfway across the North Atlantic. For Data Platform it’s in the continental US, anchored by three members in the Midwestern US. Data Ops is further west still, with four members in the Pacific timezone and no Europeans. Data Tools breaks the trend by being a bit further east, with fewer West Coasters. Data Science (for Firefox) is centred farther west still, with only two members east of the Rocky Mountains.

Or, graphically:

[Figure: (approximate) Team Geocentres]
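
For the curious, here is a minimal sketch of one way a "geographical midpoint" like the ones above could be approximated: turn each member's latitude and longitude into a point on a unit sphere, average those points, and project the average back to the surface. The function name and the two-person "team" below are assumptions made up for illustration. They are not the real team rosters, and this is not necessarily how the map above was drawn.

```python
import math


def geographic_midpoint(points):
    """Approximate the geographic midpoint of (lat, lon) pairs given in degrees.

    Each point is converted to a 3D unit vector, the vectors are averaged,
    and the mean vector is converted back to latitude and longitude.
    """
    if not points:
        raise ValueError("need at least one point")

    x = y = z = 0.0
    for lat_deg, lon_deg in points:
        lat = math.radians(lat_deg)
        lon = math.radians(lon_deg)
        x += math.cos(lat) * math.cos(lon)
        y += math.cos(lat) * math.sin(lon)
        z += math.sin(lat)

    n = len(points)
    x, y, z = x / n, y / n, z / n

    # atan2 keeps the longitude in the correct quadrant.
    mid_lon = math.atan2(y, x)
    mid_lat = math.atan2(z, math.hypot(x, y))
    return math.degrees(mid_lat), math.degrees(mid_lon)


# Hypothetical two-person team (illustrative city coordinates only).
hypothetical_team = [
    (52.52, 13.40),   # Berlin
    (43.65, -79.38),  # Toronto
]
print(geographic_midpoint(hypothetical_team))
```

With one made-up member on each side of the ocean, the midpoint comes out over the North Atlantic, which is the kind of result described above for the Glean Team. Note that points on exactly opposite sides of the globe cancel out, so the midpoint is not meaningful in that case.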

Given the rotation of the Earth, the sun rises first on the Glean Team and the data collected by the Glean SDK. Then the data and the sun move west to the Data Platform where it is ingested. Data Tools gets the data from the Platform as morning breaks over Detroit. Data Operations keeps it all running from the Midwest. And finally, the West Coast Centre of Firefox Data Science Excellence greets the data from a mountaintop, to try to make sense of it all.

(( Lying orthogonal to the data organization is the secret Sixth Glean Data “Team”: Data Stewardship. They ensure all Glean Data is collected in accordance with Mozilla’s Privacy Promise. The sun never sets on the Stewardship’s global coverage, and it’s a volunteer effort supplied from eight teams (and growing!), so I’ve omitted them from this narrative. ))

Bad metaphors about sunlight aside, I wonder whether this is random or whether this is some sort of emergent behaviour.

Conway’s Law suggests that our system architecture will tend to reflect our orgchart (well, the law is a bit more explicit about “communication structure” independent of organizational structure, but in the data org’s case they’re pretty close). Maybe this is a specific example of that: data architecture as a reflection of orgchart geography.

Or perhaps five dots on a globe that are only mostly in some order is too weak a coincidence to even bear thinking about? Nah, where’s the blog post in that thinking…

If it’s emergent, it then becomes interesting to consider the “chicken and egg” point of view: did the organization beget the system or the system beget the organization? When I joined Mozilla some of these teams didn’t exist. Some of them only kinda existed within other parts of the company. So is the situation we’re in today a formalization by us of a structure that mirrors the system we’re building, or did we build the system in this way because of the structure we already had?

mindblown.gif

:chutten

This Week in Glean: Glean in Private

In the Kotlin implementation of the Glean SDK we have a glean.private package. (( Ideally anything that was private in the Glean SDK would actually _be_ private and inaccessible, but in order to support our SDK magic (okay, so that the SDK could work properly by generating the Specific Metrics API in subcomponents) we needed something public that we just didn’t want anyone to use. )) For a little while this week it looked like the use of the Java keyword private in the name was going to be problematic. Here are some of the alternatives we came up with:

Fortunately (or unfortunately) :mdboom (whom I might have to start calling Dr. Boom) came up with a way to make it work with the package private intact, so we’ll never know which one we would’ve gone with.

Alas.

I guess I’ll just have to console myself with the knowledge that we’ve deployed this fix to Fenix, Python bindings are becoming a reality, and the first code supporting the FOGotype might be landing in mozilla-central. (More to come on all of that, later.)

:chutten

This Week in Glean: Glean on Desktop (Project FOG)

The Glean SDK is doing well on mobile. It’s shipping in Firefox Preview and Firefox for Fire TV on Android, and our iOS port for Lockwise is shaping up wonderfully well. Data is flowing in, letting us know how the products are being used.

It’s time to set our sights on Desktop.

It’s going to be tricky, but to realize one of the core benefits of the Glean SDK (the one about not having to maintain more than one data collection client library across Mozilla’s products) we have to do this. Also, we’re seeing more than a little interest from our coworkers to get going with it already : )

One of the reasons it’s going to be tricky is that Desktop isn’t like Mobile. As an example, the Glean SDK “baseline” ping is sent whenever the product is sent to the background. This is predicated on the idea that the user isn’t using the application when it’s in the background. But on Desktop, there’s no similar application lifecycle paradigm we can use in that way. We could try sending a ping whenever focus leaves the browser (onblur), but that can happen very often and doesn’t have the same connotation of “user isn’t using it”. And what if the focus leaves one browser window to attach to another browser window? We need to have conversations with Data Science and Firefox Peers to figure out what lifecycle events most closely respect our desire to measure engagement.
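
To make the focus problem a little more concrete, here is a toy sketch that counts how many "baseline"-style pings two different rules would send over the same made-up sequence of focus changes. Everything in it is an assumption for illustration: the window names, the transition format, and both rules are hypothetical, and neither rule is a proposal for what the Glean SDK will actually do on Desktop.

```python
# Hypothetical identifiers for the browser's own windows.
BROWSER_WINDOWS = {"win-1", "win-2"}


def pings_on_every_blur(transitions):
    """Count one ping each time focus leaves any browser window."""
    return sum(1 for src, _dst in transitions if src in BROWSER_WINDOWS)


def pings_on_leaving_browser(transitions):
    """Count one ping only when focus leaves the browser entirely.

    Focus moving from one browser window to another is ignored.
    """
    return sum(
        1
        for src, dst in transitions
        if src in BROWSER_WINDOWS and dst not in BROWSER_WINDOWS
    )


# Each transition is (window losing focus, window gaining focus).
# None stands for "something outside the browser".
made_up_session = [
    ("win-1", "win-2"),
    ("win-2", "win-1"),
    ("win-1", None),
    (None, "win-2"),
    ("win-2", "win-1"),
    ("win-1", None),
]

print(pings_on_every_blur(made_up_session))       # prints 5
print(pings_on_leaving_browser(made_up_session))  # prints 2
```

Even the second rule only tells us that focus went elsewhere, not that the user stopped using the browser, which is exactly the open question.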

And that’s just one reason. One reason that needs investigation, exploration, discussion, design, proposal, approval, implementation, validation, and documentation.

And this reason’s one that we actually know something about. Who knows what swarm of unknown quirks and possible failures lies in wait?

That’s why step one in this adventure is a prototype. We’ll integrate the Glean SDK into Firefox Desktop and turn some things on. We’ll try some things out. We’ll make mistakes, and write it all down.

And then we’ll tear it out and, using what we’ve learned, do it over again. For real.

This prototype won’t have an answer for the behaviour of the “baseline” ping… so it won’t have a “baseline” ping. It won’t know the most efficient way to build a JavaScript metrics API (webidl? JSM? JSContext?), so it won’t have one. It won’t know how best to collect data from the many different processes of many different types that Firefox now boasts, so it will live in just one.

This investigative work will be done by the end of the year with the ultimate purpose of answering all the questions we need in order to proceed next year with the full implementation.

That’s right. You heard it here first:

2020 will be the year of Glean on the Desktop.

:chutten