Five-Year Moziversary

Wowee what a year that was. And I’m pretty sure the year to come will be even more so.

Me, in last year’s moziversary post

Oof. I hate being right for the wrong reasons. And that’s all I’ll say about COVID-19 and the rest of the 2020 dumpster fire.

In team news, Georg’s short break turned into the neverending kind as he left Mozilla late last year. We gained Michael Droettboom as our new fearless leader, and from my perspective he seems to be doing quite well at the managery things. Bea and Travis, our two newer team members, have really stepped into their roles well, providing much needed bench depth on Rust and Mobile. And Jan-Erik has taken over leadership of the SDK, freeing up Alessio to think about data collection for Web Extensions.

2020 is indeed being the Year of Glean on the Desktop with several projects already embedding the now-successful Glean SDK, including our very own mach (Firefox Build Tooling Commandline) and mozregression (Firefox Bug Regression Window Finding Tool). Oh, and Jan-Erik and I’ve spent ten months planning and executing on Project FOG (Firefox on Glean) (maybe you’ve heard of it), on track (more or less) to be able to recommend it for all new data collections by the end of the year.

My blogging frequency has cratered. Though I have a mitt full of ideas, I’ve spent no time developing them into proper posts beyond taking my turn at This Week in Glean. In the hopper I have “Naming Your Kid Based on how you Yell At Them”, “Tools Externalize Costs to their Users”, “Writing Code for two Wolves: Computers and Developers”, “Glean is Frictionless”, “Distributed Teams: Proposals are Inclusive”, and whatever of the twelve (Twelve?!) drafts I have saved up in wordpress that have any life in them.

Progress on my resolutions to blog more, continue improving, and put Glean on Firefox? Well, I think I’ve done the latter two. And I think those resolutions are equally valid for the next year, though I may tweak “put Glean on Firefox” to “support migrating Firefox Telemetry to Glean” which is more or less the same thing.

:chutten

This Week in Glean: Project FOG Update, end of H12020

(“This Week in Glean” is a series of blog posts that the Glean Team at Mozilla is using to try to communicate better about our work. They could be release notes, documentation, hopes, dreams, or whatever: so long as it is inspired by Glean. You can find an index of all TWiG posts online.)

It’s been a while since last I wrote on Project FOG, so I figure I should update all of you on the progress we’ve made.

A reminder: Project FOG (Firefox on Glean) is the year-long effort to bring the Glean SDK to Firefox. This means answering such varied questions as "Where are the docs going to live?" (here), "How do we update the SDK when we need to?" (this way), "How are tests gonna work?" (with difficulty), and so forth. In a project this long you can expect updates from time to time. So where are we?

First, we’ve added the Glean SDK to Firefox Desktop, and it now ships in Firefox Nightly. This is only a partial integration, though, so the only builtin ping it sends is the "deletion-request" ping when the user opts out of data collection in the Preferences. We don’t actually collect any data, so the ping doesn’t do anything, but we’re sending it, and soon we’ll have a test ensuring that we keep sending it. So that’s nice.

Second, we’ve written a lot of Design Proposals. The Glean Team and all the other teams our work impacts are widely distributed across a non-trivial fragment of the globe. To work together and not step on each other’s toes we have a culture of putting most things larger than a bugfix into Proposal Documents which we then pass around asynchronously for ideation, feedback, review, and signoff. For something the size and scope of adding a data collection library to Firefox Desktop, we’ve needed more than one. These design proposals are Google Docs for now, but will evolve to in-tree documentation (like this) as the proposals become code. This way the docs live with the code and hopefully remain up-to-date for our users (product developers, data engineers, data scientists, and other data consumers), and are made open to anyone in the community who’s interested in learning how it all works.

Third, we have a Glean SDK Rust API! Sorta. To limit scope creep we haven’t added the Rust API to mozilla/glean and are testing its suitability in FOG itself. This allows us to move a little faster by mixing our IPC implementation directly into the API, at the expense of needing to extract the common foundation later. But when we do extract it, it will be fully-formed and ready for consumers since it’ll already have been serving the demanding needs of FOG.

Fourth, we have tests. This was a bit of a struggle as the build order of Firefox means that any Rust code we write that touches Firefox internals can’t be tested in Rust tests (they must be tested by higher-level integration tests instead). By damming off the Firefox-adjacent pieces of the code we’ve been able to write and run Rust tests of the metrics API after all. Our code coverage is still a little low, but it’s better than it was.

Fifth, we are using Firefox’s own network stack to send pings. In a stroke of good fortune the application-services team (responsible for fan-favourite Firefox features “Sync”, “Send Tab”, and “Firefox Accounts”) was bringing a straightforward Rust networking API called Viaduct to Firefox Desktop almost exactly when we found ourselves in need of one. Plugging into Viaduct was a breeze, and now our “deletion-request” pings can correctly work their way through all the various proxies and protocols to get to Mozilla’s servers.

Sixth, we have firm designs on how to implement both the C++ and JS APIs in Firefox. They won’t be fully-fledged language bindings the way that Kotlin, Python, and Swift are (( they’ll be built atop the Rust language binding so they’re really more like shims )), but they need to have every metric type and every metric instance that a full language binding would have, so it’s no small amount of work.

But where does that leave our data consumers? For now, sadly, there’s little to report on both the input and output sides: We have no way for product engineers to collect data in Firefox Desktop (and no pings to send the data on), and we have no support in the pipeline for receiving data, not that we have any to analyse. These will be coming soon, and when they do we’ll start cautiously reaching out to potential first customers to see whether their needs can be satisfied by the pieces we’ve built so far.

And after that? Well, we need to do some validation work to ensure we’re doing things properly. We need to implement the designs we proposed. We need to establish how tasks accomplished in Telemetry can now be accomplished in the Glean SDK. We need to start building and shipping FOG and the Glean SDK beyond Nightly to Beta and Release. We need to implement the builtin Glean SDK pings. We need to document the designs so others can understand them, best practices so our users can follow them, APIs so engineers can use them, test guarantees so QA can validate them, and grand processes for migration from Telemetry to Glean so that organizations can start roadmapping their conversions.

In short: plenty has been done, and there’s still plenty to do. 

I guess we’d better be about it, then.

:chutten

This Week in Glean: How Much Does That Data Cost?

(“This Week in Glean” is a series of blog posts that the Glean Team at Mozilla is using to try to communicate better about our work. They could be release notes, documentation, hopes, dreams, or whatever: so long as it is inspired by Glean. You can find an index of all TWiG posts online.)

I’ve written before about data, but never tackled the business perspective. To a business, what is data? It could be considered an asset, I suppose: a tool, like a printer, to make your business more efficient.

But like that printer and other assets, data has a cost. We can quite easily look up how much it costs to store arbitrary data on AWS (less than 2.3 cents USD per GB per month) but that only provides the cost of the data at rest. It doesn’t consider what it took for the data to get there or how much it costs to be useful once it’s stored.

So let’s imagine that you come across a decision that can only be made with data. You’ve tried your best to do without it, but you really do need to know how many Live Bookmarks there are per Firefox profile… maybe it’s in wide use and we should assign someone to spruce it up. Maybe almost no one uses it and so Live Bookmarks should be removed and instead become a feature provided by extensions.

This should be easy, right? Slap the number into an HTTP payload and send it to a Mozilla-controlled server. Then just count them all up!

As one of the Data Organization’s unofficial mottos puts it: Counting is Harder Than It Looks.

Let’s look at the full lifecycle of the metric from ideation and instrumentation to expiry and deletion. I’ll measure money and time costs, being clear about the assumptions guiding my estimates and linking to sources where available.

For a rule of thumb, time costs are $50 per hour. Developers and Managers and PMs cost more than $100k per year in total compensation in many jurisdictions, and less in many others. Let’s go with this because why not. I considered ignoring labour costs altogether because these people are doing their jobs whether they’re performing their part in this collection or not… but that’s assuming they have the spare capacity and would otherwise be doing nothing. Everyone I talk to is busy, so everyone’s doing this data collection work instead of something else they could be doing: so there is an opportunity cost.

Fixed costs, like the cost of building and maintaining a data collection library, data collection pipeline, bug trackers, code review tooling, and dev computers, are all ignored. We could amortize that per data collection… but it’d probably work out to $0 anyway.

Also, for the purposes of measuring data we’re counting only the size of the data itself (the count of the number of Live Bookmarks). To be more complete we’d need to amortize the cost of sending the data point (HTTP headers, payload metadata, the data point’s identifier, etc.) and factor in additional complexity (transfer encoding, compression, etc.). This would require a lot of words, and in the present Firefox Telemetry system this amortizes to 0 because the “main” ping has many data points in it and gzip compression is pretty good.

Also, this is a Best Case Estimate. I deliberately keep many of the assumptions small in order to make this a lower-bound cost: what it would cost if everything goes according to plan and everyone acts the way they should.

Ideation – Time: 30min, Cost: $25

How long does it take you to figure out how to measure something? You need to know the feature you’re measuring, the capabilities of the data collection library you’re using to do the measuring, and some idea of how you’ll analyse it at the other end.  If you’re trying to send something clever like the state of a customizable UI element or do something that requires custom analysis, this will take longer and take more people which will cost more money.

But for our example we know what we’re collecting: numbers of things. The data collection library is old and well understood. The analysis is straightforward. This takes one person a half hour to think through.

Instrumentation – Time: 60min, Cost: $50

Knowing the feature is not the same as knowing the code. You need a subject matter expert (developer who knows the feature and the code as well as the data collection library’s API) to figure out on exactly which line of code we should call exactly what method with exactly which count. If it’s complicated, several people may need to meet in order to figure out what to do here: are the input event timestamps the same format on Windows and Mac? Does time when the computer is asleep count or not?

For our example we have questions: Should we count the number of Live Bookmarks in the database? The number in the Bookmark Menu? The Bookmark Toolbar? What if the user deletes one, should we count before or after the delete?

This is small enough that we can find a single subject matter expert who knows it all. They read some documentation, make some decisions, write some code, and take an hour to do this themselves.

Review – Time: 30min, Cost: $25

Both the code and the data collection need review. The simplicity of the data collection and the code makes this quick. Mozilla’s code review tooling helps a lot here, too. Though it takes a day or two for the Module Peer and the Data Steward to find time to get to the reviews, it only takes a combined half hour for them to okay it to ship.

Storage (user) – Cost: $0

Data takes up space. Its definition takes up some bytes in the Firefox binary that you installed. It takes up bytes in your computer’s memory. It takes up bytes on disk while it waits to be sent and afterwards so you can look at it if you type about:telemetry into your address bar. (Try it yourself!)

The marginal cost to the user of the tens of bytes of memory and disk from our single number of Live Bookmarks is most accurately represented as a zero not only because memory and disk are excitingly cheap these days but also because there was likely some spare capacity in those systems.

Bandwidth (user) – Cost: $0.00 (but not zero)

Data requires network bandwidth to be reported, and network bandwidth costs money. Many consumer plans are flat-rate and so the marginal cost of the extra bytes is not felt at all (we’re using a little of the slack), so we can flatten this to zero.

But hey, let’s do some recreational math for fun! (We all do this in our spare time, right? It’s not just me?)

If we were paying per-byte and sending this from a smartphone, the first GB in Canada (where mobile data makes more money for service providers than anywhere else in the world) costs $30 per month. That’s about 3 thousandths of a cent per kilobyte.

The data collection is a number, which is about 4 bytes of data. We send it about three times per day, and individual profiles use Firefox on average 12 days a month (engagement ratio of 0.4). (If you’re interested, this is due to a bunch of factors including users having multiple profiles at school, work, and home… but that’s another blog post).

4 bytes x 3 per day x 12 days in a month ~= 144 bytes per month

Thus a more accurate cost estimate of user bandwidth for this data would be 4 ten-thousandths of a cent (in Canadian dollars). It would take over 200 years of reporting this figure to cost the user a single penny. So let’s call it 0 for our purposes here.
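
If you want to check my work, here’s that arithmetic as a few lines of Python (all figures as above; the dollar-per-GB rate is the Canadian mobile one, so adjust to taste):

# Back-of-envelope user bandwidth cost, using the figures above.
bytes_per_report = 4          # one 4-byte count of Live Bookmarks
reports_per_day = 3
days_used_per_month = 12      # engagement ratio of 0.4

cad_per_gb = 30.0             # first GB of Canadian mobile data, per month
cents_per_byte = cad_per_gb * 100 / 1_000_000_000

bytes_per_month = bytes_per_report * reports_per_day * days_used_per_month
cents_per_month = bytes_per_month * cents_per_byte

print(bytes_per_month)                      # 144 bytes per month
print(f"{cents_per_month:.6f}")             # ~0.000432 cents per month
print(f"{1 / cents_per_month / 12:.0f}")    # roughly two centuries to cost one cent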

Though… however close the cost is to 0, it isn’t 0. This means that, over time and over enough data points and over our full Firefox population, there is a measurable cost. Though its weight is light when it is but a single data point sent infrequently by each of our users, put together it is still hefty enough that we shouldn’t ignore it.

Bandwidth (Mozilla) – Cost: $0

Internet Service Providers have a nice gig: they can charge the user when the bytes leave their machine and charge Mozilla when the bytes enter their machine. However, cloud data platform providers (Amazon’s AWS, Google’s GCP, Microsoft’s Azure, etc) don’t charge for bandwidth for the data coming into their services.

You do get charged for bandwidth _leaving_ their systems. And for anything you do _on_ their systems. If I were feeling uncharitable I guess I’d call this a vendor lock-in data roach motel.

At any rate, the cost for this step is 0.

Pipeline Processing – Cost: $15.12

Once our Live Bookmarks data reaches the pipeline, there’s a few steps the data needs to go through. It needs to be decompressed, examined for adherence to the data schema (malformed data gets thrown out), and a response written to the client to tell it that we received it all okay. It needs to be processed, examined, and funneled to the correct storage locations while also being made available for realtime analysis if we need it.

For our little 4-byte number that shouldn’t be too bad, right?

Well, now that we’re on Mozilla’s side of the operation we need to consider the scale. Just how many Firefox profiles are sending how many of these numbers at us? About 250M of them each month. (At time of writing this isn’t up-to-date beyond EOY2019. Sorry about that. We’re working on it). With an engagement ratio of about 0.4, data being sent about thrice a day, and each count of Live Bookmarks taking up 4 bytes of space, we’re looking at 12GB of data per month.

At our present levels, ingestion and processing costs about $90 per TB. This comes out to $1.08 of cost for this step, each month. Multiplied by 14 “months”, that’s $15.12.
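
Or, in the same recreational-math spirit (taking the 12GB/month estimate above as a given):

# Pipeline ingestion + processing cost, using the figures above.
gb_per_month = 12       # estimated monthly volume for the whole population
cost_per_tb = 90.0      # ingestion and processing, per TB
months = 14             # see "About Months" below

monthly = gb_per_month / 1000 * cost_per_tb
print(f"${monthly:.2f} per month, ${monthly * months:.2f} total")   # $1.08, $15.12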

About Months

In saying “14 months” for how long the pipeline needs to put up with the collection coming from the entire Firefox population I glossed over quite a lot of detail. The main piece of information is that the default expiry for new data collections in Firefox is five or six Firefox versions (which should come out to about six months).

However, as I’ve mentioned before, updates don’t all happen at once. Though we have about 90% of the Firefox population within 3 versions of up-to-date at any one time, there’s a long tail of Firefox profiles from ancient versions sending us data.

To calculate 14 months I looked at the total data collection volumes for five versions of Firefox: Firefox 69-73 (inclusive). This avoids Firefox ESR 68 gumming up the works (its support lifetime is much longer than a normal release’s, and we’re aiming for a best-case cost estimate). It’s also far enough in the past that Firefox 69 ought to be winding down around now, recent enough that we won’t have thrown out the data yet (more on retention periods later), and closer in behaviour to the releases we’re doing this year.

Here’s what that looks like:

(Figure: time series plot of data collection volumes from Firefox 69 through 73.)

So I said this was far enough in the past that Firefox 69 ought to be winding down around now? Well, if you look really closely at the bottom-right you might be able to see that we’re still receiving data from users still on that Firefox version. Lots of them.

But this is where we are in history, and I’m not running this query again (it only cost 15 cents, but it took half an hour), so let’s do the math. The total amount of data received from these five releases so far divided by the amount of data I said above that the user population would be sending each month (12GB) comes out to about 13.7 months.

To account for the seriously-annoying number of pings from those five versions that we presumably will continue receiving into the future, I rounded up to 14.

Storage (Mozilla) – Cost: $84

Once the data has been processed it needs to live somewhere. This costs us 2 cents per gigabyte stored, per month we decide to store it. 12GB per month means $0.24, right?

Well, no. We don’t have a way to only store this data for a period of time, so we need to store it for as long as the other stuff we store. For year-over-year forecasting we retain data for two years plus one month: 25 months. (Well, we presently retain data a bit longer than that, but we’re getting there.) So we need to take the 12GB we get each month and store it for 25 months. When we do that for each of the 14 “months” of data we get:

12GB/”month” x 14 “months” x $0.02 per GB per month x 25 months retention = $84

Now if you think this “2 cents per GB” figure is a little high: it is! We should be able to take advantage of lower storage costs for data we don’t write to any more. Unfortunately, we do write to it all the time servicing Deletion Requests (which I’ll get to in a little bit).

Analysis (Mozilla) – Time: 30min, Cost: $25.55

Data stored on some server someplace is of no use. Its value is derived through interrogating it, faceting its aggregations across interesting dimensions, picking it apart and putting it back together.

If this sounds like processing time Mozilla needs to pay for, you are correct!

On-demand analyses in Google’s BigQuery cost $5 per TB of data scanned. Mozilla’s spent some decent time thinking about query patterns to arrange data in a way that minimizes the amount of data we need to look at in any given analysis… but it isn’t perfect. To deliver us a count of the number of Live Bookmarks across our user base we’re going to have to scan more than the 12GB per month.

But this is a Best Case Estimate so let’s figure out how much a perfect query (one that only had to scan the data we wanted to get out of it) would cost:

12GB / 1000GB/TB * 5 $/TB = $0.06

That gives you back a sum of all the Live Bookmarks reported from all the Firefox profiles in a month. The number might be 5, or 5 million, or 5 trillion.

In other words, the number is useless. The real question you want to answer is “How much is this feature used?” which is less about the number of Live Bookmarks reported than it is Live Bookmarks stored per Firefox profile. If the 5 million Live Bookmarks are five thousand reports of 1000 Live Bookmarks all from one fellow named Nick, then we shouldn’t be investing in a feature used by one person, however much that one person uses it.

If the 5 million Live Bookmarks are one hundred thousand profiles reporting various handfuls of times a moderate number of bookmarks, then Live Bookmarks is more likely a broadly-used feature and might just need a little kick to be used even more.

So we need to aggregate the counts per-client and then look at the distribution. We can ask, over all the reports of Live Bookmarks from this one Firefox profile, give us the maximum number reported. Then show us a graph (like this). A perfect query of a month’s data will not only need to look at the 12GB of the month’s Live Bookmarks count, but also the profile identifier (client_id) so we can deduplicate reports. That id is a UUID and is represented as a 36-byte string. This adds another 8x data to scan compared to the 4B Live Bookmarks count we were previously looking at, ballooning our query to 108GB and our cost to $0.54.

But wait! We’re doing two steps: one to crunch these down to the 250M profiles that reported data that month and then a second to count the counts (to make our graph). That second step needs to scan the 250M 4B “maximum counts”, which adds another half a cent.

So our Best Case Estimate for querying the data to get the answer to our question is $0.55 (I rounded up the half cent).

But don’t forget you need an analyst to perform this analysis! Assuming you have a mature suite of data analysis tooling, some rigorous documentation, and a well-entrenched culture of everyone helping everyone, this shouldn’t take longer than a half-hour of a single person’s time. Which is another $25, coming to a grand total of $25.55.

Deletion – Cost: $21

The data’s journey is not complete because any time a user opts their Firefox profile out of data collection we receive an order to delete what data we’ve previously received from that profile. To delete we need to copy out all the not-deleted data into new partitions and drop the old ones. This is a processing cost that is currently using the ad hoc $5/TB rate every time we process a batch of deletions (monthly).

Our Live Bookmarks count is adding 4 bytes of data per row that needs to be copied over. Each of those counts (excepting the ones that are deleted) needs to be copied over 25 times (retention period of 25 months). The amount of deleted data is small (Firefox’s data collection is very specifically designed to only collect what is necessary, so you shouldn’t ever feel as though you need to opt out and trigger deletion) so we’ll ignore its effect on the numbers for the purposes of making this easier to calculate.

12 GB/”month” x 14 “months” x 25 deletions / 1000GB/TB x 5 $/TB = $21

The total lifetime cost of all the deletion batches we process for the Live Bookmarks counts we record is $21. We’re hoping to knock this down a few pegs in cost, but it’ll probably remain in the “some dollars” order of magnitude.

The bigger share of this cost is actually in Storage, above. If we didn’t have to delete our data then, after 90 days, storage costs drop by half per month. This means that, if you want to assign the dollars a little more like blame, Storage costs are “only” $52.08 (full price for 3 months, half for 22) and Deletion costs are $52.92.

Grand Total: $245.67

In the best case, a collection of a single number from the wide Firefox user base will cost Mozilla almost $246 over the collection’s lifetime, split about 50% between labour and data computing platform costs.
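
For the skeptical (or the recreationally mathematical), here’s the tally in Python, with Analysis split into its labour and compute pieces:

# Summing the line items estimated in this post.
line_items = {
    "Ideation (labour)":        25.00,
    "Instrumentation (labour)": 50.00,
    "Review (labour)":          25.00,
    "Storage (user)":            0.00,
    "Bandwidth (user)":          0.00,
    "Bandwidth (Mozilla)":       0.00,
    "Pipeline Processing":      15.12,
    "Storage (Mozilla)":        84.00,
    "Analysis (labour)":        25.00,
    "Analysis (compute)":        0.55,
    "Deletion":                 21.00,
}
labour = sum(v for k, v in line_items.items() if "(labour)" in k)
total = sum(line_items.values())
print(f"Total: ${total:.2f}")                                      # $245.67
print(f"Labour: ${labour:.2f}, Platform: ${total - labour:.2f}")   # $125.00 vs $120.67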

So that’s it? Call it a wrap? Well… no. There are some cautionary tales to be learned here.

Lessons

0) Lean Data Practices save money. Our Data Collection Review Request form ensures that we aren’t adding these costs to Mozilla and our users without justifying that the collection is necessary. These practices were put into place to protect our users’ privacy, but they do an equally good job of reducing costs.

1) The simplest permanent data collection costs $228 its first year and $103 every year afterwards even if you never look at it again. It costs $25 (30min) to expire a collection, which pays for itself in a maximum of 2.9 months (the payback period is much shorter if the data collection is bigger than 4B (like a Histogram) because the yearly costs are higher). The best time to have expired that collection was ages ago: the second-best time is now.

2) Spending extra time thinking about a data collection saves you time and money. Even if you uplift a quick expiry patch for a mis-measured collection, the nature of Firefox releases is such that you would still end up paying nearly all of the same $245.67 for a useless collection as you would for a correct one. Spend the time ahead of time to save the expense. Especially for permanent collections.

3) Even small improvements in documentation, process, and tooling will result in large savings. Half of this cost is labour, and lesson #2 is recommending you spend more time on it. Good documentation enables good decisions to be made confidently. Process protects you from collecting the wrong thing. Tooling catches mistakes before they make their way out into the wild. Even small things like consistent naming and language will save time and protect you from mistakes. These are your force multipliers.

4) To reduce costs, efficient data representations matter, and quickly-expiring data collections matter more.

5) Retention periods should be set as short as possible. You shouldn’t have to store Live Bookmarks counts from 2+ years ago.

Where Does Glean Fit In

Glean’s focus on high-level metric types, end-to-end-testable data collections, and consistent naming makes mistakes in instrumentation easier to find during development. Rather than waiting for instrumentation code to reach release before realizing it isn’t correct, Glean is designed to help you catch those errors earlier.

Also, Glean’s use of per-application identifiers and its emphasis on custom pings enable data segregation that allows different retention periods per application or per feature (e.g. the "metrics" ping might not need to be retained for 25 months even if the "baseline" ping does, and Firefox Desktop’s retention periods could be configured to be a different length than Firefox Lockwise’s), and they reduce the data scanned per analysis. A consistent ping format and the continued involvement of Data Science through design and development reduce analyst labour costs.

Basically the only thing we didn’t address was efficient data transfer encodings, and since Glean controls its ping format as an internal detail (unlike Telemetry) we could decide to address that later on without troubling Product Developers or Data Science.

There’s no doubt more we could do (and if you come up with something, do let us know!), but already I’m confident Glean will be worth its weight in Canadian Dollars.

:chutten

(( Special thanks to :jason and :mreid for helping me nail down costs for the pipeline pieces and for the broader audience of Data Engineers, Data Scientists, Telemetry Engineers, and other folks who reviewed the draft. ))

Distributed Teams: Not Just Working From Home

Technology companies’ recent curve-flattening exercises have resulted in me digging up my old 2017 talk about working as and working with remote employees. Though all of the advice in it holds up even these three years later, surprisingly little of it seemed all that relevant to the newly-working-from-home (WFH) multitudes.

Thinking about it, I reasoned that it’s because the talk (slides are here if you want ’em) is actually more about working on a distributed team than working from home. Though it contained the usual WFH gems of “have a commute”, “connect with people”, “overcommunicate”, etc etc (things that others have explained much better than I ever will); it also spent a significant amount of its time talking about things that are only relevant if your team isn’t working in the same place.

Aspects of distributed work that are unique not to working outside the office but to working on a distributed team are things like timezones, cultural differences, personal schedules, presentation, watercooler chats, identity… things that you don’t have to think about or spend effort on if you work in the same place (and, not coincidentally, things I’ve written about in the past). If we’re all in Toronto you know not only that 12cm of snow fell since last night but also what that does to the city in the morning. If we’re all in Italy you know not to schedule any work in August. If we see each other all the time then I can use a picture I took of a glacier in Iceland for my avatar instead of using it as a rare opportunity to be able to show you my face.

So as much as I was hoping that all this sudden interest in WFH was going to result in a sea change in how working on a distributed team is viewed and operates, I’m coming to the conclusion that things probably will not change. Maybe we’ll get some better tools… but none that know anything about being on a distributed team (like how “working hours” aren’t always contiguous (looking at you, Google Calendar)).

At least maybe people will stop making the same seven jokes about how WFH means you’re not actually working.

:chutten

Jira, Bugzilla, and Tales of Issue Trackers Past

It seems as though Mozilla is never not in a period of transition. The distributed nature of the organization and community means that teams and offices and any informal or formal group is its own tiny experimental plot tended by gardeners with radically different tastes.

And if there’s one thing that unites gardeners and tech workers, it’s that both have Feelings about their tools.

Tools are personal things: they’re the only thing that allows us to express ourselves in our craft. I can’t code without an editor. I can’t prune without shears. They’re the part of our work that we actually touch. The code lives Out There, the garden is Outside… but the tools are in our hands.

But tools can also be group things. A shed is a tool for everyone’s tools. A workshop is a tool that others share. An Issue Tracker is a tool that helps us all coordinate work.

And group things require cooperation, agreement, and compromise.

While I was on the Browser team at BlackBerry I used a variety of different Issue Trackers. We started with an outdated version of FogBugz, then we had a Bugzilla fork for the WebKit porting work and MKS Integrity for everything else across the entire company, and then we all standardized on Jira.

With minimal customization, Jira and MKS Integrity both seemed to be perfectly adequate Issue Tracking Software. They had id numbers, relationships, state, attachments, comments… all the things you need in an Issue Tracker. But they couldn’t just be “perfectly adequate”, they had to be better enough than what they were replacing to warrant the switch.

In other words, to make the switch the new thing needs to do something that the previous one couldn’t, wouldn’t, or just didn’t do (or you’ve forgotten that it did). And every time Jira or MKS is customized it seems to stop being Issue Tracking Software and start being Workflow Management Software.

Perhaps because the people in charge of the customization are interested more in workflows than in Issue Tracking?

Regardless of why, once they become Workflow Management Software they become incomparable with Issue Trackers. Apples and Oranges. You end up optimizing for similar but distinct use cases as it might become more important to report about issues than it is to file and fix and close them.

And that’s the state Mozilla might be finding itself in right now as a few teams here and there try to find the best tools for their garden and settle upon Jira. Maybe they tried building workflows in Bugzilla and didn’t make it work. Maybe they were using Github Issues for a while and found it lacking. We already had multiple places to file issues, but now some of the places are Workflow Management Software.

And the rumbling has begun. And it’s no wonder, as even tools that are group things are still personal. They’re still what we touch when we craft.

The GNU-minded part of me thinks that workflow management should be built above and separate from issue tracking by the skillful use of open and performant interfaces. Bugzilla lets you query whatever you want, however you want, so why not build reporting Over There and leave me my issue tracking Here where I Like It.

The practical-minded part of me thinks that it doesn’t matter what we choose, so long as we do it deliberately and consistently.

The schedule-minded part of me notices that I should probably be filing and fixing issues rather than writing about them. And I think now’s the time to let that part win.

:chutten

This Week in Glean: A Distributed Team Echoes Distributed Workflow

(“This Week in Glean” is a series of blog posts that the Glean Team at Mozilla is using to try to communicate better about our work. They could be release notes, documentation, hopes, dreams, or whatever: so long as it is inspired by Glean. You can find an index of all TWiG posts online.)

Last Week: Extending Glean: build re-usable types for new use-cases by Alessio


I was recently struck by a realization that the position of our data org’s team members around the globe mimics the path that data flows through the Glean Ecosystem.

Glean Data takes this five-fold path (corresponding to five teams):

  1. Data is collected in a client using the Glean SDK (Glean Team)
  2. Data is transmitted to the Structured Ingestion pipeline (Data Platform)
  3. Data is stored and maintained in our infrastructure (Data Operations)
  4. Data is presented in our tools (Data Tools)
  5. Data is analyzed and reported on (Data Science)

The geographical midpoint of the Glean Team is about halfway across the North Atlantic. For Data Platform it’s on the continental US, anchored by three members in the midwestern US. Data Ops is further west still, with four members in the Pacific timezone and no Europeans. Data Tools breaks the trend by being a bit further east, with fewer westcoasters. Data Science (for Firefox) is centred farther west still, with only two members east of the Rocky Mountains.

Or, graphically:

(Figure: (approximate) Team Geocentres)
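
(( How do you compute a geocentre, anyway? One reasonable sketch (not necessarily how I did it here) is to average each member’s location as a 3D unit vector and convert the mean back to latitude and longitude. The coordinates below are made up for illustration. ))

import math

def geocentre(coords_deg):
    # Average positions as unit vectors on a sphere, then convert back.
    x = y = z = 0.0
    for lat, lon in coords_deg:
        lat, lon = math.radians(lat), math.radians(lon)
        x += math.cos(lat) * math.cos(lon)
        y += math.cos(lat) * math.sin(lon)
        z += math.sin(lat)
    n = len(coords_deg)
    x, y, z = x / n, y / n, z / n
    return math.degrees(math.atan2(z, math.hypot(x, y))), math.degrees(math.atan2(y, x))

# A team split between Toronto and Berlin lands in the North Atlantic:
print(geocentre([(43.7, -79.4), (52.5, 13.4)]))   # roughly (58, -38)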

Given the rotation of the Earth, the sun rises first on the Glean Team and the data collected by the Glean SDK. Then the data and the sun move West to the Data Platform where it is ingested. Data Tools gets the data from the Platform as morning breaks over Detroit. Data Operations keeps it all running from the midwest. And finally, the West Coast Centre of Firefox Data Science Excellence greets the data from a mountaintop, to try and make sense of it all.

(( Lying orthogonal to the data organization is the secret Sixth Glean Data “Team”: Data Stewardship. They ensure all Glean Data is collected in accordance with Mozilla’s Privacy Promise. The sun never sets on the Stewardship’s global coverage, and it’s a volunteer effort supplied from eight teams (and growing!), so I’ve omitted them from this narrative. ))

Bad metaphors about sunlight aside, I wonder whether this is random or whether this is some sort of emergent behaviour.

Conway’s Law suggests that our system architecture will tend to reflect our orgchart (well, the law is a bit more explicit about “communication structure” independent of organizational structure, but in the data org’s case they’re pretty close). Maybe this is a specific example of that: data architecture as a reflection of orgchart geography.

Or perhaps five dots on a globe that are only mostly in some order is too weak of a coincidence to even bear thinking about? Nah, where’s the blog post in that thinking…

If it’s emergent, it then becomes interesting to consider the “chicken and egg” point of view: did the organization beget the system or the system beget the organization? When I joined Mozilla some of these teams didn’t exist. Some of them only kinda existed within other parts of the company. So is the situation we’re in today a formalization by us of a structure that mirrors the system we’re building, or did we build the system in this way because of the structure we already had?

mindblown.gif

:chutten

Controlling a Linux Laptop’s Internet Access

I fear the Internet. It’s powerful and full of awesome and awful things. That might be why, in my house, there are no smart devices.

This fear is now warring with my duties as a parent for ensuring my child has the best chance at succeeding in whatever she chooses to do with her life. No matter what she chooses, the Internet is likely to be a part of it because the Internet underpins everything these days. So like swimming, riding a bike, and driving a car (skills I see as necessities in Canada), she needs to learn how to handle herself on the Internet.

And like swimming, I need a shallow end. Like cycling, training wheels. Like driving, a learner’s permit. I need something to get both her and me used to the idea of the whole thing… but safer. More restrictive.

So to act as a stepping stone between always-on Full Internet Access and the “ask your parents to look something up for you”, my wife and I determined we’d repurpose an ancient and underpowered laptop (with the battery removed) as an Email and Scratch (MIT’s visual programming language that she’s using to make a Star Wars video game at the moment) machine.

I thought this’d be easy. There’s Scratch for Linux, and I could adjust the firewall to only pass IMAP (for email receipt) and SMTP (for email sending). Alas, her email is hosted by GMail, and all Google properties have standardized on OAuth2 for authentication. OAuth2 means HTTP requests. Letting HTTP access into the Laptop opens it up to always-on Full Internet Access, so what to do, what to do.

Luckily there is precisely one Google OAuth2 server with a well-known name that we need to reach over HTTP. Unluckily it can be at an arbitrary number of different IP addresses. Luckily I just learned how to use `dnsmasq` to configure how name resolution works on the laptop.

So this is how to adjust Linux Mint 19 to allow DNS queries to resolve exactly the servers I want to allow for email… and no others.

1) Tell `NetworkManager` (the network manager) to use its `dnsmasq` plugin by editing `/etc/NetworkManager/conf.d/00-use-dnsmasq.conf` to contain

[main]
dns=dnsmasq

2) Configure `dnsmasq` to our specifications by editing `/etc/NetworkManager/dnsmasq.d/00-urlfilter.conf` to contain

log-queries # Log all DNS queries for debugging purposes
log-facility=/var/log/dnsmasq.log # Log things here

no-resolv # don't use resolv.conf
interface=wlp12s0 # Bind requests on this interface (maybe I should bind eth0 too in case she plugs it in?)
listen-address=127.0.0.53 # This is where the system's expecting to find systemd-resolved's DNS. Take it over.

address=/#/127.0.0.1 # Resolve all hosts to localhost. Excepting the below
server=/imap.gmail.com/8.8.8.8 # Use Google's DNS to resolve incoming mail server
server=/smtp.gmail.com/8.8.8.8 # Use Google's DNS to resolve outgoing mail server
server=/googleapis.com/8.8.8.8 # Use Google's DNS to resolve OAuth2 server

3) Tell `systemd-resolved` (our current DNS resolver) not to get in our way by editing `/etc/systemd/resolved.conf` to have the line

DNSStubListener=no # Don't start the local `resolved` DNS cache/server (conflicts with dnsmasq)

4) Restart `systemd-resolved` so it stops its DNS listener

sudo systemctl restart systemd-resolved.service

5) Restart `NetworkManager` so that `dnsmasq` can take over

sudo systemctl restart NetworkManager
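
To spot-check that the filter is doing what I want, a few lines of Python (which Mint already has) should do: the whitelisted mail and OAuth2 hosts should resolve to real addresses, and everything else should come back as 127.0.0.1.

import socket

# imap/smtp are whitelisted above; oauth2.googleapis.com falls under the
# googleapis.com rule; example.com is the control.
for host in ("imap.gmail.com", "smtp.gmail.com", "oauth2.googleapis.com", "example.com"):
    try:
        print(host, "->", socket.gethostbyname(host))
    except OSError as err:
        print(host, "-> lookup failed:", err)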

Caveats:

Though this works for now insofar as loading up Firefox and trying to go to example.com will fail, it doesn’t stop the laptop from accessing the Internet. If you have an IP Address in hand, you can still get to where you want to be (though most third-party-hosted resources will fail (and since that’s how trackers work on the web, maybe this is a feature not a bug)).

Also this prevents even beneficial services from running. To update packages I need to bypass the filter (by commenting out `address=/#/127.0.0.1` and putting in `server=/#/8.8.8.8`).

Also also this effectively forbids access to local services like my network-attached storage and the printer: both things that my daughter should be able to use. (( there might be exactly one `server` line I need to add to make LAN-local shortnames resolve using my router, but I haven’t looked that up yet ))

But this was enough to get it up and running (though the laptop is ancient enough that even legacy gpu drivers have dropped support for it), and that’s the important thing. Now she’s sending emails to her relatives and fielding their responses regularly… and can crack open her Star Wars videogame and do a little recreational programming on the side.

I even had her type up her Science Fair documentation on it the other day (though she still hasn’t learned how to manually Save documents, so the ancient laptop’s instability caused some friction).

We’ll see how long this lasts. Not bad for a couple hours’ work and available parts. I wonder how soon I’ll be convinced to upgrade her to more of the Internet or a less-decrepit machine.

:chutten

This Week in Glean: Glean in Private

(“This Week in Glean” is a series of blog posts that the Glean Team at Mozilla is using to try to communicate better about our work. They could be release notes, documentation, hopes, dreams, or whatever: so long as it is inspired by Glean.)

In the Kotlin implementation of the Glean SDK we have a glean.private package. (( Ideally anything that was actually private in the Glean SDK would actually _be_ private and inaccessible, but in order to support our SDK magic (okay, so that the SDK could work properly by generating the Specific Metrics API in subcomponents) we needed something public that we just didn’t want anyone to use. )) For a little while this week it looked like the use of the Java keyword private in the name was going to be problematic. Here are some of the alternatives we came up with:

Fortunately (or unfortunately) :mdboom (whom I might have to start calling Dr. Boom) came up with a way to make it work with the `private` package name intact, so we’ll never know which one we would’ve gone with.

Alas.

I guess I’ll just have to console myself with the knowledge that we’ve deployed this fix to Fenix, Python bindings are becoming a reality, and the first code supporting the FOGotype might be landing in mozilla-central. (More to come on all of that, later)

:chutten

Distributed Teams: Why I Don’t Go to the Office More Often

I was invited to a team dinner as part of a work week the Data Platform team was having in Toronto. I love working with these folks, and I like food, so I set about planning my logistics.

The plan was solid, but unimpressive. It takes three hours or so to get from my home to the Toronto office by transit, so I’d be relying on the train’s WiFi to allow me to work on the way to Toronto, and I’d be arriving home about 20min before midnight.

Here’s how it went:

  1. 0800 Begin
  2. 0816 Take the GRT city bus to Kitchener train station
  3. 0845 Try to find a way to get to the station (the pedestrian situation around the station is awful)
  4. 0855 Learn that my 0918 train is running 40min late.
  5. 0856 Purchase a PRESTO card for my return journey, being careful to not touch any of the blood stains on the vending machine. (Seriously. Someone had a Bad Time at Kitchener Station recently)
  6. 0857 Learn that they had removed WiFi from the train station, so the work I’ll be able to do is limited to what I can manage on my phone’s LTE
  7. 0900 Begin my work day (Slack and IRC only), and eat the breakfast I packed because I didn’t have time at home.
  8. 0943 Train arrives only 35min late. Goodie.
  9. 0945 Learn from the family occupying my seat that I actually bought a ticket for the wrong day. Applying a discount code didn’t keep the date and time I selected, and I didn’t notice until it was too late. Sit in a different seat and wonder what the fare inspector will think.
  10. 0950 Start working from my laptop. Fear of authority can build on its own time, I have emails to answer and bugs to shuffle.
  11. 1030 Fare inspector finally gets around to me as my nervousness peaks. Says they’ll call it in and there might be an adjustment charge to reschedule it.
  12. 1115 Well into Toronto, the fare inspector just drops my ticket into my lap on their way to somewhere else. I… guess everything’s fine?
  13. 1127 Train arrives at Toronto Union Station. Disconnect WiFi, disembark and start walking to the office. (Public transit would be slower, and I’m saving my TTC token for tonight’s trip)
  14. 1145 Arrive at MoTo just in time for lunch.

Total time to get to Mozilla Toronto: 3h45min. Total distance traveled: 95km. Total cost: $26 for the Via rail ticket, $2.86 for the GRT city bus.

The way back wasn’t very much better. I had to duck out of dinner at 8pm to have a hope of getting home before the day turned into tomorrow:

  1. 2000 Leave the team dinner, say goodnights. Start walking to the subway
  2. 2012 At the TTC subway stop learn that the turnstiles don’t take tokens any more. Luckily there’s someone in the booth to take my fare.
  3. 2018 Arrive at Union station and get lost in the construction. I thought the construction was done (the construction is never done).
  4. 2025 Ask at the PRESTO counter how to use PRESTO properly. I knew it was pay-by-distance but I was taking a train _and_ a bus, so I wasn’t sure if I needed to tap in between the two modes (I do. Tap before the train, after the train, on the bus when you get on, and on the bus when you get off. Seems fragile, but whatever).
  5. 2047 Learn that the train’s been rescheduled 6min later. Looks like I can still make my bus connection in Bramalea.
  6. 2053 Tap on the thingy, walk up the flights of stairs to the train, find a seat.
  7. 2102 “Due to platform restrictions, the doors on car 3107 will not open at Bramalea”… what car am I on? There’s no way to tell from where I’m sitting.
  8. 2127 Arrive at Bramalea. I’m not on car 3107.
  9. 2130 Learn that there’s one correct way to leave the platform and I took the other one that leads to the parking lot. Retrace my steps.
  10. 2132 Tap the PRESTO on the thingy outside the station building (closed)
  11. 2135 Tap the PRESTO on the thingy inside the bus. BEEP BEEP. Bus driver says insufficient funds. That can’t be, I left myself plenty of room. Tick tock.
  12. 2136 Cold air aching in my lungs from running, I load another $20 onto the PRESTO
  13. 2137 Completely out of breath, tap the PRESTO on the thingy inside the bus. Ding. Collapse in a seat. Bus pulls out just moments later.
  14. 2242 Arrive in Kitchener. Luckily the LRT, running at 30min headways, is 2min away. First good connection of the day.
  15. 2255 This is the closest the train can get me. There’s a 15min wait (5 of which I’ll have to walk in the cold to get to the stop) for a bus that’ll get me, in 7min, within a 10min walk from home. I decide to walk instead, as it’ll be faster.
  16. 2330 Arrive home.

Total time to get home: 3h30min. Total distance traveled: 103km. Total cost: $3.10 for the subway token, $46 PRESTO ($6 for the card, $20 for the fare, $20 for the surprise fare), $2.86 for the LRT.

At this point I’ve been awake for over 20 hours.

Is it worth it? Hard to say. Every time I plan one of these trips I look forward to it. Conversations with office folks, eating office lunch, absconding with office snacks… and this time I even got to go out to dinner with a bunch of data people I work with all the time!

But every time I do this, as I’m doing it, or as I’m recently back from doing it… I don’t feel great about it. It’s essentially a full work day (nearly eight full hours!) just in travel to spend 5 hours in the office, and (this time) a couple hours afterwards in a restaurant.

Ultimately this — the share of my brain I need to devote purely to logistics, the manifold ways things can go wrong, the sheer _time_ it all takes — is why I don’t go into the office more often.

And the people are the reason I do it at all.

:chutten

Four-Year Moziversary

Wowee what a year that was. And I’m pretty sure the year to come will be even more so.

We gained two new team members, Travis and Beatriz. And with Georg taking a short break, we’ve all had more to do than usual. Glean’s really been working out well, though I’ve only had the pleasure of working on it a little bit.

Instead I’ve been adding fun new features to Firefox Desktop like Origin Telemetry. I also gave a talk at a conference about Data and Responsibility. Last December’s All Hands returned us to Orlando, and June brought me to Whistler for the first time. We held a Virtual Work Week (or “vorkweek”) a couple of weeks ago when we couldn’t find a time and the budget to meet in person, and spent it planning out how we’ll bring Glean to Firefox Desktop with Project FOG. First with a Prototype (FOGotype) by end of year. And then 2020 will be the year of Glean on the Desktop.

Blogging-wise I’ve slowed down quite a lot. 12 posts so far this calendar year is much lower than previous years’ 25+. The velocity I’d kept up by keeping tabs on the Ontario Provincial Legislature and pontificating about video games I’d played died in the face of mounting work pressures. Instead of spending my off time writing non-mozilla things I spent a lot of it reading instead (as my goodreads account can attest).

But now that I’ve written this one, maybe I’ll write more here.

Resolution for the coming year? More blogging. Continued improvement. Put Glean on Firefox. That is all.