Data Science is Hard: Validating Data for Glean

Glean is a new library for collecting data in Mozilla products. It’s been shipping in Firefox Preview for a little while and I’d like to take a minute to talk about how I validated that it sends what we think it’s sending.

Validating new data collections in an existing system like Firefox Desktop Telemetry is a game of comparing against things we already know. We know that some percentage of the data we receive is just garbage: bad dates, malformed records, attempts at buffer overflows and SQL injection. If the proportion of garbage in the new collection is in line with what we see normally, we count it as “good enough” and move on.

With new data collection from a new system like Glean coming from new endpoints like the reference browser and Firefox Preview, we’re given an opportunity to compare against the ideal. Maybe the correct number of failures is 0?

But what is a failure? What is acceptable behaviour?

We have an “events” ping in Glean: can the amount of time covered by the events’ timestamps ever exceed the amount of time covered by the ping? I didn’t think so, but apparently it’s an expected outcome when the events were restored from a previous session.

So how do you validate something that has unknown error states?

I started with a list of things any client-based, network-transmitted data collection system had to have (a sketch of how some of these might be computed follows the list):

  • How many pings (data transmissions) are there?
  • How many measurements are in those pings?
  • How many clients are sending these pings?
  • How often?
  • How long do they take to get to our servers?
  • How many poorly-structured pings are sent? By how many clients? How often?
  • How many pings with bad values are sent? By how many clients? How often?
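
As a sketch, here’s how a few of these baseline questions might be answered with pandas over a table of decoded pings. The file and column names (client_id, creation_date, submission_timestamp) are my assumptions for illustration, not the real schema:

```python
import pandas as pd

# Hypothetical table of decoded pings; the file and column names are
# illustrative assumptions, not the actual Glean schema.
pings = pd.read_parquet("glean_pings.parquet")

# How many pings? How many clients are sending them?
print("pings:", len(pings))
print("clients:", pings["client_id"].nunique())

# How often? Pings per client per day.
pings["submission_date"] = pings["submission_timestamp"].dt.date
per_client_day = pings.groupby(["client_id", "submission_date"]).size()
print(per_client_day.describe())

# How long do pings take to get to our servers? (This difference also
# includes clock skew; more on that below.)
latency = pings["submission_timestamp"] - pings["creation_date"]
print(latency.quantile([0.5, 0.95, 0.99]))
```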

From there we can dive into validating specifics about the data collections (again, a sketch follows the list):

  • Do the events in the “events” ping have timestamps with reasonable separations? (What does reasonable mean here? Well, it could be anything, but if the span between two timestamps is measured in years, and the app has only been available for some number of weeks, it’s probably broken.)
  • Are the GUIDs in the pings actually globally unique? Are we seeing duplicates? (We are, but not many)
  • Are there other ping fields that should be unique, but aren’t? (For Glean no client should ever send the same ping type with the same sequence number. But that kind of duplicate appears, too)
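
The duplicate checks in particular are easy to express over the same hypothetical pings table as above (document_id, ping_type, and seq are again assumed column names):

```python
# Are the ping GUIDs actually globally unique?
dupes = pings[pings.duplicated(subset=["document_id"], keep=False)]
print("duplicated GUIDs:", dupes["document_id"].nunique())
print("clients sending them:", dupes["client_id"].nunique())

# No client should ever send the same ping type with the same
# sequence number, so any hit here is a bug somewhere.
seq_dupes = pings[
    pings.duplicated(subset=["client_id", "ping_type", "seq"], keep=False)
]
print("duplicate (client, type, seq) pings:", len(seq_dupes))
```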

Once we can establish confidence in the overall health of the data reporting mechanism, we can start using it to report errors about itself (one more sketch follows the list):

  • Ping transmission should be quick (because pings are small). Assuming the transmission time is 0, how far away are the clients’ clocks from the server’s clock? (AKA “clock skew”. It turns out that mobile clients’ clocks are more reliable than desktop clients’ clocks, at least in the prerelease population. We’ll see what happens when we start measuring non-beta users.)
  • How many errors are reported by internal error-reporting metrics? How many send failures? How many times did the app try to record a string that was too long?
  • What measurements are in the ping? Are they only the ones we expect to see? Are they showing up in reasonable proportions relative to each other and to the number of clients and pings reporting them?
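
A sketch of those last checks, with the caveat that the error-metric name below is an illustrative guess, not necessarily what Glean actually calls it:

```python
# Clock skew: if we assume transmission time is ~0, the client clock's
# offset from the server clock is just the "latency" computed earlier.
skew = pings["submission_timestamp"] - pings["creation_date"]
print("skew quantiles:", skew.quantile([0.05, 0.5, 0.95]))

# Internal error-reporting metrics, e.g. recording a too-long string.
# Assumes a "metrics" column of dicts and a guessed metric name.
invalid = pings["metrics"].apply(
    lambda m: m.get("glean.error.invalid_value", 0)
)
print("pings reporting invalid values:", int((invalid > 0).sum()))
```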

All these attempts to determine what is reasonable and what is correct depend on a strong foundation of documentation. I can read the code that collects the data and sends the pings… but that tells me what is correct relative to what is implemented, not what is correct relative to what is intended.

By validating to the documentation, to what is intended, we can not only find bugs in the code, we can find bugs in the docs. And a data collection system lives and dies on its documentation: it is in many ways a more important part of the “product” than the code.

At this point, aside from the “metrics” ping, which is awaiting validation after some fixes reach saturation in the population, Glean has passed all of these criteria acceptably. It still has a bit of a duplicate ping problem, but its clock skew and latency are much lower than Firefox Desktop’s. There are some outrageous clients sending dozens of pings over a period in which they should be sending a handful, but that might just be a test client whose values will disappear into the noise as the user population grows.

:chutten

Virtual Private Social Network: Tales of a BBM Exodus

[Image: BBM’s “time to say goodbye” notice]

On Thursday April 18, my primary mechanism for talking to friends notified me that it was going away. I’d been using BlackBerry Messenger (BBM) since I started work at Research in Motion in 2008 and had found it to be tolerably built. It messaged people instantly over any data connection I had access to, what more could I ask for?

The most important BBM feature in my circle of contacts was its Groups feature. A bunch of people with BBM could form a Group, and then messages, video, pictures, and lists were all shared amongst the people in the group.

Essentially it acted as a virtual private social network. I could talk to a broad group of friends about the next time we were getting together or about some cute thing my daughter did. I could talk to the subset who lived in Waterloo about Waterloo activities, and whose turn it was for Sunday Dinner. The Beers group kept track of whose turn it was to pay, and it combined nicely with the chat for random nerdy tidbits and coordinating when each of us arrived at the pub. Even my in-laws had a group to coordinate visits, brag about child developmental milestones, and manage Christmas.

And then BBM announced it was going away, giving users six weeks to find a replacement… or, as seemed more likely to me, replacements.

First thing I did, since the notice came during working hours, was mutter angrily that Mozilla didn’t have an Instant Messaging product that I could, by default, trust. (We do have a messaging product, but it’s only for Desktop and has an email focus.)

The second thing I did was survey the available IM apps, cross-referencing them against which of my BBM contacts already had each installed… the existing landscape seemed to be a mess. I found that WhatsApp was by far the most popular, but it was bought by Facebook in 2014 and required a real phone number for your account. Signal’s the only one with a privacy/security story that I and others could trust (Telegram has some weight here, but not much), but it, too, required a phone number in order to sign up. Slack’s something only my tech friends used, and their privacy policy was a shambles. Discord’s something only my gaming friends used, and was basically Slack with push-to-talk.

So we fragmented. My extended friend network went to Google Hangouts, since just about everyone already had a Google Account anyway (even if they didn’t use it for anything). The Beers group went to Discord because a plurality of the group already had it installed.

And my in-laws’ family group… well, we still have two weeks left to figure that one out. Last I heard someone was stumping for Facebook Messenger, to which I replied “Could we not?”

The lack of reasonable options and the (sad, understandable) willingness of my relatives to trade privacy for convenience is bothering me so much that I’ve started thinking about writing my own IM/virtual private social network.

You’d think I’d know better than to even think about architecting anything even close to either of those topics… but the more I think about it the more webtech seems like an ideal fit for this. Notifications, Push, ServiceWorkers, WebRTC peer connections, TLS, WebSockets, OAuth: stir lightly et voila.

But even ignoring the massive mistake diving into either of those ponds full of crazy would be, the time was too short for that four weeks ago, and is trebly so now. I might as well make my peace that Facebook will learn my mobile phone number and connect it indelibly with its picture of what advertisements it thinks I would be most receptive to.

Yay.

:chutten

Google I/O Extended 2019 – Report

I attended a Google I/O Extended event on Tuesday at Google’s Kitchener office. It’s a get-together where there are demos, talks, workshops, and networking opportunities centred around watching the keynote live on the screen.

I treat it as an opportunity to keep an eye on what they’re up to this time, and a reminder that I know absolutely no one in the tech scene around here.

The first part of the day was a workshop about how to build Actions for the Google Assistant. I found the exercise to be very interesting.

The writing of the Action itself wasn’t interesting, that was a bunch of whatever. But it was interesting that it refused to work unless you connected it to a Google Account that had Web & Search Activity tracking turned on. Also I found it interesting that, though they said it required Chrome, it worked just fine on Firefox. It was interesting listening to laptops (including mine) across the room belt out welcome phrases because the simulator defaults to a hot mic and a loud speaker. It was interesting to notice that the presenter spent thirty seconds talking about how to name your project, and zero seconds talking about the Terms of Use of the application we were being invited to use. It was interesting to see that the settings defaulted to allowing you to test on all devices registered to the Google Account, without asking.

After the workshop the tech head of the Google Home App stood up and delivered a talk about trying to get manufacturers to agree on how to talk to Google Home and the Google Assistant.

I asked whether these efforts to normalize APIs and protocols were leading them to publish a standard with a standards body. “No idea, sorry.”

Then I noticed the questions from the crowd were following a theme: “Can we get finer privacy controls?” (The answer seemed to be that Google believes the controls are already fine enough) “How do you educate users about the duration the data is retained?” (It’s in the Terms of Service, but it isn’t read aloud. But Google logs every “consent moment” and keeps track of settings) “For the GDPR was there a challenge operating in multiple countries?” (Yes. They admitted that some of the “fine enough” privacy controls are finer in certain jurisdictions due to regs.) And, after the keynote, someone in the crowd asked what features Android might adopt (self-destruct buttons, maybe) to protect against Border Security-style threats.

It was very heartening to hear a room full of tech nerds from Toronto and Waterloo Region ask questions about Privacy and Security of a tech giant. It was incredibly validating to hear from the keynote that Chrome is considering privacy protections Firefox introduced last year.

Maybe we at Mozilla aren’t crazy to think that privacy is important, that users care about it, that it is at risk and big tech companies have the power and the responsibility to protect it.

Maybe. Maybe not.

Just keep those questions coming.

:chutten

Firefox Origin Telemetry: Putting Prio in Practice

Prio is neat. It allows us to learn counts of things that happen across the Firefox population without ever being able to learn which Firefox sent us which pieces of information.

For example, Content Blocking will soon be using this to count how often different trackers are blocked and exempted from blocking, so we can more quickly roll out Enhanced Tracking Protection to our users to protect them from companies who want to track their activities across the Web.

To get from “Prio is neat” to “Content Blocking is using it” required a lot of effort and the design and implementation of a system I called Firefox Origin Telemetry.

Prio on its own has some very rough edges. It can only operate on a list of at most 2046 yes or no questions (a bit vector). It needs to know cryptographic keys from the servers that will be doing the sums and decryption. It needs to know what a “Batch ID” is. And it needs something to reliably and reasonably-frequently send the data once it has been encoded.

So how can we turn “tracker fb.com was blocked” into a bit in a bit vector into an encoded prio buffer into a network payload…

Firefox Origin Telemetry has two lists: a list of “origins” and a list of “metrics”. The list of origins is a list of where things happen. Did you block fb.com or google.com? Each of those trackers is an “origin”. The list of metrics is a list of what happened. Did you block fb.com, or did you have to exempt it from blocking because otherwise the site broke? Both “blocked” and “exempt” are “metrics”.

In this way Content Blocking can, whenever fb.com is blocked, call

Telemetry::RecordOrigin(OriginMetricID::ContentBlocking_Blocked, "fb.com");

And Firefox Origin Telemetry will take it from there.

Step 0 is in-memory storage. Firefox Origin Telemetry stores tables mapping from encoding id (ContentBlocking_Blocked) to tables of origins mapped to counts (“fb.com”: 1). If there’s any data in Firefox Origin Telemetry, you can view it in about:telemetry and it might look something like this:

[Screenshot: Origin Telemetry tables as shown in about:telemetry]

Step 1 is App Encoding: turning `ContentBlocking_Blocked: {"fb.com": 1}` into “bit twelve on shard 2 should be set to 1 for encoding ‘content-blocking-blocked’”.

The full list of origins is too long to fit in a single Prio bit vector, so Firefox Origin Telemetry splits the list into 2046-element “shards”. The order of the origins list and the split locations for the shards must be stable and known ahead of time. When we change it in the future (either because Prio can start accepting larger or smaller buffers, or because the list of origins changes) we will have to change the name of the encoding from ‘content-blocking-blocked’ to maybe ‘content-blocking-blocked-v2’.

Step 2 is Prio Encoding: Firefox Origin Telemetry generates batch IDs of the encoding name suffixed with the shard number: for our example the batch ID is “content-blocking-blocked-1”. The server keys are communicated by Firefox Preferences (you can see them in about:config). With those pieces and the bit vector shards themselves, Prio has everything it needs to generate opaque binary blobs about 50 kilobytes in size.
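
Here’s a minimal sketch of that bookkeeping, purely for illustration. The function name, the 0-indexed shard numbering, and the tiny origins list are my assumptions; the real implementation is C++ inside Firefox:

```python
SHARD_SIZE = 2046  # Prio accepts bit vectors of at most 2046 bits

# The canonical origin ordering must be stable and known ahead of time,
# by both Firefox and the servers. This short list is illustrative only.
ORIGINS = ["example.com", "fb.com", "google.com"]  # really much longer

def app_encode(origin: str) -> tuple[int, int]:
    """Map an origin to (shard number, bit position within that shard)."""
    index = ORIGINS.index(origin)
    return index // SHARD_SIZE, index % SHARD_SIZE

# The batch ID handed to Prio is the encoding name plus the shard number:
shard, bit = app_encode("fb.com")
batch_id = f"content-blocking-blocked-{shard}"
```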

Yeah, about 2 kilobits of data (one 2046-bit shard) in a 50-kilobyte package. Not a small increase.

Step 3 is Base64 Encoding, where we turn those 50kb binary blobs into 67kb strings of the letters a-z and A-Z, the numbers 0-9, and the symbols “+” and “/” (Base64 represents every 3 bytes as 4 characters, hence the 4/3 size increase). This is so we can send them in a normal Telemetry ping.

Step 4 is the “prio” ping. Once a day, or when Firefox shuts down, we need to send a ping containing these pairs of batch IDs and base64-encoded strings, plus a minimum amount of environmental data (Firefox version, current date, etc.), if there’s data to be sent. In the event that sending fails, we need to retry (TelemetrySend). After sending, the ping should be available to be inspected for a period of time (TelemetryArchive).
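
The resulting payload might be shaped something like this. To be clear, the field names here are my guesses based on the description above, not the actual ping schema:

```python
# A hypothetical "prio" ping payload; field names are guesses, not the
# real schema, and the base64 strings are elided.
prio_ping = {
    "environment": {"version": "68.0", "date": "2019-05-29"},
    "payload": [
        {"batchID": "content-blocking-blocked-0", "prioData": "<base64>"},
        {"batchID": "content-blocking-blocked-1", "prioData": "<base64>"},
    ],
}
```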

…basically, this is where Telemetry does what Telemetry does best.

And then the ping becomes the problem of the servers who need to count and verify and sum and decode and… stuff. I dunno, I’m a Firefox Telemetry Engineer, not a Data Engineer. :amiyaguchi’s doing that part, not me : )

I’ve smoothed over some details here, but I hope I’ve given you an idea of what value Firefox Origin Telemetry brings to Firefox’s data collection systems. It makes Prio usable for callers like Content Blocking and establishes systems for managing the keys and batch IDs necessary for decoding on the server side (Prio will generate int vector shards for us, but how will we know which position of which shard maps back to which origin and which metric?).

Firefox Origin Telemetry is shipping in Firefox 68 and is currently only enabled for Firefox Nightly and Beta. Content Blocking is targeting Firefox 69 to start using Origin Telemetry to measure tracker blocking and exempting for 0.014% of pageloads of 1% of clients.

:chutten

Distributed Teams: A Test Failing Because It’s Run West of Newfoundland and Labrador

(( Not quite 500 mile email-level of nonsense, but might be the closest I get. ))

A test was failing.

Not really unusual, that. Tests fail all the time. It’s how we know they’re good tests: protecting us developers from ourselves.

But this one was failing unusually. Y’see, it was failing on my machine.

(Yes, har har, it is a common-enough occurrence given my obvious lack of quality as a developer how did you guess.)

The unusual part was that it was failing only for me… and I hadn’t even touched anything yet. It wasn’t failing on our test infrastructure “try”, and it wasn’t failing on the machine of :raphael, the fellow responsible for the integration test harness itself. We were building Firefox the same way, running telemetry-tests-client the same way… but I was getting different results.

I fairly quickly narrowed the problem down to an extra “main” ping with reason “environment-change” being sent during the initial phases of the test. By dumping some logging into Firefox, rebuilding it, and then routing its logs to the console with --gecko-log "-", I learned that we were sending a ping because a monitored user preference had been changed: browser.search.region.

When Firefox starts up the first time, it doesn’t know where it is. And it needs to know where it is to properly make a first guess at what language you want and what search engines would work best. Google’s results are pretty bad in Canada unless you use “google.ca”, after all.

But while Firefox doesn’t know where it is, it does know what timezone it’s in from the settings in your OS’s clock. On top of that it knows what language your OS is set to. So we make a first guess at which search region we’re in based on whether or not the timezone overlaps a US timezone and whether your OS’s locale is `en-US` (United States English).

What this fails to take into account is that United States English is the “default” locale reported by many OSes even if you aren’t in the US, and that even if you are in a timezone that overlaps with the US, you might not actually be there.

So to account for that, Mozilla operates a location service to double-check that the search region is appropriately set. This takes a little time to get back with the correct information, if it gets back to us at all. So if you happen to be in a US-overlapping timezone with an English-language OS Firefox assumes you’re in the US. Then if the location service request gets back with something that isn’t “US”, browser.search.region has to be updated.

And when it updates, it changes the Telemetry Environment.

And when the Environment changes, we send a “main” ping.

And when we send a “main” ping, the test breaks.

…all because my timezone overlaps the US and my language is “Default” English.
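
Condensed into a sketch, the failure mode looks like this. This is pure illustration under the description above, not the actual Firefox logic:

```python
# US timezones span roughly UTC-10 (Hawaii) through UTC-4 (Eastern
# Daylight Time); treat any offset in that range as "overlapping the US".
US_UTC_OFFSETS = range(-10, -3)  # -10, -9, ..., -4

def guess_search_region(utc_offset_hours: int, os_locale: str) -> str:
    """First guess, made before the location service has replied."""
    if os_locale == "en-US" and utc_offset_hours in US_UTC_OFFSETS:
        return "US"
    return "unknown"

# My machine: a US-overlapping timezone (UTC-5) and a default "en-US"
# OS locale, so Firefox first guesses "US"...
assert guess_search_region(-5, "en-US") == "US"

# ...then the location service reports "CA", browser.search.region is
# updated, the Telemetry Environment changes, a "main" ping is sent,
# and the test breaks.
def region_pref_must_change(current: str, reported: str) -> bool:
    return reported != current

assert region_pref_must_change("US", "CA")
```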

I feel a bit persecuted, but this shows another strength of Distributed Teams. No one else on my team would be able to find this failure. They’re in Germany, Italy, and the US. None of them have that combination of “Not in the US, but in a US timezone” needed to manifest the bug.

So on one hand this sucks. I’m going to have to find a way around this.

But on the other hand… I feel like my Canadianness is a bit of a bug-finding superpower. I’m no Brok Windsor or Captain Canuck, but I can get the job done in a way no one else on my team can.

Not too bad, eh?

:chutten

So I’ve Finished: The Talos Principle

I know for a fact that if the narrative told me exactly what the inciting incident was it would lessen the experience. And yet.

Maybe Horizon: Zero Dawn taught me to expect every mystery to be satisfactorily explained with accompanying voice acting and emotional score. And maybe it’s the same instinct that saw me reading every then-published Honor Harrington novel in a row one summer, and perusing the Trivia section of IMDB for every single thing I watch.

I like knowing things.

I like the feeling of besting the puzzles in The Talos Principle. I like the Portal-esque “break free from the constraints of the system!” story, done one better in this game by having you actually break the game (but not really breaking it, because there are collectibles to collect by doing so). There’s wonderful mechanics-informing-story stuff here. And explicitly pointing out the game that you’re playing makes for some lovely “where does the game end?”, “who is really playing this?” navel-gazing nonsense that I just adore.

The story is the obvious one. And the way it is told through journals and audio recordings and crawled fragments of the web holds together so well. There’s only the one thing they keep you from learning (as far as I can tell).

And I know not knowing makes the Talos Principle better. I know ambiguity can be a deliberate choice, the better choice, the only choice.

And yet.

Eulogy for a 13-Year-Old Display

[Image: goodbye, old monitor]

I was working for the Department of National Defence in Canada (specifically Defence Research and Development Canada) in early 2005 when I first plugged in my new xplio CM998 monitor. It was amazing.

Not only was it one of those new lightweight LCD monitors (I have since owned desks that weigh less), it supported resolutions up to 1280×1024 pixels natively and had both DVI and VGA ports!

It also generated enough heat in my basement apartment that I could notice it from across the room, but that was a plus in that cold Scarborough winter.

From there I moved it to an apartment. Another apartment. A home. And then another home. And then, finally, when I had stopped using it at home I started using it at work for Mozilla.

I liked its comfortable 5:4 aspect ratio, and the fact it wouldn’t wobble when I got up to get coffee.

On Friday it wouldn’t turn on. Well, it did turn on. Linux was assigning it desktop space, knew who it was and how big it was… but it wouldn’t display anything.

I would have liked to turn it off and on again, but the power switch hasn’t worked reliably since my daughter was born. So I did the next best thing and unplugged it and plugged it back in. It would display my Firefox wallpaper for just long enough for some capacitor to warm up or something, and then it would black out.

Nothing I could do would resuscitate it. No cable swaps, no buttons I could press, no whining or cajoling.

Here ends the 13-year service life of my venerable SXGA display.

Your service did not go unnoticed. Enjoy your recycling.

:chutten