Google I/O Extended 2019 – Report

I attended a Google I/O Extended event on Tuesday at Google’s Kitchener office. It’s a get-together with demos, talks, workshops, and networking opportunities, centred around watching the keynote live on screen.

I treat it as an opportunity to keep an eye on what they’re up to this time, and a reminder that I know absolutely no one in the tech scene around here.

The first part of the day was a workshop about how to build Actions for the Google Assistant. I found the exercise to be very interesting.

The writing of the Action itself wasn’t interesting; that was a bunch of whatever. But it was interesting that it refused to work unless you connected it to a Google Account that had Web & App Activity tracking turned on. I also found it interesting that, though they said it required Chrome, it worked just fine on Firefox. It was interesting listening to laptops (including mine) across the room belt out welcome phrases because the simulator defaults to a hot mic and a loud speaker. It was interesting to notice that the presenter spent thirty seconds talking about how to name your project, and zero seconds talking about the Terms of Use of the application we were being invited to use. It was interesting to see that the settings defaulted to allowing you to test on all devices registered to the Google Account, without asking.

After the workshop the tech lead of the Google Home App stood up and delivered a talk about trying to get manufacturers to agree on how to talk to Google Home and the Google Assistant.

I asked whether these efforts to normalize APIs and protocols were leading them to publish a standard with a standards body. “No idea, sorry.”

Then I noticed the questions from the crowd were following a theme: “Can we get finer privacy controls?” (The answer seemed to be that Google believes the controls are already fine enough.) “How do you educate users about how long the data is retained?” (It’s in the Terms of Service, which aren’t read aloud; but Google logs every “consent moment” and keeps track of settings.) “For the GDPR, was operating in multiple countries a challenge?” (Yes. They admitted that some of the “fine enough” privacy controls are finer in certain jurisdictions because of regulations.) And, after the keynote, someone in the crowd asked what features Android might adopt (self-destruct buttons, maybe) to protect against Border Security-style threats.

It was very heartening to hear a room full of tech nerds from Toronto and Waterloo Region ask a tech giant questions about Privacy and Security. It was incredibly validating to hear in the keynote that Chrome is considering privacy protections Firefox introduced last year.

Maybe we at Mozilla aren’t crazy to think that privacy is important, that users care about it, that it is at risk, and that big tech companies have the power and the responsibility to protect it.

Maybe. Maybe not.

Just keep those questions coming.

:chutten


Firefox Origin Telemetry: Putting Prio in Practice

Prio is neat. It allows us to learn counts of things that happen across the Firefox population without ever being able to learn which Firefox sent us which pieces of information.

For example, Content Blocking will soon be using this to count how often different trackers are blocked and exempted from blocking so we can more quickly roll out our Enhanced Tracking Protection to our users to protect them from companies that want to track their activities across the Web.

To get from “Prio is neat” to “Content Blocking is using it” required a lot of effort and the design and implementation of a system I called Firefox Origin Telemetry.

Prio on its own has some very rough edges. It can only operate on a list of at most 2046 yes-or-no questions (a bit vector). It needs to know the cryptographic keys of the servers that will be doing the sums and decryption. It needs to know what a “Batch ID” is. And it needs something to reliably and reasonably frequently send the data once it has been encoded.

So how can we turn “tracker fb.com was blocked” into a bit in a bit vector, into an encoded Prio buffer, into a network payload…

Firefox Origin Telemetry has two lists: a list of “origins” and a list of “metrics”. The list of origins is a list of where things happen. Did you block fb.com or google.com? Each of those trackers is an “origin”. The list of metrics is a list of what happened. Did you block fb.com, or did you have to exempt it from blocking because otherwise the site broke? Both “blocked” and “exempt” are “metrics”.

In this way Content Blocking can, whenever fb.com is blocked, call

Telemetry::RecordOrigin(OriginMetricID::ContentBlocking_Blocked, "fb.com");

And Firefox Origin Telemetry will take it from there.

Step 0 is in-memory storage. Firefox Origin Telemetry stores tables mapping from encoding id (ContentBlocking_Blocked) to tables of origins mapped to counts (“fb.com”: 1). If there’s any data in Firefox Origin Telemetry, you can view it in about:telemetry and it might look something like this:

[Screenshot: the Origin Telemetry section of about:telemetry, showing per-origin counts]
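If you squint, that storage is just a table of tables. Here’s a minimal sketch in C++ (the type and function names are my own stand-ins, not the actual Firefox internals):

#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for Firefox's OriginMetricID enum.
enum class OriginMetricID {
  ContentBlocking_Blocked,
  ContentBlocking_Exempt,
};

// Encoding id -> (origin -> count), e.g.
// ContentBlocking_Blocked -> { "fb.com": 1 }
using OriginCounts = std::map<std::string, std::uint32_t>;
std::map<OriginMetricID, OriginCounts> gOriginStorage;

// Roughly what a call like Telemetry::RecordOrigin boils down to.
void RecordOrigin(OriginMetricID aId, const std::string& aOrigin) {
  ++gOriginStorage[aId][aOrigin];
}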

Step 1 is App Encoding: turning ContentBlocking_Blocked: {“fb.com”: 1} into “bit twelve on shard 2 should be set to 1 for encoding ‘content-blocking-blocked’”.

The full list of origins is too long to fit in a single Prio bit vector, so Firefox Origin Telemetry splits the list into 2046-element “shards”. The order of the origins list and the split locations for the shards must be stable and known ahead of time. When we change either in the future (because Prio starts accepting larger or smaller buffers, or because the list of origins changes) we will have to change the name of the encoding from ‘content-blocking-blocked’ to maybe ‘content-blocking-blocked-v2’.
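The shard math itself is simple. A sketch, assuming shards are counted from zero and the origins list order is the stable, agreed-upon one (helper names are mine, not Firefox’s):

#include <cstddef>

// A Prio buffer holds at most 2046 yes-or-no answers.
constexpr std::size_t kShardSize = 2046;

struct ShardedBit {
  std::size_t shard;  // which 2046-bit shard (0-indexed)
  std::size_t bit;    // which bit within that shard
};

// Map an origin's position in the stable origins list to a (shard, bit)
// pair. Hypothetical helper.
constexpr ShardedBit ShardForOriginIndex(std::size_t aOriginIndex) {
  return {aOriginIndex / kShardSize, aOriginIndex % kShardSize};
}

// e.g. an origin at position 2058 in the list lands at bit 12 of shard 1:
static_assert(ShardForOriginIndex(2058).shard == 1, "second shard, 0-indexed");
static_assert(ShardForOriginIndex(2058).bit == 12, "bit twelve");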

Step 2 is Prio Encoding: Firefox Origin Telemetry generates batch IDs by suffixing the encoding name with the shard number: for our example the batch ID is “content-blocking-blocked-1”. The server keys are communicated via Firefox Preferences (you can see them in about:config). With those pieces and the bit vector shards themselves, Prio has everything it needs to generate opaque binary blobs about 50 kilobytes in size.

Yeah, about 2 kilobits of data in a 50-kilobyte package. Not a small increase.
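The batch-ID construction, at least, is cheap to sketch (hypothetical helper name; the heavy lifting of the encode itself happens inside the Prio library, which takes the batch ID, the server keys, and the bit-vector shard, and hands back the opaque blobs):

#include <cstddef>
#include <string>

// Batch ID = encoding name + "-" + shard number, so shard 1 of
// 'content-blocking-blocked' gets the batch ID "content-blocking-blocked-1".
std::string BatchId(const std::string& aEncoding, std::size_t aShard) {
  return aEncoding + "-" + std::to_string(aShard);
}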

Step 3 is Base64 Encoding, where we turn those 50kb binary blobs into 67kb strings made of the letters a-z and A-Z, the numbers 0-9, and the symbols “+” and “/”. This is so we can send them in a normal Telemetry ping.
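The size arithmetic checks out: Base64 spends four output characters for every three input bytes, so a 50kb blob comes out at around 67kb:

#include <cstddef>

// Base64 output length: 4 characters per 3 input bytes, rounded up.
constexpr std::size_t Base64Len(std::size_t aBytes) {
  return ((aBytes + 2) / 3) * 4;
}

// A 50kb blob (51200 bytes) becomes a 68268-character string: about 67kb.
static_assert(Base64Len(50 * 1024) == 68268, "");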

Step 4 is the “prio” ping. Once a day, or when Firefox shuts down, we send a ping containing these pairs of batch IDs and Base64-encoded strings, plus a minimal amount of environmental data (Firefox version, current date, etc.), if there’s data to be sent. If sending fails, we retry (TelemetrySend). After sending, the ping remains available for inspection for a period of time (TelemetryArchive).

…basically, this is where Telemetry does what Telemetry does best.
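For a rough mental model, each entry in that ping pairs a batch ID with the encoded blobs destined for the two Prio servers. A sketch only (illustrative field names, not the actual “prio” ping schema):

#include <string>
#include <vector>

// Hypothetical shape of one entry in the "prio" ping: a batch ID plus the
// Base64-encoded blob destined for each of the two Prio servers.
struct PrioPingEntry {
  std::string batchId;     // e.g. "content-blocking-blocked-1"
  std::string forServerA;  // Base64-encoded blob for server A
  std::string forServerB;  // Base64-encoded blob for server B
};

using PrioPingPayload = std::vector<PrioPingEntry>;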

And then the ping becomes the problem of the servers who need to count and verify and sum and decode and… stuff. I dunno, I’m a Firefox Telemetry Engineer, not a Data Engineer. :amiyaguchi’s doing that part, not me : )

I’ve smoothed over some details here, but I hope I’ve given you an idea of what value Firefox Origin Telemetry brings to Firefox’s data collection systems. It makes Prio usable for callers like Content Blocking and establishes systems for managing the keys and batch IDs necessary for decoding on the server side (Prio will generate int vector shards for us, but how will we know which position of which shard maps back to which origin and which metric?).

Firefox Origin Telemetry is shipping in Firefox 68 and is currently only enabled for Firefox Nightly and Beta. Content Blocking is targeting Firefox 69 to start using Origin Telemetry to measure tracker blocking and exempting for 0.014% of pageloads of 1% of clients.

:chutten