Firefox Origin Telemetry: Putting Prio in Practice

Prio is neat. It allows us to learn counts of things that happen across the Firefox population without ever being able to learn which Firefox sent us which pieces of information.

For example, Content Blocking will soon be using this to count how often different trackers are blocked and exempted from blocking so we can more quickly roll our Enhanced Tracking Protection to our users to protect them from companies who want to track their activities across the Web.

To get from “Prio is neat” to “Content Blocking is using it” required a lot of effort and the design and implementation of a system I called Firefox Origin Telemetry.

Prio on its own has some very rough edges. It can only operate on a list of at most 2046 yes or no questions (a bit vector). It needs to know cryptographic keys from the servers that will be doing the sums and decryption. It needs to know what a “Batch ID” is. And it needs something to reliably and reasonably-frequently send the data once it has been encoded.

So how can we turn “tracker fb.com was blocked” into a bit in a bit vector into an encoded prio buffer into a network payload…

Firefox Origin Telemetry has two lists: a list of “origins” and a list of “metrics”. The list of origins is a list of where things happen. Did you block fb.com or google.com? Each of those trackers are “origins”. The list of metrics is a list of what happened. Did you block fb.com or did you have to exempt it from blocking because otherwise the site broke? Both “blocked” and “exempt” are “metrics”.

In this way Content Blocking can, whenever fb.com is blocked, call

Telemetry::RecordOrigin(OriginMetricID::ContentBlocking_Blocked, "fb.com");

And Firefox Origin Telemetry will take it from there.

Step 0 is in-memory storage. Firefox Origin Telemetry stores tables mapping from encoding id (ContentBlocking_Blocked) to tables of origins mapped to counts (“fb.com”: 1). If there’s any data in Firefox Origin Telemetry, you can view it in about:telemetry and it might look something like this:

originTelemetryAbout

Step 1 is App Encoding: turning “ContentBlocking_Blocked: {“fb.com”: 1}” into “bit twelve on shard 2 should be set to 1 for encoding ‘content-blocking-blocked’ ”

The full list of origins is too long to talk to Prio. So Firefox Origin Telemetry splits the list into 2046-element “shards”. The order of the origins list and the split locations for the shards must be stable and known ahead of time. When we change it in the future (either because Prio can start accepting larger or smaller buffers, or when the list of origins changes) we will have to change the name of the encoding from ‘content-blocking-blocked’ to maybe ‘content-blocking-blocked-v2’.

Step 2 is Prio Encoding: Firefox Origin Telemetry generates batch IDs of the encoding name suffixed with the shard number: for our example the batch ID is “content-blocking-blocked-1”. The server keys are communicated by Firefox Preferences (you can see them in about:config). With those pieces and the bit vector shards themselves, Prio has everything it needs to generate opaque binary blobs about 50 kilobytes in size.

Yeah, 2kb of data in a 50kb package. Not a small increase.

Step 3 is Base64 Encoding where we turn those 50kb binary blobs into 67kb strings of the letters a-z and A-Z, the numbers 0-9, and the symbols “+” or “/”. This is so we can send it in a normal Telemetry ping.

Step 4 is the “prio” ping. Once a day or when Firefox shuts down we need to send a ping containing these pairs of batch ids and base64-encoded strings plus a minimum amount of environmental data (Firefox version, current date, etc.), if there’s data to be sent. In the event that sending fails, we need to retry (TelemetrySend). After sending the ping should be available to be inspected for a period of time (TelemetryArchive).

…basically, this is where Telemetry does what Telemetry does best.

And then the ping becomes the problem of the servers who need to count and verify and sum and decode and… stuff. I dunno, I’m a Firefox Telemetry Engineer, not a Data Engineer. :amiyaguchi’s doing that part, not me : )

I’ve smoothed over some details here, but I hope I’ve given you an idea of what value Firefox Origin Telemetry brings to Firefox’s data collection systems. It makes Prio usable for callers like Content Blocking and establishes systems for managing the keys and batch IDs necessary for decoding on the server side (Prio will generate int vector shards for us, but how will we know which position of which shard maps back to which origin and which metric?).

Firefox Origin Telemetry is shipping in Firefox 68 and is currently only enabled for Firefox Nightly and Beta. Content Blocking is targeting Firefox 69 to start using Origin Telemetry to measure tracker blocking and exempting for 0.014% of pageloads of 1% of clients.

:chutten

Advertisements