Two Days, or How Long Until The Data Is In

Two days.

It doesn’t seem like long, but that is how long you need to wait before looking at a day’s Firefox data and being sure that 95% of it has been received.

There are some caveats, of course. This only applies to current versions of Firefox (55 and later). This will very occasionally be wrong (like, say, immediately after Labour Day when people finally get around to waking up their computers that have been sleeping for quite some time). And if you have a special case (like trying to count nearly everything instead of just 95% of it) you might want to wait a bit longer.

But for most cases: Two Days.

As part of my 2017 Q3 Deliverables I looked into how long it takes clients to send their anonymous usage statistics to us using Telemetry. This was a culmination of earlier ponderings on client delay, previous work in establishing Telemetry client health, and an eighteen-month (or more!) push to actually look at our data from a data perspective (meta-data).

This led to a meeting in San Francisco where :mreid, :kparlante, :frank, :gfritzsche, and I settled upon a list of metrics that we ought to measure to determine how healthy our Telemetry system is.

Number one on that list: latency.

It turns out there’s a delay between a user doing something (opening a tab, for instance) and them sending that information to us. This is client delay and is broken into two smaller pieces: recording delay (how long from when the user does something until when we’ve put it in a ping for transport), and submission delay (how long it takes that ready-for-transport ping to get to Mozilla).
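To make the split concrete, here’s a minimal sketch of how one ping’s client delay decomposes. The timestamps are invented for illustration; real pings carry their own creation and reception times.

```python
# Hedged sketch: splitting client delay into recording delay and
# submission delay. All timestamps here are made up for illustration.
from datetime import datetime

def client_delay(event_time, ping_created_time, server_received_time):
    """Return (recording_delay, submission_delay) for one ping."""
    recording_delay = ping_created_time - event_time
    submission_delay = server_received_time - ping_created_time
    return recording_delay, submission_delay

rec, sub = client_delay(
    datetime(2017, 9, 5, 23, 50),  # user opens a tab late at night
    datetime(2017, 9, 6, 0, 5),    # ping assembled for transport
    datetime(2017, 9, 6, 9, 30),   # received after the computer wakes up
)
print(rec, sub)  # 0:15:00 9:25:00
```

Note that submission delay dominates here: the ping was ready within minutes, but sat on a sleeping computer overnight.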

If you want to know how many tabs were opened on Tuesday, September the 5th, 2017, you can’t tell on the day itself. All the tabs people open late at night won’t even be in pings, and anyone who puts their computer to sleep won’t send their pings until they wake their computer in the morning of the 6th.

This is where “Two Days” comes in: On Thursday the 7th you can be reasonably sure that we have received 95% of all pings containing data from the 5th. In fact, by the 7th, you should even have that data in some scheduled datasets like main_summary.

How do we know this? We measured it:

[Figure: Client “main” ping delay for latest version.] (Remember what I said about Labour Day? That’s the exceptional case on beta 56.)

Most data, most days, comes in within a single day. Add a day to get it into your favourite dataset, and there you have it: Two Days.
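The measurement itself boils down to a percentile: take all the pings containing data from a given activity day, sort their delays, and find how long it took for 95% of them to arrive. A sketch, with invented delays in hours:

```python
# Hedged sketch: the delay by which a given fraction of pings arrived.
# The delays below are invented; the real analysis ran over Telemetry data.
def delay_for_quantile(delays_hours, quantile=0.95):
    ordered = sorted(delays_hours)
    # index of the ping that completes the requested fraction
    idx = max(0, int(quantile * len(ordered)) - 1)
    return ordered[idx]

delays = [1, 2, 2, 3, 4, 5, 6, 8, 12, 18, 20, 22,
          24, 30, 36, 40, 44, 46, 47, 120]
print(delay_for_quantile(delays))  # 47 -> under two days, despite a straggler
```

The point of the 95% cut is exactly that straggler: waiting for the last ping (here, five days out) would hold up every analysis for the sake of a tiny tail.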

Why is this such a big deal? Currently the only information circulating in Mozilla about how long you need to wait for data is received wisdom from a pre-Firefox-55 (pre-pingsender) world. Some teams wait up to ten full days (!!) before trusting that the data they see is complete enough to make decisions about.

This slows Mozilla down. If we are making decisions on data, our data needs to be fast and reliably so.

It just so happens that, since Firefox 55, it has been.

Now comes the hard part: communicating that it has changed and changing those long-held rules of thumb and idées fixes to adhere to our new, speedy reality.

Which brings us to this blog post. Consider this your notice that we have looked into the latency of Telemetry data and it looks pretty darn quick these days. If you want to know about what happened on a particular day, you don’t need to wait for ten days any more.

Just Two Days. Then you can have your answers.

:chutten

(Much thanks to :gsvelto and :Dexter’s work on pingsender and using it for shutdown pings, :Dexter’s analyses on ping delay that first showed these amazing improvements, and everyone in the data teams for keeping the data flowing while I poked at SQL and rearranged words in documents.)

Apple Didn’t Kill BlackBerry


It was Oracle.

And I don’t mean “an Oracle” in the allegorical way Shakespeare had it where it was Macbeth’s prophecy-fueled hubris what incited the incidents (though it is pretty easy to cast anything in the mobile space as a reimagining of the Scottish Play). I mean the company Oracle was the primary agent of the downfall of the company then-known as Research in Motion.

And they probably didn’t even mean to do it.

To be clear: this is my theory, these are my opinions, and all of it’s based on what I can remember from nearly a decade ago.

At the end of June 2007, Apple released the iPhone in the US. It was an odd little device. It didn’t have apps or GPS or 3G (wouldn’t have any of those until July 2008), it was only available on the AT&T network (a one-year exclusivity agreement), and it didn’t have copy-paste (that took until June 2009).

Worst of all, it didn’t even run Java.

Java was incredibly important in the 2000s. It was the only language both powerful enough on the day’s mobile hardware to be useful and sandboxed enough from that hardware to be safe to run.

And the iPhone didn’t have it! In fact, in the release of the SDK agreement in 2008, Apple excluded Java (and browser engines like Firefox’s Gecko) by disallowing the running of interpreted code.

It is understandable, then, that the executives in Research in Motion weren’t too worried. The press immediately called the iPhone a BlackBerry Killer… but they’d done that for the Motorola Q9H, the Nokia E61i, and the Samsung BlackJack. (You don’t have to feel bad if you’ve never heard of them. I only know they exist because I worked for BlackBerry starting in June 2008.)

I remember a poorly-chroma-keyed presentation featuring then-CTO David Yach commanding a starship that destroyed each of these devices in turn with our phasers of device portfolio depth, photon torpedoes of enterprise connectivity, and warp factor BlackBerry OS 4.6. Clearly we could deal with this Apple upstart the way we dealt with the others: by continuing to be excellent at what we did.

Still, a new competitor is still a new competitor. Measures had to be taken.

Especially when, in November of 2007, it was pretty clear that Google had stepped into the ring with the announcement of Android.

Android was the scarier competitor. Google was a well-known software giant and they had an audacious plan to marry their software expertise (and incredible buying, hiring, and lawyering power) with chipsets, handsets, and carrier reach from dozens of companies including Qualcomm, Motorola, and T-Mobile.

The Android announcements exploded across the boardrooms of RIM’s Waterloo campus.

But with competition comes opportunity.

You see, Android ran Java. Well, code written in Java could run on Android. And this meant they had the hearts and minds of every mobile developer in the then-nascent app ecosystem. All they had to do was not call it Java, and they could offer a far more recent version of Java’s own APIs than BlackBerry was allowed to, running on a high-performance non-Java virtual machine called Dalvik.

BlackBerry couldn’t match this due to the terms of its license agreement, while Google didn’t even need to pay Sun Microsystems (Java’s owner) a per-device license fee.

Quickly, a plan was hatched: Project Highlander (no, I’m not joking). It was going to be the one platform for all BlackBerry devices that was going to allow us to wield the sword of the Katana filesystem (still not joking) and defeat our enemies. Yes, even the execs were dorks at RIM in early 2009.

Specifically, RIM was going to adopt a new Operating System for our mobile devices that would run Dalvik, allowing them to not only finally move past the barriers Sun had refused to lift from in front of BlackBerry Java… but to also eat Google’s lunch at the same time. No matter how much money Google poured into app development for Android, we would reap the benefit through Highlander’s Android compatibility.

By essentially joining Google in the platform war against the increasingly-worrisome growth of Apple, we would be able to punch above our weight in the US. And by not running Android, we could keep our security clearance and be sold places Google couldn’t reach.

It was to be perfect: the radio core running RIM’s low-power, high-performance Nessus OS talking over secure hardware to the real-time QNX OS atop which would be running an Android-compatible Dalvik VM managing the applications RIM’s developers had written in the language they had spent years mastering: Java. With the separation of the radio and application cores we were even planning how to cut a deal with mobile carriers to only certify the radio core so we’d be free to update the user-facing parts of the OS without having to go through their lengthy, costly, irritating process.

A pity it never happened.

RIM’s end properly began on April 20, 2009, when Oracle announced an agreement to purchase Sun Microsystems, maker of Java.

Oracle, it was joked, was a tech company where the size of its Legal department outstripped that of the rest of its business units… combined.

Even I, a lowly grunt working on the BlackBerry Browser, knew what was going to happen next.

After Oracle completed its acquisition of Sun it took only seven months for them to file suit against Google over Android’s use of Java.

These two events held monumental importance for Research in Motion:

Oracle had bought Sun, which meant there was now effectively zero chance of a new version of mobile Java which would allow BlackBerry Java to innovate within the terms of RIM’s license to Sun.

Oracle had sued Google, which meant RIM would be squashed like a bug under the litigious might of Sun’s new master if it tried to pave its own not-Android way to its own, modern Java-alike.

All of RIM’s application engineers had lived and breathed Java for years. And now that expertise was to be sequestered, squandered, and then removed.

While Java-based BlackBerry 6 and 7 devices continued to keep the lights on under steadily decreasing order volumes, the BlackBerry PlayBook was announced, delayed, released, and scrapped. The PlayBook was such a good example of a cautionary tale that BlackBerry 10 required an extra year of development to rewrite most of the things it got wrong.

Under that extra year of pressure-cooker development, BlackBerry 10 bristled with ideas. This was a problem. Instead of evolving with patient direction, adding innovation step-by-step, guiding users over the years from 2009 to BlackBerry 10’s release in 2013, all of the pent up ideas of user interaction, user experience paradigms, and content-first design landed in users’ laps all at once.

This led to confusion, which led to frustration, which led to devices being returned.

BlackBerry 10 couldn’t sell, and with users’ last good graces spent, the company suddenly-renamed BlackBerry just couldn’t find something it could release that consumers would want to buy.

Mass layoffs, begun during the extra year of BlackBerry 10 development with the removal of entire teams of Java developers, continued as the company tried to resize itself to the size of its market. Handset prices increased to sweeten fallen margins. Developers shuffled over to the Enterprise business unit where BlackBerry was still paying bonuses and achieving sales targets.

The millions of handsets sold and billions of dollars in revenue were gone. And yet, despite finding itself beneath the footfalls of fighting giants, BlackBerry was not dead — is still not dead.

Its future may not lie with smartphones, but when I left BlackBerry in late 2015, having myself survived many layoffs and reorganizations, I left with the opinion that it does indeed have a future.

Maybe it’ll focus on its enterprise deployments and niche device releases.

Maybe it’ll find a product millions of consumers will need.

Maybe it’ll be bought by Oracle.

:chutten

Data Science is Hard: History, or It Seemed Like a Good Idea At the Time

I’m mentoring a Summer of Code project this summer about redesigning the “about:telemetry” interface that ships with each and every version of Firefox.

The minute the first student (:flyingrub) asked me “What is a parent payload and child payload?” I knew I was going to be asked a lot of questions.

To least-effectively answer these questions, I’ll blog the answers as narratives. And to start with this question, here’s how the history of a project makes it difficult to collect data from it.

In the Beginning — or, rather, in the middle of October 2015 when I was hired at Mozilla (so, at my Beginning) — there was single-process Firefox, and all was good. Users had many tabs, but one process. Users had many bookmarks, but one process. Users had many windows, but one process. All this and the web contents themselves were all sharing time within a single construct of electrons and bits and code and pixels: vying with each other for control of the filesystem, the addressable space of RAM, the network resources, and CPU scheduling.

Not satisfied with things being just “good”, we took a page from the book penned by Google Chrome and decided the time was ripe to split the browser into many processes so that a critical failure in one would not trouble the others. To begin with, because our code is venerable, we decided that we would try two processes. One of these twins would be in charge of the browser and the other would be in charge of the web contents.

This project was called Electrolysis after the mechanism by which one might split water into Hydrogen and Oxygen using electricity.

Suddenly the browser became responsive, even in the face of the worst JavaScript written by the least experienced dev at the most privileged startup in Silicon Valley. And out-of-memory errors decreased in frequency because the browser’s memory and the web contents’ memory were able to grow without interfering with each other.

Remember, our code is venerable. Remember, our code hearkens from its single-process past.

Our data-collection code was written in that single-process past. But now we had two processes with input events that needed to be timed to find problems. We had two processes with memory allocations that needed to be examined for regressions.

So the data collection code was made aware that there could be two types of process: parent and child.

Alas, not just one child. There could be many child processes in a row if some webpage were naughty and brought down the child in its anger. So the data collection code was made aware there could be many batches of data from child processes, and one batch of data from parent processes.

The parent data was left looking like single-process data, out in the root of the data collection payload. Child processes’ data were placed in an array of childPayloads where each payload echoed the structure of the parent.

Then, not content with “good”, I had to come along in bug 1218576, a bug whose number I still have locked in my memory, for good or ill.

Firefox needs to have multiple child processes of different types, simultaneously. And many instances of some of those types, also simultaneously. What was going to be a quick way to ensure that childPayloads was always of length 1 turned into a months-long exercise to put data exactly where we wanted it to be.

And so now we have childPayloads where the “weird” content child data that resists aggregation remains, and we also have payload.processes.<process type>.* where the cool and hip data lives: histograms, scalars, and keyed variants of both.
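In ping terms, the two homes look something like the sketch below. It is abridged and the probe names are placeholders, not a real schema; the shape (root-level parent data, the childPayloads array, and the per-process-type processes section) is the part that matters.

```python
# Hedged sketch of a "main" ping payload's two homes for child data.
# Probe names and values are placeholders, not a complete schema.
payload = {
    # parent data still sits at the root, as in single-process days
    "histograms": {"EXAMPLE_PARENT_PROBE": {}},
    # the old way: one batch per content child, echoing the parent's shape
    "childPayloads": [
        {"histograms": {"EXAMPLE_CHILD_PROBE": {}}},
    ],
    # the new way: aggregated per process type
    "processes": {
        "content": {"histograms": {}, "scalars": {}, "keyedHistograms": {}},
        "gpu": {"histograms": {}, "scalars": {}, "keyedHistograms": {}},
    },
}

# about:telemetry has to ask both "which process?" and "which home?"
gpu_scalars = payload["processes"]["gpu"]["scalars"]
```

This is why a question as innocent as “what is a parent payload and child payload?” needs a history lesson to answer.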

Already this approach is showing dividends as some proportions of Nightly users are getting a gpu process, and others are getting some number of content processes. The data files neatly into place with minimal intervention required.

But it means about:telemetry needs to know whether you want the parent’s “weird” data or the child’s. And which child was that, again?

And about:telemetry also needs to know whether you want the parent’s “cool” data, or the content child’s, or the gpu child’s.

So this means that within about:telemetry there are now five places where you can select what process you want. One for “weird” data, and one for each of the four kinds of “cool” data.

Sadly, that brings my storytelling to a close, having reached the present day. Hopefully after this Summer’s Code is done, this will have a happier, more-maintainable, and responsively-designed ending.

But until then, remember that “accident of history” is the answer to most questions. As such it behooves you to learn history well.

:chutten

All Aboard the Release Train!

I’m working on a dashboard (a word which here means a website that has plots on it) to show Firefox crashes per channel over time. The idea is to create a resource for Release Management to figure out if there’s something wrong with a given version of Firefox with enough lead time to then do something about it.

It’s entering its final form (awaiting further feedback from relman and others to polish it) and while poking around (let’s call it “testing”) I noticed a pattern:

[Figure: Aurora crash rates over time, spiking at each merge day.]

Each one of those spikes happens on a “merge day” when the Aurora channel (the channel powering Firefox Developer Edition) updates to the latest code from the Nightly channel (the channel powering Firefox Nightly). From then on, only stabilizing changes are merged so that when the next merge day comes, we have a nice, stable build to promote from Aurora to the Beta channel (the channel powering Firefox Beta). Beta undergoes further stabilization on a slower schedule (a new build every week or two instead of daily) to become a candidate for the Release channel (which powers Firefox).

For what it’s worth, this is called the Train Model. It allows us to ship stable and secure code with the latest features every six-to-eight weeks to hundreds of millions of users. It’s pretty neat.

And what that picture up there shows is that it works. The improvement is especially noticeable on the Aurora branch where we go from crashing 9-10 times for every thousand hours of use to 3-4 times for every thousand hours.
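The rate on that plot is a simple normalization: crashes divided by thousands of usage hours, so channels with very different population sizes can be compared. A sketch with illustrative numbers (not actual Aurora figures):

```python
# Hedged sketch: crashes per thousand usage hours. The counts below are
# illustrative, not real channel data.
def crash_rate_per_khour(crashes, usage_hours):
    return crashes / (usage_hours / 1000.0)

# just after a merge day vs. after six weeks of stabilization:
print(crash_rate_per_khour(9500, 1_000_000))  # 9.5
print(crash_rate_per_khour(3500, 1_000_000))  # 3.5
```

Normalizing by usage hours rather than by user count matters: a channel whose users run the browser all day would otherwise look artificially crashy next to one whose users pop in briefly.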

Now, the number and diversity of users on the Aurora branch is limited, so when the code rides the trains to Beta, the crash rate goes up. Suddenly code that seemed stable across the Aurora userbase is exposed to previously-unseen usage patterns and configurations of hardware and software and addons. This is one of the reasons why our pre-release users are so precious to us: they provide us with the early warnings we need to stabilize our code for the wild diversity of our users as well as the wild diversity of the Web.

If you’d like to poke around at the dashboard yourself, it’s over here. Eventually it’ll get merged into telemetry.mozilla.org when it’s been reviewed and formalized.

If you have any brilliant ideas of how to improve it, or find any mistakes or bugs, please comment on the tracking bug where the discussion is currently ongoing.

If you’d like to help in a different way, consider installing Firefox Beta. It should look and act almost exactly like your current Firefox, but with some pre-release goodies that you get to use before anyone else (like how Firefox Beta 50 lets you use Reader Mode when you want to print a web page to skip all the unnecessary sidebars and ads). By doing this you are helping us ship the best Firefox we can.

:chutten

Firefox Windows XP Exit Plan


Last I reported, the future of Firefox’s Windows XP support was uncertain, even given long-standing plans for its removal.

With the filing of bug 1305453 and the commensurate discussion on firefox-dev, things are now much more certain. Firefox will (pending approval) be ending support for Windows XP and Windows Vista in Firefox 53 (scheduled release date: April 18, 2017).

Well, thanks for tuning in. I guess I can wrap up these posts and…

Okay, yes, you’re right. It isn’t that simple.

First, the day that Windows XP and Windows Vista users will cease getting Firefox updates is actually much later than April of 2017. Instead, those users will continue to receive security updates until April of 2018 because the version of Firefox 52 they’ll be getting is an Extended Support Release.

What is Firefox Extended Support Release (ESR)? It’s a version of Firefox for enterprises and other risk-averse users that receives security (and only security) updates for one year after initial release. This allows these change-weary users to still choose Firefox without having to consider how to support a major version release every six-to-eight weeks.

Windows XP and Vista users will be shunted from the normal roughly-six-weeks-per-version “Release” channel to the “ESR” channel for 52. New installs on Windows XP and Vista at that time will also be for ESR 52. This should ensure that our decreasing Windows XP+Vista userbase will be supported until they’ve finished diminishing into…

…well, okay that’s not simple either. In absolute terms, our Windows XP userbase has actually increased over the past six months or so. Some if not all of this is the end of the well-documented slump we see in user population over the Northern-hemisphere Summer (we’re now coming back up to Fall-Winter-Spring numbers). It is also possible that we’ve seen some ex-Chrome users fleeing Google’s drop of support from earlier this year.

Deseasonalized numbers for just WinXP users are hard to come by, so this is fairly speculative. One thing that’s for certain is that the diminishing Windows XP userbase trend I had previously observed (and was counting on seeing continue) is no longer in evidence.

So what happens if we reach April of 2018 and we still have millions and millions of Windows XP users still relying on Firefox to provide them with a safe way to navigate the increasingly-hostile environment of the Web?

No idea. I guess this means I’ll be continuing to blog about WinXP for a couple years yet.

:chutten

Mozilla, Firefox, and Windows XP Support

[Image: Windows XP Start button. Used with permission from Microsoft.]

Last time I focused on what the Firefox Windows XP user population appeared to be. This time, I’m going to look into what such a large population means to Firefox and Mozilla.

Windows XP users of Firefox are geographically and linguistically diverse, and make up more than one tenth of the entire Firefox user population. Which is great, right? A large, diverse population of users… other open source projects only wish they had the luck.

Except Windows XP is past its end-of-life. Nearly two years past. This means it hasn’t been updated, maintained, serviced, or secured in longer than it takes Mars to make it around the Sun.

The Internet can be a scary place. There are people who want to steal your banking passwords, post your private pictures online, or otherwise crack open your computer and take and do what they want with what’s inside of it.

Generally, this is an arms race. Every time someone discovers a software vulnerability, software vendors rush to fix it before the Bad Guys can use it to exploit people’s computers.

The reason we feel safe enough to continue our modern life using computers for our banking, shopping, and communicating is because software vendors are typically better at this than the Bad Guys.

But what if you’re using Windows XP? Microsoft, the only software vendor who is permitted to fix vulnerabilities in Windows XP, has stopped fixing them.

This means each Windows XP vulnerability that is found remains exploitable. Forever.

And those are just the vulnerabilities that we know about.

And Windows XP isn’t just bad for Windows XP users.

There are a variety of crimes that can be committed only using large networks of “robot” machines (called “botnets”) under the control of a single Bad Guy. Machines can be recruited into botnets against their users’ will through security vulnerabilities in the software they are running. Windows XP’s popularity and lengthening list of known vulnerabilities might make it an excellent source of recruits.

With enough members, a botnet can then send spam emails in sufficient volume to overload mail servers, attack financial institutions, steal information from governmental agencies, and otherwise make the Internet a less nice place to be.

So Firefox has a large, diverse population of users whose very presence connected to the Internet is damaging the Web for us all.

And so does Google! At least for now. Google has announced that it will end Windows XP support for its Chrome browser in April 2016. (It previously announced end-of-life dates for April 2015, and then December 2015.)

So, as of April, Windows XP users will have only one choice for updated, serviced, maintained, and secured web browsing: Firefox.

Which puts Mozilla in a bit of a bind. The Internet is a global, public resource that Mozilla is committed to defend and improve.

Does improving the Internet mean dropping support for Windows XP so that users have no choice but to upgrade to be able to browse safely?

Or does improving the Internet mean continuing to support Windows XP so that those users can at least still have a safe browser to access the Web?

Windows XP users might not have a choice in what Operating System their computers run. They might only be using it because they don’t know of an alternative or because they can’t afford to, aren’t allowed to, or are afraid to change.

Firefox is their best hope for security on the Web. And, after April, their only hope.

As of this writing, Firefox supports versions of Windows from XP SP2 on upwards. And this is likely to continue: the latest public discussion about Windows XP support was from last December, reacting to the latest of Google’s Windows XP support blog posts.

I can reiterate confidently: Firefox will continue to support Windows XP.

For now.

Mozilla will have to find a way to reconcile this with its mission. And Firefox will have to help.

Maybe Mozillians from around the world can seek out Windows XP users and put them in contact with local operations that can donate hardware or software or even just their time to help these users connect to the Internet safely and securely.

Maybe Firefox will start testing nastygrams to pop up at our Windows XP user base when they start their next browsing session: “Did you know that your operating system is older than many Mozillians?”, “It appears as though you are accessing the Internet using a close relative of the abacus”, “We have determined that this is the email address of your closest Linux User Group who can help you secure your computer”

…and you. Yeah, you. Do you know someone who might be running Windows XP? Maybe it’s your brother, or your Mother, or your Babbi. If you see a computer with a button that says “Start” at the bottom-left corner of the screen: you can fix that. There are resources in your community that can help.

Talk to your librarian, talk to the High School computer teacher, talk to a Mozillian! We’re friendly!

Together, we can save the Web.

:chutten