Launch day here was quite a day, and I thought I would describe my view of it.
We knew it would be a long day, since the launch was scheduled for 1 a.m. Monday night/Tuesday morning. On Monday most of us started rolling into the office about 10 a.m. or so. A number of us had been online from home before, especially those of us in contact with the localization teams in Europe and Asia. Meanwhile, these teams were trying to stay awake long enough to get last-minute messages through to the release coordinators in California so that the localized builds could be certified for inclusion in the 1.0 release.
The day in the office started with a flurry of activity around localized builds. The Firefox 1.0 release was the first time we’ve included localized builds in our main CVS tree and included them in our release process. Adding 50 or 60 builds (20 languages, 3 platforms each) to a release is a big deal and Chase spun many sets of localized builds. The last planned spin of all localized builds was at 10 a.m. Monday morning. As the builds arrived, the localization teams around the world began to check in to verify them again, and our Quality Assurance groups went into high gear.
In the meantime, the engineers were always looking for last minute issues. We were confident of the release, but it’s in our nature to keep looking for things to improve until the last minute. Brendan isolated himself in the back of the office and went through crash data, trying to eke out one last problem until finally he gave up. Darin and bryner and dveditz continued their quest through bugs and mail to look for other improvements.
Chris Beard and I huddled on a set of questions, ranging from press inquiries to official and unofficial builds to localization questions to distribution programs. I spent a lot of time working with organizations interested in the Firefox launch and finishing the blog post about the Firefox end user license agreement. Chris Hofmann monitored all activity, constantly encouraging people. Dbaron tracked the set of things that needed to be done to make the launch happen. These included:
- Determining the list of localized builds that were ready to be shipped with the English version
- Getting all builds pushed to the ftp site at the right time
- Verifying that the download infrastructure was ready (more on this later)
- Verifying that the 1.0 launch page for the website was ready to go
- Verifying that the infrastructure for the website was ready to go
- Verifying that the infrastructure for the start page was ready to go
- DNS redirect.
Various people spent some time looking at the infrastructure for “update.mozilla.org.” This infrastructure was new and we knew that it would have trouble scaling, so we wanted to get the most out of the existing infrastructure. Others spent time keeping in touch with the Spreadfirefox.com community.
We knew that Mozilla Europe and Mozilla Japan were ready to go. Each had done or coordinated massive amounts of work to have appropriate materials translated into localized websites in multiple languages as part of the launch. Bart Decrem had spent time in Europe with the Mozilla Europe folks and Chris Hofmann had gone to Japan to work with Mozilla Japan in coordinating the international launch. We made sure that Mozilla Europe and Mozilla Japan could be reached from the main page of www.mozilla.org, and tried to ensure that appropriate localized versions would be offered to people visiting those sites.
During the day people would find interesting tidbits and tell them to the group. Ben and Asa found a few folks who had created countdown clocks. We read aloud various blog posts about waiting for Firefox to encourage people finishing up the last bits of work.
All this formed the background activity. The main focus for much of the day was the QA and build team creating and checking out the localized builds. For each build, a whole set of things needed to be worked out. These included:
- Does the build open?
- Does the installer work?
- Are the UI items localized?
- Are the right fonts used?
- Are appropriate links localized, allowing users to get to content in their own language?
- Are the search plugins localized, allowing users to get relevant search contents?
- Is the start page translated?
- Is the start page localized (including links, search capability, etc.)?
Our amazing QA team here in the office (Marcia, Sarah, Tracy, Jay and Asa) went to work and worked tirelessly. So did Chase, our build and release maven, who created the build infrastructure for the localizations before we’d even given him time to acclimatize to life at the Mozilla Foundation. We made a giant list of all the localizations that might be ready, a list of the localizations for which localized search engines were available, a list of the localization teams who had certified their builds, and a list of any problems the QA team had found.
Periodically one of the QA folks would ask about some potential problem: was it a blocker or not? We have one big open space, so a bunch of us would look up and figure it out. This was particularly helpful in a few cases. For example, about 9 p.m. Asa groaned and announced that the Japanese version seemed to have a serious font problem and he didn’t think we could ship it. Immediate conversation ensued, since the Japanese team had just certified the build as ready. Chase started wondering about the fonts on Asa’s machine, jumped up and went to explore. Sure enough, Chase was right about fonts, the Japanese localization team was right about their build, and the Japanese version was cleared to ship out with the 1.0 release. One spike of tension receded.
As the list of localizations was finalized, other bits began reaching the final stage as well. Ben became more focused about the website push. Dbaron became more focused about the various pieces fitting together. Ben and I finished our blog posts. We also started tracking the start page infrastructure to make sure it was online. Chase began the process of getting the final builds pushed to the ftp site. Pav decided that we needed to have BitTorrent available for the launch, so he settled down in a corner and went to work to make this happen. He took some kidding from those who felt that this wouldn’t be needed. Pav wasn’t having any of this, put his head down and dug in. Anyone who knows Pav knows that there’s no point in trying to stop Pav when he gets into this mode, so we left Pav to get BitTorrent going. We noted that the load at www.mozilla.org was going up. We assumed this was caused by people polling to see if the release was available yet, since we had read blog posts about people doing exactly this. It gave us another clue that yes, people were excited about seeing Firefox 1.0 appear.
About 11:00 p.m. Pacific Time Steve Garrity woke up to help with the final push. Steve is our lead contact with SilverOrange, the web design firm that has done the visual identity work for Firefox, Thunderbird and our website. Steve lives on the far East Coast of Canada, so our 1 a.m. launch time is 4 a.m. for Steve. Steve reports that he managed to wake up, but never got out of bed, and did the final push for the 1.0 launch website lying in bed with his computer in his lap. We were a bit nervous about waking Steve up but this turned out to be unnecessary: Steve came through on his own, as he always has. Even so, it was a mighty relief when messages from Steve began appearing! There were 4 or 5 things that needed to happen with the website for the 1.0 launch (changing the content at www.mozilla.org to our 1.0 release notice, pointing to localizations and some other info, and so on), and they went off without a hitch.
About 11:40 p.m. Asa looked up and said in a concerned voice: “I timeout when trying to reach www.mozilla.org. Can anyone else get to it?” The answer was no. The mozilla.org website was down. The office grew suddenly tense. Asa hadn’t spoken loudly, but everyone knew anyway. This is often the case: in a network-centric environment with lots of people in one room, you can almost tell when the network is down. Often when I find I don’t have a network connection I simply look up. If I see other people looking around then I know: network issue. I usually ask to be sure, but it’s almost redundant. At that moment Vlad reappeared. He came in the door grinning, noticed the odd silence and said “I thought you guys would be celebrating.” “Can’t get to www.mozilla.org.” Vlad’s grin disappeared, he sat down, pulled out his computer and buried his head in it.
Dbaron and bryner went to work, as they always do when a problem like this arises. There’s nothing like this in their job descriptions, but that hasn’t mattered. In a minute or two there was a group clustered around dbaron debugging, and a steady stream of information was coming from dbaron. After a bit he says something about “two heaps” and “immense traffic.” Then he says “It looks like it’s coming from Myk!” Soon we’ve established contact with Myk. Yes, all the activity is coming from him. He’s bringing a second server on line to support the projected load at www.mozilla.org.
Myk has been our toolsmith for many years. We got to know him as a result of his application Forumzilla, and we jumped at the chance to hire him years ago. Since the Mozilla Foundation was launched Myk has taken on the Herculean task of coordinating our systems administration and infrastructure. We’ve had great help from a group of tremendous volunteers, but Myk has been the central point. For someone who didn’t ask for the job, he’s done amazing things.
“Right now?” we ask. “Doesn’t it seem a bit earlier would have been better?” Well yes, of course. But the machine took longer to arrive than expected, and now is when it was ready. Vlad looked up and said “Well, he’s got 8 minutes to get it done. That should be plenty.” And so it was. Being Myk, the job was done in 5 minutes, the work was perfect, there were no hitches, and within a few minutes www.mozilla.org was back and ready to go.
Someone asked “are we ready?” Chase answered that the release bits were on the site, both in English and the many localizations. We went through the list of all things people were thinking about, orchestrated by dbaron for the network aspects and Ben for the Firefox-specific elements. Chofmann’s nearly jumping up and down by now, “Yes, we’re ready, we’re ready, we’ve been ready. Push the bits!”
We decide we’re ready. The last thing to do is to get the revised home page for www.mozilla.org actually “pushed” to the website, publish the blog and related posts, see the mozillazine news article posted and watch. Pushing new content to www.mozilla.org takes a few minutes. That’s because content is stored in our CVS repository and so at least part of the website source tree needs to be rebuilt to implement the new content. Normally this happens automatically every 30 or 40 minutes. We don’t want to wait that long, so we start a manual rebuild to get the content pushed sooner. We’re all used to this wait for new content to appear but this time it’s a very focused wait. No one is doing much of anything, we’re all sort of standing around.
And then, voila. “It’s done.” We all race to our laptops and go check out www.mozilla.org. We’ve all seen the content before, we’ve been looking at it for days, checking for problems, tweaking it. But this is the first time we’ve seen the content live on our public website, and we all have to look to make sure it’s really there — Firefox 1.0 is available, the Mozilla community has delivered something exciting, and we’re proud of it.
What did people do next? Did we all jump up and down, run around and have a giant party? No. We watched the network. Yup, that’s what we did. Chofmann managed to get a group to stop long enough to come back and at least acknowledge the glasses of champagne, and even to take one and wander around with it. But always back to watch the network. Is www.mozilla.org looking OK? Is the http download traffic looking OK? Is the ftp download traffic looking OK?
The next day we arrived at work a bit later. We were beginning to get an idea that Firefox 1.0 was getting the reception we had hoped for. (I don’t think we yet had any idea of the reception that Firefox 1.0 has actually achieved, which has been phenomenal.) In a sense it was a bit anticlimactic because I couldn’t touch or feel the response. Our download traffic is handled by a set of mirror sites coordinated by the marvelous folks at Oregon State University’s Open Source Lab (http://osuosl.org/). Other significant university and research participants in our mirror program are Georgia Institute of Technology, Indiana University, the University of Utah, the Friedrich-Alexander University Erlangen-Nuremberg in Germany, and the Spanish National Research Network, along with the Internet Systems Consortium. There are also a few commercial entities that assist, and we are grateful to all of them. We get logs and such from the mirrors through OSU, but of course that’s a step removed from managing this ourselves and having immediate access to the data.
After a while the anticlimactic feeling faded as we began to get information about the number of downloads and the general reception of Firefox. As best we can figure it, around 1,000,000 people came to download Firefox on the first day alone. That’s an astonishing number, far beyond what we had seen before. As Chris Hofmann put it, the building at the Mozilla Foundation might seem quiet, but the wires were burning up at Oregon State! And indeed the traffic did burn out some machines. By midday it was clear that some of our mirrors were buckling under the load. For example, we had routed a good chunk of traffic to three jumbo-sized load-balancers each fronting multiple machines in a big datacenter, and two of them burned out. (Note to alleviate speculation here: I am not talking about Google.)
Myk and chofmann had given a good deal of thought to this possibility and had two backup plans. The first potential backup was a load-balancing tool that distributed load across a larger set of servers (more about this below), and the second plan was a commercial vendor specializing in high-availability hosting.
Myk suggested we use a load balancing tool written by Mike Morgan of Oregon State. The idea was that in addition to distributing load to the small group of primary mirrors — the powerful ones that get our releases first, host everything we serve, and send us back logs so we can generate download numbers — we also distribute some load to the much larger group of secondary mirrors. The secondary mirrors may be less powerful, take more time to get our releases, not send us logs, and not host everything we serve (many of them host only the latest releases), but they still represent a significant amount of download capacity that could come in handy during periods of high demand.
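For readers who want to picture the idea, here is a minimal sketch of one way such load distribution could work. It is not Mike Morgan's tool, whose internals are not described here. It assumes weighted random selection, and the `Mirror` fields, the weights, the `pick_mirror` function and the example.org URLs are all hypothetical names made up for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Mirror:
    """A download mirror. Every field here is an illustrative assumption."""
    url: str
    tier: str            # "primary" or "secondary"
    weight: int          # relative share of traffic this mirror should receive
    hosts_release: bool  # secondary mirrors may lag or carry only some files


def pick_mirror(mirrors, rng=random):
    """Pick one mirror that has the release, with probability proportional to weight."""
    candidates = [m for m in mirrors if m.hosts_release and m.weight > 0]
    if not candidates:
        raise LookupError("no mirror currently hosts the requested release")
    weights = [m.weight for m in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical mirror list: primaries carry most of the load,
# secondaries absorb a smaller share once they have the files.
MIRRORS = [
    Mirror("http://primary-1.example.org/", "primary", 10, True),
    Mirror("http://primary-2.example.org/", "primary", 10, True),
    Mirror("http://secondary-1.example.org/", "secondary", 2, True),
    Mirror("http://secondary-2.example.org/", "secondary", 2, False),  # not synced yet
]

if __name__ == "__main__":
    print(pick_mirror(MIRRORS).url)
```

In this sketch a secondary mirror that has not yet received the release is simply skipped, which reflects the point above that secondaries may take more time to get our releases.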
Myk had evaluated the tool and believed it would improve our delivery capability dramatically. So early afternoon on Tuesday the 9th we implemented it. Sure enough, Myk was right, the tool performed as hoped and our ability to deliver Firefox to interested people improved significantly. Another bout of tension was reduced. Many, many thanks to Mike Morgan for writing the tool, and to Scott Kveton and the mighty team at OSU who helped get much needed equipment up and running in short order.
We didn’t reach perfection, of course; the demand was too great. And as we had suspected, the infrastructure for update.mozilla.org was our weak point, and we had to curtail its operations during the peak activity.
Meanwhile, Bart and Pav were preparing for the AIR MOZILLA web event, a 5-hour live webcast and text chat. The show had interviews and discussions with Mozilla staff, with questions and music. I was so engrossed in the day’s activities I didn’t quite understand why there were suddenly speakers next to my desk, but then the event began and it became clear. We wanted some way to connect with people involved with Firefox and we couldn’t do that by all getting together, so Bart came up with the webcast idea and Pav played host. Bart has spent some time looking at technologies that help with community development, such as the drupal / civicspace technology he connected with the Spread Firefox project, and AIR MOZILLA was another example. Bart orchestrated, and Pav interviewed a lot of long time mozilla.org participants for a webcast. We simultaneously hosted a 2-channel IRC session where people could pose questions. Hundreds of people joined the IRC sessions where Asa and Marcia played IRC hosts, gathering questions and passing them on to Bart and Pav and then to various participants. The event was a nice, low-key marker of the tremendous international interest and participation in the project.
By the time the AIR MOZILLA event came to an end serious fatigue had set in, and many of us went home to get a good, long rest. Or at least the start of one. I suspect that many of us didn’t get enough of a good long rest until the holidays at the end of the year.