Mozilla

Mozilla Websites, Web Analytics and Privacy

April 9th, 2008

This document discusses the application of web analytics tools to Mozilla websites.

We live in a world of data; we should be thinking carefully about that data and its impact. Many people don’t realize how much information about them is collected by websites and used as a business asset. Some of those who do understand don’t care, or figure there’s no sense talking about it. But a core of the Mozilla community is intensely focused on privacy and the individual person’s ability to understand and control personal information. This has always been the case, and it is part of our strength. These aspects should continue to inform the development of both our software and our websites. With this in mind, I’ve put together a discussion of a particular data-gathering proposal, together with the safeguards that make me comfortable with it.

We would like to understand how people interact with Mozilla’s websites, in particular the consumer-facing websites such as www.mozilla.com, mozilla-europe.org and mozilla-japan.org. To do this we want to implement tools that measure what people do when they visit these sites. These tools are generally known as “web analytics” tools. In particular, we want to implement a product called SiteCatalyst from a company called Omniture for a range of Mozilla websites. The specific sites, the phased rollout plan and the evaluation details are below. Using this service means that data about Mozilla visitors will be processed by Omniture, and will be stored on servers that are not under the direct, physical control of Mozilla. This is new to us and requires consideration of appropriate safeguards. Some wonder if it should even be done. I believe the proposal below is worth trying, and that our arrangement with Omniture includes appropriate safeguards.

Commitments

  • Mozilla will use the web analytics data only to determine aggregate usage patterns for our website. We will not seek to determine personal information from this data.
  • Omniture will use the data from Mozilla websites only to provide and maintain the service for Mozilla; it will not share the information with others or use the information for other purposes. Omniture will not “correlate and report on any Customer Data with any other data collected through other products, services or web properties.”
  • The domain names in Mozilla cookies will clearly identify their affiliation with Mozilla and the Omniture service.
  • We will have public discussions of the results. Before the end of 2008 we will have a public discussion about the benefits (or lack thereof) of using this system.
  • There will be a clear public statement about which web analytic services, if any, are in use with our websites.
  • There will be a public notice and discussion period before including other types of websites, such as developer.mozilla.org and spreadfirefox.com.

Description

One aspect of the Mozilla project that is bigger than many people realize is our website presence. There are actually a number of Mozilla sites. (Or, in industry terms, “website properties.”) There are the development and community-focused sites like developer.mozilla.org and spreadfirefox.com. And then there are the websites that consumers visit, in particular the download, support and services sites at mozilla.com, mozilla-europe.org and related domains. The latter are significant web presences, causing Mozilla to periodically appear in the list of top 50 most visited websites published by comScore (an Internet measurement firm analogous to Nielsen in the TV space).

1. Our websites act as integral components of our users’ experience. They are also a primary way of communicating with most of our users who aren’t likely to read Planet Mozilla, the newsgroups or other community tools. Today we know very little about how people interact with our websites, in particular the consumer-facing websites. To improve the experience we first need to know some basic data about how users interact with our website properties. We’d like to understand things such as:

  • Is something we think should be easy — like getting from a top-level page to useful add-ons — simple enough for people who aren’t familiar with Mozilla?
  • If we add a landing page with explanations, do people get lost at those pages? Or do these pages help people as we had hoped?
  • How many users successfully find, download, install and become long-term Firefox users?
  • What paths do people take through the website?
  • Is something new (like the dropdown content on the “whatsnew” page) useful to people? How many people see that page and actually click on the links?
  • Do people find the language version of Firefox that fits their location?

2. Each of these websites is large and complex, and each gets an enormous number of visits from general consumers — that is, from people who are not familiar with Mozilla, may not be power users, and whom we can’t claim to understand from our own experiences. Those of us who work on the Mozilla project have — by definition — some familiarity with Mozilla. That is not the case for most of our current 150 or so million users. What feels “easy to use” or comfortable to us could be completely wrong for many people who visit these websites. Furthermore, what might make sense in one language or locale might not be helpful in other languages or cultural contexts.

3. How do we develop a better understanding of how people interact with a website? The basic answer is to gather aggregate data about how people use the website. The term generally used to describe this is “web analytics.” Aggregate data will help us answer the types of questions listed above.
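As a rough illustration of what “aggregate data” means here, the Python sketch below reduces a list of per-visit page paths to totals that carry no visitor identifier. The sample visits and the function names `aggregate_paths` and `completion_rate` are made up for this example; they are assumptions, not a description of how SiteCatalyst builds its reports.

```python
from collections import Counter

def aggregate_paths(visits):
    """Count how often each page-to-page step occurs across all visits.

    `visits` is a list of page sequences, one per visit. The result holds
    only totals, with no record of which visit produced which step.
    """
    steps = Counter()
    for pages in visits:
        for src, dst in zip(pages, pages[1:]):
            steps[(src, dst)] += 1
    return steps

def completion_rate(visits, start, goal):
    """Fraction of visits that include `start` and later reach `goal`."""
    started = [p for p in visits if start in p]
    if not started:
        return 0.0
    reached = sum(1 for p in started if goal in p[p.index(start) + 1:])
    return reached / len(started)

# Hypothetical sample data, invented for this example.
sample_visits = [
    ["/", "/addons/", "/addons/popular/"],
    ["/", "/firefox/", "/firefox/download/"],
    ["/", "/addons/"],
]

steps = aggregate_paths(sample_visits)
rate = completion_rate(sample_visits, "/", "/addons/popular/")
```

Questions like “what paths do people take?” or “how many people get from a top-level page to useful add-ons?” can be answered from totals of this kind, without looking at any individual visitor.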

4. What techniques are used to instrument a website so that it aggregates data about usage patterns? Two elements are used together to gather data: “cookies” and “web beacons.” A cookie is a string of information that a website stores on a visitor’s computer, and that the visitor’s browser provides to the website each time the visitor returns. Because the browser provides this cookie information to the website at each visit, cookies serve as a sort of label that allows a website to “recognize” a browser when it returns to the site. A “web beacon” is a marker placed in a webpage that makes it easier to follow and record the activities of a recognized browser, such as the path of pages visited at a website.
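To make the mechanism concrete, here is a minimal Python sketch of how a cookie and a web beacon could work together. It is a simulation under stated assumptions, not Omniture’s or Mozilla’s implementation: the cookie name `visitor_id`, the function `handle_beacon` and the in-memory `hits` log are all hypothetical names chosen for illustration.

```python
import uuid
from http.cookies import SimpleCookie

# Hypothetical in-memory log of beacon hits, stored as (visitor_id, page) pairs.
hits = []

def handle_beacon(cookie_header, page):
    """Simulate a server receiving a web beacon request fired from `page`.

    `cookie_header` is the raw Cookie header sent by the browser (an empty
    string on a first visit). Returns the Set-Cookie value to send back, or
    None if the browser was already recognized.
    """
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    if "visitor_id" in cookie:
        # Returning browser: the cookie acts as the label that recognizes it.
        visitor_id = cookie["visitor_id"].value
        set_cookie = None
    else:
        # First visit: issue a random label that carries no personal details.
        visitor_id = uuid.uuid4().hex
        out = SimpleCookie()
        out["visitor_id"] = visitor_id
        set_cookie = out["visitor_id"].OutputString()
    hits.append((visitor_id, page))
    return set_cookie

# First visit: no cookie yet, so one is issued.
issued = handle_beacon("", "/firefox/")
# Later page view: the browser sends the cookie back and is recognized.
handle_beacon(issued, "/firefox/download/")
```

After the two calls, `hits` holds two entries with the same random identifier, which is what lets the path from one page to the next be recorded.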

5. Are there negative things that could happen with this data? As with many kinds of data, yes. It is possible to correlate web analytics data with other data and potentially figure out personal information. Mozilla does not do this and Omniture is not allowed to correlate Mozilla data with any other data to derive personal information.

6. What precisely is Mozilla proposing to do? Use a web analytics product from Omniture called SiteCatalyst to measure interaction with a number of our other consumer-facing websites. The proposed rollout of the web analytics is in phases:

  • Phase 1: www.mozilla.com, firefox.com, getfirefox.com, *.mozilla.com. Rollout is pending discussion and feedback on this document. I believe the concerns raised in the newsgroup discussion are addressed, so there may be very little discussion to be had. In that case, the implementation will occur shortly. We would also amend our Privacy Policy as appropriate to describe the storage and processing of this data by a third party.
  • Phase 2: www.mozilla-europe.org, possibly mozilla-japan.org, pending discussion and feedback on this document.
  • Phase 3: Discussion and review period of usefulness of data at the end of 2008.
  • Phase 4: (Pending outcome of Phase 3): add other Mozilla websites such as: addons.mozilla.org, developer.mozilla.org, www.mozilla.org, spreadfirefox.com, planet.mozilla.org; or consider use of a different or additional web analytics program.

7. Isn’t there an open-source or free software version that will do the job? Not that we know of.

8. Why don’t we build our own? This is a significant project in which we have no expertise. We need a solution that works at scale, in a complex, distributed setting, and is available now. That’s a serious project to take on, and one that would certainly take a lot of time and focus. We’d need to build a new community of people that embodies Mozilla DNA and values AND build a world-class piece of software. We’re not experts in analytics or in defining requirements, so we would have to wait until a fair amount of development was done before we could even begin to evaluate how helpful the project was. Those of you who have been around Mozilla since the early days will undoubtedly remember the enormous pain of trying to build the application (in those days the Mozilla Application Suite) before we had a solid infrastructure (the Gecko implementation). The idea of building an analytics package while trying to use it at the same time on websites as complex as those in question is a recipe for disaster.

9. Why Omniture? Omniture has many positive points. The use of the data is limited to providing the web analytics service to Mozilla. The product SiteCatalyst is a widely used solution for large websites; it’s known to scale, be stable, and provide reliable, trustworthy results. Access to the data is highly secured and Omniture provides support resources. In addition, there is a user interface for allowing individuals to opt out of the web analytics processing. There are some drawbacks of course; there usually are. Omniture is not open source code, which we always prefer. Our arrangement with them is contractual. That’s helpful in that it allows us to include the privacy safeguards in the contract. But as is almost always the case, the complete contract is confidential. Omniture has been criticized for its business practice of using cookies that don’t clearly say they are from Omniture. It turns out Omniture allows its customers to specify whether they want a cookie with the Omniture name in it. Mozilla cookies will do so. And finally, Omniture is not free. Use of Omniture requires payment, unlike other options, and the cost generally rises with the usage of the sites. So it could get expensive and we’ll have to monitor this.

10. How will we evaluate if the data is worth the effort to get it? We’ll look at the results. We have a set of people who are adept at looking at data: Ken, Polvi and Daniel, who just joined us. Ken and Polvi have been publishing what we’ve learned from the data we do have, and we’ll see what can be learned from the additional data. We’ve already moved the data (known as “metrics”) discussions into the public via the Metrics Blog. We will continue to do this.

11. Will Omniture be used with all Mozilla websites? We don’t know yet. As noted above, we’ll do a review of the consumer-facing sites and see how valuable the data is and how we feel about gathering it. We may also look at alternative providers as part of this discussion. Then we can decide about other sites as well, such as our developer- and community-facing websites.

12. Privacy Policy. Our current privacy policy says that Mozilla data won’t go to an outside third party. So it will need amendment to allow for this case. Details on the proposed changes will follow, but for now I’d like to talk through the goals and proposed techniques.

13. Sensitivity to data, privacy and user control. Most websites (and the organizations running them) are unabashed about collecting data, and using that data to improve their business. The use of web analytics is a standard practice, taken for granted by many website operators. This proposal is an extremely mild version. Some people have suggested to me that this discussion is “much ado about nothing” and reflects an extreme focus on privacy by a portion of the Mozilla community. I agree that this is a mild proposal, collecting the most basic of data. But I don’t believe this discussion, or the basic concern, is irrelevant or extreme. As noted above, we live in a world of data; we should be thinking carefully about that data and its impact.

***

Comments welcome here. If you’re interested in the full discussion, head over to the mozilla.org Governance newsgroup. You can also read a set of past comments and participate through the mozilla.governance Google Group.
