Mozilla

Posts Tagged with “metrics”

Basic Examples of Usage Data

September 28th, 2008

In past posts I’ve said that I believe there is a need to make basic, aggregate, anonymized information about Internet usage more widely available. If everything that is known about the basic usage of the Internet is closed and proprietary then the Internet as an open platform will suffer. Here I’ll try to describe the kinds of data I’m talking about. For now I’ll call it “usage data” though that’s just a term of convenience.

There is a set of usage data that we’re quite accustomed to seeing in aggregated, anonymized form. Unconsciously I think many of us have come to realize that without public availability of this data we cannot understand even the basics of how the Internet is working.

One familiar example is the amount of bandwidth a site serves. Bandwidth data is critical to planning capacity and making sure the website doesn’t “go down” when spikes of traffic occur. Bandwidth usage is also tracked quite carefully by the ISPs (Internet service providers) for their planning and billing purposes. As an example, here’s a blog post showing bandwidth usage when we brought our facility in Amsterdam online. In addition to the data, there is also a series of posts about what was involved in making this happen, which we hope helps others who want to do similar things.

Another familiar example is the amount of “traffic” to a website in a day or a month. This is one important method of determining how popular a website is. Changes in these numbers can reflect trends and changing behavior. A specific page view might be associated with a particular person, and thus be sensitive personal data. But the total number of page views is not related to a specific person. It tells us overall how popular a site is.
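To make the distinction concrete, here is a minimal sketch of turning individual page-view records into an aggregate count. The record layout, visitor labels and page names are invented for illustration and do not describe any actual Mozilla log format. Dropping the identifier is only the simplest possible step, shown here to illustrate the idea of publishing totals rather than individual records.

```python
from collections import Counter

# Hypothetical raw records: (visitor label, page, day). All values are invented.
raw_views = [
    ("visitor-a", "/firefox", "2008-09-27"),
    ("visitor-b", "/firefox", "2008-09-27"),
    ("visitor-a", "/add-ons", "2008-09-27"),
    ("visitor-c", "/firefox", "2008-09-28"),
]


def daily_page_views(views):
    """Count page views per day, discarding the visitor label and the page."""
    return Counter(day for _visitor, _page, day in views)


totals = daily_page_views(raw_views)
print(dict(totals))
```

The result keeps only a total per day, which is the kind of number that says how popular a site is without pointing at any one visitor.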

A third familiar example is download numbers, which can be very informative in specific settings. For example, we had a real-time download counter during the Firefox 3 Download Day event. We were able to provide automated counts of downloads and the current number of downloads per minute, each broken out by language, during this event. Here’s some basic analysis of download locales, showing how global a project Mozilla is. And here’s a post showing the effect on download rates caused by a popular talk-show host. This information can be useful without any personal or individual data being disclosed.

These examples are clearly very general. I use them precisely for this reason — to demonstrate that we already understand the usefulness of this type of data and that it can be presented in an aggregate, anonymous form. There are other forms of aggregate, anonymous data that can be equally useful in understanding how the Internet is being used and ultimately, understanding what the Internet really is. I’ll describe some of those in a subsequent post; this one is long enough for now.

The types of data I’ve described above are carefully tracked, analyzed and used in planning and decision-making across the industry, but they are often not publicly available. We’d like to see more of this sort of information publicly available, and we hope to start publishing more of it about Mozilla. To do this, we need to be confident that people understand that this is not publishing personal or individual data, and that this is not Mozilla changing.

This is part of our effort to make the Internet accessible. At the same time, Mozilla will continue to be at the forefront in protecting individuals’ security and privacy.

Mozilla Websites, Web Analytics and Privacy

April 9th, 2008

This document discusses the application of web analytics tools to Mozilla websites.

We live in a world of data; we should be thinking carefully about that data and its impact. Many people don’t realize how much information about them is collected by websites and used as a business asset. Some of those who do understand don’t care, or figure there’s no sense talking about it. But a core of the Mozilla community is intensely focused on privacy and the individual person’s ability to understand and control personal information. This has always been the case, and it is part of our strength. These aspects should continue to inform the development of both our software and our websites. With this in mind, I’ve put together a discussion of a particular data-gathering proposal, together with the safeguards that make me comfortable with it.

We would like to understand how people interact with Mozilla’s websites, in particular the consumer-facing websites such as www.mozilla.com, mozilla-europe.org and mozilla-japan.org. To do this we want to implement tools that measure what people do when they visit these sites. These tools are generally known as “web analytics” tools. In particular, we want to implement a product called SiteCatalyst from a company called Omniture for a range of Mozilla websites. The specific sites, the phased rollout plan and the evaluation details are below. Using this service means that data about Mozilla visitors will be processed by Omniture, and will be stored on servers that are not under the direct, physical control of Mozilla. This is new to us and requires consideration of appropriate safeguards. Some wonder if it should even be done. I believe the proposal below is worth trying, and that our arrangement with Omniture includes appropriate safeguards.

Commitments

  • Mozilla will use the web analytics data only to determine aggregate usage patterns for our website. We will not seek to determine personal information from this data.
  • Omniture will use the data from Mozilla websites only to provide and maintain the service for Mozilla; it will not share the information with others or use the information for other purposes.
  • Omniture will not “correlate and report on any Customer Data with any other data collected through other products, services or web properties.”
  • The domain names in Mozilla cookies will clearly identify their affiliation with Mozilla and the Omniture service.
  • We will have public discussions of the results.
  • Before the end of 2008 we will have a public discussion about the benefits (or lack thereof) of using this system.
  • There will be a clear public statement about which web analytic services, if any, are in use with our websites.
  • There will be a public notice and discussion period before including other types of websites, such as developer.mozilla.org and spreadfirefox.com.

Description

One aspect of the Mozilla project that is bigger than many people realize is our website presence. There are actually a number of Mozilla sites. (Or, in industry terms, “website properties.”) There are the development and community-focused sites like developer.mozilla.org and spreadfirefox.com. And then there are the websites that consumers visit, in particular the download, support and services sites: mozilla.com, mozilla-europe.org, and related sites. The latter are significant web presences, causing Mozilla to periodically appear in the list of top 50 most visited websites published by comScore (an Internet measurement firm analogous to Nielsen in the TV space).

1. Our websites act as integral components of our users’ experience. They are also a primary way of communicating with most of our users who aren’t likely to read Planet Mozilla, the newsgroups or other community tools. Today we know very little about how people interact with our websites, in particular the consumer-facing websites. To improve the experience we first need to know some basic data about how users interact with our website properties. We’d like to understand things such as:

  • Is something we think should be easy — like getting from a top-level page to useful add-ons — simple enough for people who aren’t familiar with Mozilla?
  • If we add a landing page with explanations, do people get lost at those pages? Or do these pages help people as we had hoped?
  • How many users successfully find, download, install and become long-term Firefox users?
  • What paths do people take through the website?
  • Is something new (like the dropdown content on the “whatsnew” page) useful to people? How many people see that page and actually click on the links?
  • Do people find the language version of Firefox that fits their location?

2. Each of these websites is large and complex, and each gets an enormous number of visits from general consumers — that is, from people who are not familiar with Mozilla, may not be power users, and whom we can’t claim to understand from our own experiences. Those of us who work on the Mozilla project have — by definition — some familiarity with Mozilla. That is not the case for most of our current 150 or so million users. What feels “easy to use” or comfortable to us could be completely wrong for many people who visit these websites. Furthermore, what might make sense in one language or locale might not be helpful in other languages or cultural contexts.

3. How do we develop a better understanding of how people interact with a website? The basic answer is to gather aggregate data about how people use the website. The term generally used to describe this is “web analytics.” Aggregate data will help us answer the types of questions listed above.

4. What techniques are used to instrument a website so that it aggregates data about usage patterns? Two elements are used together to gather data: “cookies” and “web beacons.” A cookie is a string of information that a website stores on a visitor’s computer, and that the visitor’s browser provides to the website each time the visitor returns. Because the browser provides this cookie information to the website at each visit, cookies serve as a sort of label that allows a website to “recognize” a browser when it returns to the site. A “web beacon” is a marker placed in a webpage that makes it easier to follow and record the activities of a recognized browser, such as the path of pages visited at a website.
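As a rough illustration of how the two elements work together, here is a toy model in Python. It is only a sketch under stated assumptions: the class name, the `visitor_id` cookie name and the page paths are hypothetical, and nothing here describes how SiteCatalyst or any real analytics product is implemented.

```python
import uuid
from collections import defaultdict


class AnalyticsSite:
    """Toy model: a cookie labels a browser, and each beacon hit logs a page."""

    def __init__(self):
        # cookie value -> pages seen by that browser, in order
        self.paths = defaultdict(list)

    def handle_beacon(self, cookies, page):
        """Record one beacon hit and return the cookies the browser should keep."""
        visitor = cookies.get("visitor_id")  # hypothetical cookie name
        if visitor is None:
            # First visit: issue a random label. It marks a browser, not a person.
            visitor = uuid.uuid4().hex
        self.paths[visitor].append(page)
        return {"visitor_id": visitor}


site = AnalyticsSite()
browser_cookies = {}  # a browser that has never visited the site
for page in ["/", "/firefox", "/add-ons"]:
    # The browser sends back whatever cookie it was given on the previous hit.
    browser_cookies = site.handle_beacon(browser_cookies, page)

print(site.paths[browser_cookies["visitor_id"]])
```

Because the browser returns the same cookie on every request, the three hits are grouped under one label, which is what lets the site reconstruct a path of pages.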

5. Are there negative things that could happen with this data? As with many kinds of data, yes. It is possible to correlate web analytics data with other data and potentially figure out personal information. Mozilla does not do this and Omniture is not allowed to correlate Mozilla data with any other data to derive personal information.

6. What precisely is Mozilla proposing to do? Use a web analytics product from Omniture called SiteCatalyst to measure interaction with a number of our other consumer-facing websites. The proposed rollout of the web analytics is in phases:

  • Phase 1: www.mozilla.com, firefox.com, getfirefox.com, *.mozilla.com. Rollout is pending discussion and feedback on this document. I believe the concerns raised in the newsgroup discussion are addressed, so there may be very little discussion to be had. In that case, the implementation will occur shortly. We would also amend our Privacy Policy as appropriate to describe the storage and processing of this data by a third party.
  • Phase 2: www.mozilla-europe.org, possibly mozilla-japan.org, pending discussion and feedback on this document.
  • Phase 3: Discussion and review period of usefulness of data at the end of 2008.
  • Phase 4: (Pending outcome of Phase 3): add other Mozilla websites such as: addons.mozilla.org, developer.mozilla.org, www.mozilla.org, spreadfirefox.com, planet.mozilla.org; or consider use of a different or additional web analytics program.

7. Isn’t there an open-source or free software version that will do the job? Not that we know of.

8. Why don’t we build our own? This is a significant project in which we have no expertise. We need a solution that works at scale, in a complex, distributed setting, and is available now. That’s a serious project to take on, and one that would certainly take a lot of time and focus. We’d need to build a new community of people that embodies Mozilla DNA and values AND build a world-class piece of software. We’re not experts in analytics or in defining requirements, so we would have to wait until a fair amount of development was done before we could even begin to evaluate how helpful the project was. Those of you who have been around Mozilla since the early days will undoubtedly remember the enormous pain of trying to build the application (in those days the Mozilla Application Suite) before we had a solid infrastructure (the Gecko implementation). The idea of building an analytics package while trying to use it at the same time on websites as complex as the ones in question is a recipe for disaster.

9. Why Omniture? Omniture has many positive points. The use of the data is limited to providing the web analytics service to Mozilla. The product SiteCatalyst is a widely used solution for large websites; it’s known to scale, be stable, and provide reliable, trustworthy results. Access to the data is highly secured and Omniture provides support resources. In addition, there is a user interface for allowing individuals to opt out of the web analytics processing. There are some drawbacks of course; there usually are. Omniture is not open source code, which we always prefer. Our arrangement with them is contractual. That’s helpful in that it allows us to include the privacy safeguards in the contract. But as is almost always the case, the complete contract is confidential. Omniture has been criticized for its business practice of using cookies that don’t clearly say they are from Omniture. It turns out Omniture allows its customers to specify whether they want a cookie with the Omniture name in it. Mozilla cookies will do so. And finally, Omniture is not free. Use of Omniture requires payment, unlike other options, and the cost generally rises with the usage of the sites. So it could get expensive and we’ll have to monitor this.

10. How will we evaluate if the data is worth the effort to get it? We’ll look at the results. We have a set of people who are adept at looking at data: Ken, Polvi and Daniel, who just joined us. Ken and Polvi have been publishing what we’ve learned from the data we do have, and we’ll see what can be learned from the additional data. We’ve already moved the data (known as “metrics”) discussions into the public via the Metrics Blog. We will continue to do this.

11. Will Omniture be used with all Mozilla websites? We don’t know yet. As noted above, we’ll do a review of the consumer-facing sites and see how valuable the data is and how we feel about gathering it. We may also look at alternative providers as part of this discussion. Then we can decide about other sites as well, such as our developer- and community-facing websites.

12. Privacy Policy. Our current privacy policy says that Mozilla data won’t go to an outside third party. So it will need amendment to allow for this case. Details on the proposed changes will follow, but for now I’d like to talk through the goals and proposed techniques.

13. Sensitivity to data, privacy and user control. Most websites (and the organizations running them) are unabashed about collecting data, and using that data to improve their business. The use of web analytics is a standard practice, taken for granted by many website operators. This proposal is an extremely mild version. Some people have suggested to me that this discussion is “much ado about nothing” and reflects an extreme focus on privacy by a portion of the Mozilla community. I agree that this is a mild proposal, collecting the most basic of data. But I don’t believe this discussion, or the basic concern, is irrelevant or extreme. As noted above, we live in a world of data; we should be thinking carefully about that data and its impact.

***

Comments welcome here. If you’re interested in the full discussion, head over to the mozilla.org Governance newsgroup. You can also read a set of past comments and participate through the mozilla.governance Google Group.
