Posts Tagged with “privacy”

Revised “data” goal for 2010

December 14th, 2008

My proposed revision of this goal is:

Goal: Make the explosion in data safer, more valuable and more manageable for individuals

This would be followed by some subpoints, along the lines of those below. They need some work, but I want to post the general idea for reaction before I spend more time on the subpoints.

  1. products offer people realistic options for managing data created by or about them
  2. people EXPECT access to their data and the ability to combine it, move it and manage it as theirs

I’m proposing this change because I don’t feel we have a solid enough consensus on the original proposal, which was: provide leadership in

  • helping people exercise better ownership and control over their data
  • making anonymous, aggregate “usage data” more of a public resource

The idea of making any data available to anyone has generated concern. Some of this, I think, is due to a lack of concrete examples, to a misperception that this would involve Mozilla software tracking people’s behavior, or to a concern that it’s hard to anonymize data. But some of the concern is a basic discomfort with the currently invisible generation and processing of so much data, or with the idea of a public “data commons,” or a concern about what’s happening “to” me through my software.

The data explosion is only just beginning, and it’s powerful. New forms of data can help us understand new things and solve new problems we can’t even see the shape of yet. But there’s a risk that each one of us will end up at the mercy of others who control the data. This risk affects our privacy, the degrees of choice and innovation available, and the degree of centralization of our online lives.

We should have a goal that reflects both the potential benefits and the risks of the data explosion.

Disconnect Regarding Data

October 6th, 2008

I’ve read the comments to my last post a number of times and I think I now understand what’s happening.

There are a bunch of comments along the lines of “if Firefox starts to include something that tracks my behavior and automatically sends that information off to someone else, then I don’t want to use it anymore.” Absolutely. I don’t want to use that kind of product either. That’s why I’m part of Mozilla — to build products that don’t do this sort of thing. To be explicit: Firefox and Mozilla will remain intensely focused on privacy, protection of personal data and user control over that data. The Mozilla community won’t build or support products that do otherwise.

There are also some comments that discount the examples I used because they are “server-side.” Yes. Absolutely. The examples are server-side because that’s what I mean.

The kind of data I’m trying to talk about is more like census data: how many people are using the Internet, and what the broad patterns of Internet development and usage are. In our physical lives, the basic demographics of our population collected in a census are a valuable shared resource. In the same way, aggregate, anonymized, server-side, census-like data can be a valuable public resource for understanding the Internet.

This kind of data can of course remain a private resource, held by those websites big enough to generate their own understanding. My point is that moving some of this census-like data from the private to the public realm could have great benefits.

I’m wondering if this distinction, which is so clear in my mind, has not been clear in my writing. The term “usage data” may have made this worse. I explicitly do not mean using the browser to collect individual usage data. I mean looking at broad usage patterns that can be discerned from aggregated, server-side data, such as the examples I gave before.

Basic Examples of Usage Data

September 28th, 2008

In past posts I’ve said that I believe there is a need to make basic, aggregate, anonymized information about Internet usage more widely available. If everything that is known about the basic usage of the Internet is closed and proprietary, then the Internet as an open platform will suffer. Here I’ll try to describe the kinds of data I’m talking about. For now I’ll call it “usage data,” though that’s just a term of convenience.

There is a set of usage data that we’re quite accustomed to seeing in aggregated, anonymized form. Unconsciously I think many of us have come to realize that without public availability of this data we cannot understand even the basics of how the Internet is working.

One familiar example is the amount of bandwidth a site serves. Bandwidth data is critical to planning capacity and making sure the website doesn’t “go down” when spikes of traffic occur. Bandwidth usage is also tracked quite carefully by ISPs (Internet service providers) for their planning and billing purposes. As an example, here’s a blog post showing bandwidth usage when we brought our facility in Amsterdam online. In addition to the data, there is also a series of posts about what was involved in making this happen, which we hope will help others who want to do similar things.

Another familiar example is the amount of “traffic” to a website in a day or a month. This is one important method of determining how popular a website is. Changes in these numbers can reflect trends and changing behavior. A specific page view might be associated with a particular person, and thus be sensitive personal data. But the total number of page views is not related to a specific person. It tells us overall how popular a site is.

A third familiar example is download numbers, which can be very informative in specific settings. For example, we had a real-time download counter during the Firefox 3 Download Day event. During this event we were able to provide automated counts of total downloads and of the current number of downloads per minute, each broken out by language. Here’s some basic analysis of download locales, showing how global a project Mozilla is. And here’s a post showing the effect on download rates caused by a popular talk-show host. This information can be useful without any personal or individual data being disclosed.
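To make the aggregation concrete, here is a minimal Python sketch of per-minute, per-locale download counting of the kind described above. It is not Mozilla’s actual counter: the event format, the field names (`timestamp`, `locale`, `ip`), the function names and the sample data are all hypothetical, chosen for illustration. The point is that the output keeps only (minute, locale) buckets and their counts, so nothing that identifies a single download survives.

```python
from collections import Counter
from datetime import datetime


def downloads_per_minute_by_locale(events):
    """Count downloads per (minute, locale) bucket.

    Each event is assumed (hypothetically) to be a dict with a 'timestamp'
    (datetime) and a 'locale' (str); it may also carry identifying fields such
    as 'ip'. Only the minute and locale are read, so those identifying fields
    never reach the returned counts.
    """
    counts = Counter()
    for event in events:
        # Truncate the timestamp to the start of its minute.
        minute = event["timestamp"].replace(second=0, microsecond=0)
        counts[(minute, event["locale"])] += 1
    return counts


def totals_by_locale(per_minute):
    """Roll per-minute buckets up into overall download totals per locale."""
    totals = Counter()
    for (_minute, locale), n in per_minute.items():
        totals[locale] += n
    return totals


# Made-up sample events; the IPs come from the 192.0.2.0/24 documentation range.
sample_events = [
    {"timestamp": datetime(2008, 1, 1, 12, 0, 5), "locale": "en-US", "ip": "192.0.2.1"},
    {"timestamp": datetime(2008, 1, 1, 12, 0, 42), "locale": "en-US", "ip": "192.0.2.2"},
    {"timestamp": datetime(2008, 1, 1, 12, 1, 3), "locale": "de", "ip": "192.0.2.3"},
]

per_minute = downloads_per_minute_by_locale(sample_events)
totals = totals_by_locale(per_minute)
```

Because identifying fields are never copied into the counts, the published numbers can be regrouped (for example, into per-locale totals) without going back to individual records.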

These examples are clearly very general. I use them precisely for this reason — to demonstrate that we already understand the usefulness of this type of data and that it can be presented in an aggregate, anonymous form. There are other forms of aggregate, anonymous data that can be equally useful in understanding how the Internet is being used and ultimately, understanding what the Internet really is. I’ll describe some of those in a subsequent post; this one is long enough for now.

The types of data I’ve described above are carefully tracked, analyzed and used in planning and decision-making across the industry, yet they are often not publicly available. We’d like to see more of this sort of information made public, and we hope to start publishing more of it about Mozilla. To do this, we need to be confident that people understand this is not publishing personal or individual data, and that it does not mean Mozilla is changing.

This is part of our effort to make the Internet accessible. At the same time, Mozilla will continue to be at the forefront in protecting individuals’ security and privacy.

Data — getting to the point

July 24th, 2008

I’ve received a couple of emails from people saying it’s hard to comment on the data issue without some idea of where I’m heading or what I’m thinking. So here goes. I’ll come back to some of the topics I’ve written about already. And I’ll continue with the other posts as well; I think we need some depth of analysis to make good decisions.

In the meantime, here’s the basic message.

I would like to see Mozilla provide more leadership in helping people manage the collection and treatment of data related to them — what I’ve called “Associated Data.” I don’t have a specific plan of what leadership would look like, or what features or capabilities this means our products, services or websites should implement (or block). There are a lot of different types of Associated Data; the desired treatment of different types may vary. This is something I’d like to see us figure out.

I would also like to see Mozilla provide leadership in treating some basic aggregate, anonymized usage data as a public asset. To do this, we need to develop a sense of what data this might include and what aggregation and anonymizing techniques make the Mozilla communities comfortable. Some data, like public disclosure of bandwidth use and website rankings, seems to be an area everyone is comfortable with, but we should make as few assumptions as possible. It can be hard to get truly anonymous data, so this is an area where great care, and therefore leadership, is required. But if everything that is known about the basic usage of the Internet is closed and proprietary, then the Internet as an open platform will suffer. I don’t have a specific plan for what Mozilla might do here; that’s the point of the discussion.
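The choice of anonymizing techniques is deliberately left open here. As one illustration of what such a technique can look like, here is a minimal sketch of small-count suppression: withholding any aggregate bucket that covers too few events, a common practice when publishing statistics. The function name, the threshold of 10 and the sample counts are hypothetical. Suppression alone is not a complete anonymity guarantee; for example, subtracting overlapping published totals can still expose a small group.

```python
def suppress_small_counts(counts, threshold=10):
    """Return only the buckets whose count is at least `threshold`.

    `threshold=10` is an arbitrary illustrative value, not a recommendation.
    Buckets below it are dropped entirely rather than published.
    """
    if threshold < 1:
        raise ValueError("threshold must be a positive integer")
    return {key: n for key, n in counts.items() if n >= threshold}


# Hypothetical per-locale counts; the "is" bucket is small enough that
# publishing it could make individual downloads easier to single out.
example_counts = {"en-US": 1520, "de": 430, "is": 3}
published = suppress_small_counts(example_counts)
```

Dropping small buckets trades some completeness for safety; where that threshold should sit is exactly the kind of question the community would need to weigh.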

These are difficult and sensitive topics, and it would be easier to ignore them. But both of these areas are critical to building an Internet that is healthy for the individuals using it. The Mozilla mission is to keep the Internet an open platform and to promote the values in the Mozilla Manifesto. It will be hard to do this if we ignore the effects of data.
