Mozilla

Basic Examples of Usage Data

September 28th, 2008

In past posts I’ve said that I believe there is a need to make basic, aggregate, anonymized information about Internet usage more widely available. If everything that is known about the basic usage of the Internet is closed and proprietary, then the Internet as an open platform will suffer. Here I’ll try to describe the kinds of data I’m talking about. For now I’ll call it “usage data” though that’s just a term of convenience.

There is a set of usage data that we’re quite accustomed to seeing in aggregated, anonymized form. Unconsciously I think many of us have come to realize that without public availability of this data we cannot understand even the basics of how the Internet is working.

One familiar example is the amount of bandwidth a site serves. Bandwidth data is critical to planning capacity and making sure the website doesn’t “go down” when spikes of traffic occur. Bandwidth usage is also tracked quite carefully by the ISPs (Internet service providers) for their planning and billing purposes. As an example, here’s a blog post showing bandwidth usage when we brought our facility in Amsterdam online. In addition to the data, there is also a series of posts about what was involved in making this happen, which we hope helps others who want to do similar things.

Another familiar example is the amount of “traffic” to a website in a day or a month. This is one important method of determining how popular a website is. Changes in these numbers can reflect trends and changing behavior. A specific page view might be associated with a particular person, and thus be sensitive personal data. But the total number of page views is not related to a specific person. It tells us overall how popular a site is.

A third familiar example is download numbers, which can be very informative in specific settings. For example, we had a real-time download counter during the Firefox 3 Download Day event. We were able to provide automated counts of downloads and the current number of downloads per minute, each broken out by language, during this event. Here’s some basic analysis of download locales, showing how global a project Mozilla is. And here’s a post showing the effect on download rates caused by a popular talk-show host. This information can be useful without any personal or individual data being disclosed.

These examples are clearly very general. I use them precisely for this reason — to demonstrate that we already understand the usefulness of this type of data and that it can be presented in an aggregate, anonymous form. There are other forms of aggregate, anonymous data that can be equally useful in understanding how the Internet is being used and ultimately, understanding what the Internet really is. I’ll describe some of those in a subsequent post; this one is long enough for now.

The types of data I’ve described above are carefully tracked, analyzed and used in planning and decision-making across the industry. This data is often not publicly available. We’d like to see more of this sort of information publicly available. We hope to start publishing more of this type of information about Mozilla. To do this, we need to be confident that people understand this is not publishing personal or individual data, and this is not Mozilla changing.

This is part of our effort to make the Internet accessible. At the same time, Mozilla will continue to be at the forefront in protecting individuals’ security and privacy.

40 comments for “Basic Examples of Usage Data”

  1. 1

    viku6ka18 said on September 28th, 2008 at 12:43 pm:

    mozilla firefox is the best!!!

  2. 2

    karl dubost, W3C said on September 28th, 2008 at 7:44 pm:

    Maybe one way to talk about data is to consider that none of it is in fact anonymous. It always carries a certain degree of identification. The nature and precision of that identification increase a lot when the data is aggregated and mashed up from different sources. friendfeed or whois.com are examples of this.

    There is no easy way for common users to control the granularity of the aggregation. There is what we decide to reveal in some contexts, but we have little control over what happens outside of that context, and that’s the issue. For example, you can have a personal site and a professional site, and no desire to see them aggregated in a mashup.

    Another source of trouble is the users’ big lack of control over servers. All hosting platforms offer a very limited set of features for sharing your data and/or blocking access to this data for some user agents, bots, people, etc.

    Many interesting discussions to have on these topics.

  3. 3

    christoph said on September 29th, 2008 at 12:32 am:

    If FF starts to store user info, I would uninstall it!

  4. 4

    OliverMMS said on September 29th, 2008 at 4:24 am:

    1.) The question should not be, is it useful, but is it necessary?

    2.) You can never be sure “anonymous” is and will stay truly anonymous. Because this data is so valuable, there will be people trying to exploit it. And I am not talking about the 14yo hacker, but about multi-billion companies (employers) or governments. Not because they’re evil, but because it gives them an advantage.

    3.) The only risk you seem to see is that your users/customers won’t like it. But what about people not getting a job (or being imprisoned) because they surfed the wrong website?
    Let’s not take the typical Guantanamo or China examples, but how about a UK scholar being imprisoned for 6 days in England because he downloaded something from a US government website (http://www.timeshighereducation.co.uk/story.asp?sectioncode=26&storycode=402125&c=2)?
    How much more often do you think these kinds of mistakes would happen – especially in the private sector – if you collected all the data?

    So my conclusion is – whatever the usefulness, if it’s not absolutely necessary, don’t collect the data. The risk is too high, it’s just not worth it. Imho privacy is an absolute right – just like freedom of speech. And collecting data undermines privacy.

    Oliver

  5. 5

    Franz B. said on September 29th, 2008 at 4:48 am:

    You write: “To do this, we need to be confident that people understand this is not publishing personal or individual data, and this is not Mozilla changing.”

    I see a misunderstanding here. It’s not about _publishing_ personal/individual data. It’s about _collecting_ personal/individual data.

  6. 6

    Reger said on September 29th, 2008 at 5:10 am:

    Don’t be evil… Like google ;-(

    Collecting personal usage data that isn’t required is just the beginning… Chrome collects more data… Because of this I wouldn’t use it.
    There is a fork called Iron that claims not to collect any data:
    http://www.srware.net/software_srware_iron_download.php

    I can only hope some nice person builds a Firefox fork that does not collect any data, too!

    If not, IE will be much better… And Opera will be an option, too… Konqueror is another nice browser…

    So I will not support this.
    And I hope many other people will not support Firefox either, if Firefox collects data from its users.

  7. 7

    tim said on September 29th, 2008 at 5:11 am:

    You wrote:

    “These examples are clearly very general. I use them precisely for this reason — to demonstrate that […] it can be presented in an aggregate, anonymous form.”

    So there are two questions for me:

    1. Which data is stored (or is planned to be stored)? Not only examples, I mean the whole list (every bit!)…

    2. In which form is the data stored (or is planned to be stored)? Not only the presentation is important! Is the data stored in a non-anonymous form…?!

    If the user can’t trust firefox, firefox is dead… .
    tim

  8. 8

    Matthias Versen said on September 29th, 2008 at 5:52 am:

    Please don’t turn into another google.
    Google got a lot of negative press in Germany because they collect data with Chrome, and many people don’t use it because of this.

    If you start collecting my data, then you are no better.
    Did you ask the users if they want this?
    You cannot collect data because you think you need it to understand the users if they don’t want it!

  9. 9

    Stephen Obermeier said on September 29th, 2008 at 6:09 am:

    Please do not implement the data collection functions within Firefox. For me and lots of other people it would definitely be a reason to uninstall FF and ban it from the computer. Think about a research release for people who don’t have a problem being investigated. By implementing the spy functions you will lose a lot of your reputation.

    Keep on rockin’ the web!

  10. 10

    Pingback from netzpolitik.org: » Links der vergangenen Tage » Politik in der digitalen Gesellschaft

    […] ‘Net disconnections. Mozilla head defends the collection of user data on her blog: Basic Examples of Usage Data. Taz: Telekom against analysis of connection data – forced to spy. Golem: Creatives and the […]

  11. 11

    Markus Rham said on September 29th, 2008 at 8:41 am:

    Should these features be implemented in Firefox then I would actively start to work with privacy protection groups to index Mozilla Firefox and ensure that people are being educated on such infringement on their privacy. This should be especially easy in the current climate. The statements you have made above are at best ill-informed and misleading, as so many have pointed out before me.

    I would think that you have enough clever people to advise you on such decisions before making a post which is sure to ripple through the internet and cause you as much bad press as the EULA issue did.

  12. 12

    one said on September 29th, 2008 at 9:58 am:

    Collect that data and I will ban Firefox from my personal computer and every other machine I am responsible for. And I will tell my friends to do the same.

  13. 13

    two said on September 29th, 2008 at 9:06 pm:

    Wow, first Chrome, then Firefox. No, do not collect data; it is not necessary. If you do, I won’t use Firefox anymore, and I will also tell my friends never to use Firefox again!

  14. 14

    Pingback from Der gl

    […] will, so h

  15. 15

    Roman Friesen said on September 30th, 2008 at 2:45 am:

    This subject is _very_ sensitive, as you can see in comments above. I would suggest you offer this feature only as a plugin not installed in firefox by default.
    Maybe Mozilla will get fewer participants but that’s a tradeoff you probably have to make.

  16. 16

    Pingback from Anonyme Datensammlungen? - Mozilla-Chefin, Meinung, Blog-Beitrag, Allgemeinheit, Benutzerdaten, Anonymisierte, wpseo, blog - Der MozillaBlog

    […] user data for the general public – read the blog post by the head of Mozilla – and give your opinion. Share and have fun. These icons link to […]

  17. 17

    seneca said on September 30th, 2008 at 9:30 am:

    The final reason to ban Firefox! Why would every browser collect its user data?! The super police state is coming closer every minute. Think about it, before it’s too late!

  18. 18

    Pingback from stefan.waidele.info » Blog Archive » Wenn die Benutzer Firefox nicht mehr trauen können, ist Firefox tot…

    […] Die Mozilla-Vorsitzende Mitchell Baker hat einige Beispiele gegeben, welche Daten Firefox sammeln k

  19. 19

    Pingback from Basic Thinking Blog | Mozilla und die Daten

    […] come up with a dumb idea in the ivory tower now and then. That is what happened to Mitchell Baker (head of Mozilla), who muses about how great it would be to know everywhere the Firefox user goes surfing. […]

  20. 20

    tor-user said on September 30th, 2008 at 5:41 pm:

    The German blog from stefan.waidele.info linked above contains a central sentence I’d like to translate (thx Stefan btw):

    “As a user, my browser history belongs to me because it describes what I have done. As a site operator, my server statistics belong to me because they are what I’ve delivered.”

    Personally I don’t care about server statistics, but I do care about my browser history, and I don’t want to share it with my gov through Mozilla because I don’t trust my gov. The only real protection against the political opinions visible in that history being used against me, a protection granted by our constitution, is to never collect them. Avoiding data collection is serious and important. To break that contract is a complete disaster.

  21. 21

    Pingback from Links der vergangenen Tage | World of Warcraft

    […] ‘Net disconnections. Mozilla head defends the collection of user data on her blog: Basic Examples of Usage Data. Taz: Telekom against analysis of connection data – forced to spy. Golem: Creatives and the […]

  22. 22

    dowel said on October 1st, 2008 at 12:19 am:

    Any product which collects data about what I do in the internet is spyware – opensource or not. The necessity to rewrite FF in order to have a privacy-respecting browser will then soon lead to a fork, something called “privacy-fox”. If you want that – go ahead. There are enough coders out there.

  23. 23

    Fevrier said on October 1st, 2008 at 12:23 am:

    To make it simple:
    If you want my data you have to ask me, and if I say no (and to be honest, I would say no), you have to accept that. If not, then I have to look for another browser.

  24. 24

    Thomas said on October 1st, 2008 at 1:48 am:

    I support the overall consensus of the posts before. If Firefox becomes a sort of watchman for whatever purpose then I will remove it immediately. A fork called “Privacy-fox” like stated before would be fine for me.

    Furthermore two questions are open:
    Is there maybe a connection between the recently announced Google browser with its unique tracking ID and your proposal to implement anonymous aggregation? Do you want to test public opinion on this topic?

    To summarize:
    Don’t lead into temptation. Every aggregated information source or function that aggregates will attract people to misuse it. And there is no way to prevent it. Even if you could assure complete technical security, which is fiction, you can bet there will be social engineering, lobbying etc.

  25. 25

    AndyP said on October 1st, 2008 at 3:55 am:

    Dear Firefoxes and Thunderbirds,
    as long as I had a reason to TRUST in you and your practises, as long as I could believe that mozilla is not evil, I had a reason to use the firefox and the thunderbird.
    Over time, the trust vanished. Be it the Debian-story, be it the EULA topic, be it the “dropping” of Thunderbird from the Firefox, be it the growing fat of a “lean” product …

    Trust is earned. Trust cannot be called for. Trust is honesty. Trust is the foundation of MY usage of any product.

    My trust is gone now.
    Thanks for the nice time, but now I join a different party.
    It took me some hours, but now I am on Opera.

    AndyP

  26. 26

    Pingback from Firefox: Bilgi avı ile ilgili detaylar | SEO DANİSMANİ

    […] chief John Lily had confirmed this in May. In a blog entry, the organization’s chair Mitchell Baker for the first time made a statement on possible forms of data collection […]

  27. 27

    Sandra said on October 1st, 2008 at 6:15 am:

    I’ve been using web browsers for 13 years and changed browsers 5 times. The day Firefox starts to collect usage data, I will change again. I’ve heard Opera is free of charge now.

  28. 28

    Pingback from Firefox: Bilgi av

    […] Mozilla

  29. 29

    Pingback from Web Makaleleri » Blog Archive » Firefox Bilgi Avı

    […] chief John Lily had confirmed this in May. In a blog entry, the organization’s chair Mitchell Baker for the first time made a statement on possible forms of data collection […]

  30. 30

    Pingback from Firefox: Bilgi avı ile ilgili detaylar | mIRC,mIRCmarket,Türkçe mIRC,mIRC indir

    […] chief John Lily had confirmed this in May. In a blog entry, the organization’s chair Mitchell Baker for the first time made a statement on possible forms of data collection […]

  31. 31

    widget said on October 3rd, 2008 at 6:50 am:

    I think FF is great, but I think users will make a decision in the future between FF & Google’s Chrome.
    Although both will collect data.

  32. 32

    blue said on October 3rd, 2008 at 10:59 am:

    The decision will be between Firefox and Opera.

  33. 33

    ingiltere vizesi said on October 3rd, 2008 at 3:07 pm:

    Why doesn’t this website have support for other languages?

  34. 34

    mirc said on October 4th, 2008 at 4:53 am:

    ty man nice;
    I’ve been using web browsers for 13 years and changed browsers 5 times. The day Firefox starts to collect usage data, I will change again. I’ve heard Opera is free of charge now.

  35. 35

    Pingback from Mitchell’s Blog » Blog Archive » Disconnect Regarding Data

    […] Home « Basic Examples of Usage Data […]

  36. 36

    Mitchell Baker said on October 7th, 2008 at 8:43 am:

    Markus

    Yes, this is server side data, it is explicitly not information collected by the browser.

    And yes, this does not make it risk-free. In fact, there are correlation techniques (especially the buying, selling and aggregation of data that some engage in) that raise big privacy concerns. Mozilla will not do that.

    Yes to the transparency piece. This is how people can understand what their software is doing. This isn’t just browsers, it’s a range of software. That’s a big part of what I would like to do.

  37. 37

    Pingback from Geode - dein Firefox weiß wo du wohnst | F!XMBR

    […] offers up generally accessible interfaces, the sooner covetousness will be aroused as well. Knowing Mozilla’s idea of convenience and features, this will probably also be implemented as opt-out, because who would want to give up convenience […]

  38. 38

    Pingback from Firefox: Bilgi avı ile ilgili detaylar.. | idealsohbet.com - ideal sohbet - günlük haber blogu

    […] chief John Lily had confirmed this in May. In a blog entry, the organization’s chair Mitchell Baker for the first time made a statement on possible forms of data collection […]

  39. 39

    Pingback from Tab Usage Insights: Survey vs Instrumentation

    […] session trails, indicating sequences of page loads and the distribution of sessions across tabs. Pure server-side metrics can’t offer this kind of insight into technographics of internet use or, closer to […]

  40. 40

    Pingback from The power of Mozilla = 2mW/user < mrz’s noise

    […] We talk a lot about Data. It’s a fundamental component of Mozilla’s 2010 Goals, second only to Mozilla’s role as a centerpiece of the Internet. In September, Mitchell talked about “usage data.” […]
