Mozilla

Data Relating to People

July 23rd, 2008

In my last couple of posts I’ve described why I believe Mozilla must pay attention to data in order to help individual people deal with data about them.

There’s a lot of data about people being created. I’ve listed below some of the basic kinds of this data that I think we need to be able to distinguish in order to speak meaningfully about the effects. I’m calling all of these categories “Associated Data” for the reasons described at the end of the post.

Is there a type of data about people that’s of interest or concern to you? If so, take a look and see if it fits into one of the sections below.

  1. “Personal and potential personal data.” These terms are already in reasonably wide usage to mean specific information that identifies an individual, such as name, address, email address, credit card number, government-issued identification number, etc. In some cases the terms are used to include other information that can be combined to create personal information, such as an IP (Internet Protocol) address.
  2. “Intentional Content.” Data intentionally created by people to be seen by people. When we post to social networking pages, blogs, photo sites and product review sites, create wishlists, send gifts and leave other online markers, we intentionally create content about ourselves or associated with us. Sometimes this information is in big chunks, like a blog post or photostream; other times the information is in small bits like recommendations, “pokes,” etc. Sometimes we want this data to be public and sometimes we may not.
  3. “Harvested Data.” Information gathered or created about an individual through the logging, tracking, aggregating and correlating of his or her online activities. It’s possible today to record many of the actions someone takes online (the “clickstream”) and then to harvest patterns and other useful facts from that data. For example, an e-commerce website you visit regularly will know a great deal about your shopping patterns: what kinds of items and what price ranges you look at, how many times you look before you buy, the average purchase amount, the average time between purchases, etc. They’ll know which ads you respond to and which you ignore.
  4. “Relationship Data.” Our relationships with other people, such as our “friends” or followers at various sites. This can be either Intentional Content or Harvested Data. I call this out specifically because a relationship always involves at least two people. And so the treatment of this information (is it public or private, how is it used) always affects at least two people. I’m not yet positive this is a useful topic, but (obviously) I think it likely enough to include it here.

“Associated Data.” It will be helpful to have a term that describes all these types of data. In a vacuum, “Personal” would seem the best term because this is all information that somehow identifies, is related to or is associated with a specific person. But I think “personal” is already understood to mean item 1. I’m using the term “associated data” to mean all of the types of data listed above.
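
For readers who think in code, here is one rough way to picture the categories above. This is only an illustrative sketch, not a proposal for how anything should be built: the names `Category`, `Origin` and `AssociatedData` are made up for this example, and the only rules it encodes are the two stated in item 4 (relationship data involves at least two people, and it can be either intentional or harvested).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Category(Enum):
    """The four kinds of data listed in the post (names are illustrative)."""
    PERSONAL = "personal and potential personal data"
    INTENTIONAL = "intentional content"
    HARVESTED = "harvested data"
    RELATIONSHIP = "relationship data"


class Origin(Enum):
    """How a relationship item came to exist (hypothetical helper)."""
    INTENTIONAL = "created on purpose by a person"
    HARVESTED = "derived from logged activity"


@dataclass(frozen=True)
class AssociatedData:
    """Umbrella type: any item of data associated with one or more people."""
    category: Category
    people: Tuple[str, ...]          # everyone the item is about
    description: str
    origin: Optional[Origin] = None  # only meaningful for relationship data

    def __post_init__(self):
        # Item 4: a relationship always involves at least two people.
        if self.category is Category.RELATIONSHIP and len(self.people) < 2:
            raise ValueError("relationship data must involve at least two people")


# Example: a "friend" link that both people created on purpose.
friendship = AssociatedData(
    category=Category.RELATIONSHIP,
    people=("alice", "bob"),
    description="friends on a social networking site",
    origin=Origin.INTENTIONAL,
)
```

The point of the sketch is simply that every item carries its category and the full set of people it touches, so a decision about one item can be checked against everyone it affects.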

Are there other broad categories of information about people that would help us think clearly? Are there different categories altogether that would be more helpful?  And are there examples of this kind of data you’d like to make sure we think about? If so, note them in the comments or somewhere where we can find them.

5 comments for “Data Relating to People”

  1. 1

    lrbabe said on July 23rd, 2008 at 8:17 am:

    Intentional content may be more than just blog posts, photos and things like that. What about “knowledge”? Data shared for the public good, such as encyclopedic articles, video tutorials (screencasts) and slideshows (see those from J. Resig hosted on SlideShare).

  2. 2

    mawrya said on July 23rd, 2008 at 9:15 am:

    Your four categories (Personal, Intentional, Harvested and Relationship) seem to go back and forth between identifying data by how it’s created (Harvested and Intentional) and by what the data *is* (Personal metrics and Relationship information, for example). I find it easier to wrap my mind around the topic when a single, defined naming convention is followed. For example, Harvesting is a creation method, but the name you would give to the class of data might be “Behavioral” were you to name it based on what that data really represents, what it *is*: most web sites collect data on visitors to track their behavior.

    Of course it all depends on what you intend to do. Methods of data creation can be just as important as the data content itself. But I think the distinction needs to be clear.

  3. 3

    Guillermo said on July 23rd, 2008 at 7:31 pm:

    There’s a kind of data that is close to your 4th category, but still different: data created by an individual but then modified by that individual or by others, as in a wiki. At some point it may be impossible to know the «real» author of that data.

    I hope you can understand my English.

  4. 4

    allankz said on July 27th, 2008 at 9:36 pm:

    Data can be considered to be anything you know about anything, whether it is a living thing or a nonliving thing; any kind of information can be treated as data. So you could easily say that the existence of life depends on data. Data is related to people the way their lives are related to breathing.
    ======================================
    allankz

    http://www.widecircles.info

  5. 5

    Edward Barrow said on July 29th, 2008 at 4:38 am:

    I think the key word is “about”: this is data about people.

    Consider this as an association, coupling the data item (name, phone-number, colour-preference) to the individual.

    There exists, for any individual, a large number of these links or associations. Some are valid, correct, and useful; others are just wrong. Some are public, and some are private.

    Now, suppose we put a web-of-trust PKI over this, so that it becomes possible for anyone digitally to sign such an association, including (of course) the data subject.

    Then the associations become more useful – rather than being of the form “Person A (is/has) Attribute X”, they take the form “Person T says that Person A (is/has) Attribute X”.

    The worth of any such statement depends on the trust we place in person (or institution) T’s authority to decide on Attribute X in relation to person A. For many, but not all, attributes, the most trustworthy source is Person A. Thus I am likely to value “Person A says that Person A’s phone number is P” more than “Person Y says that ….” – unless Person Y happens to be the phone company.

    Digital signatures (by authors of their articles and amendments), in association with the history functions, could also make wikis trustworthy.
