Wednesday, October 01, 2008

Online Privacy: What Do They Know About Me?

[I first published this article several years ago. I have updated it with current information.]

Several years ago I wrote a set of articles for WebMonkey discussing the information a web site can gather about visitors; how to gather, store, and use that information; and limitations of the gathered information. Those articles were geared toward web site owners who wanted to know how their web sites were being browsed.

Conversations over the years -- and particularly several recent conversations -- have convinced me of the need for an article discussing this topic as it applies to you, the Web user. Some people I’ve talked with thought that web sites could automatically get any information they wanted about their visitors. Others thought they could browse completely anonymously. Most did not know enough about the underlying technologies and businesses to understand the full picture. In this article I hope to provide some of that information.

Privacy vs. Security

Before beginning the discussion, I want to differentiate privacy from security. I’m sure you can come up with your own definitions of these terms, and many others are in circulation. For the purpose of this article I define privacy as having others know only those things about you that you want them to know, whereas security means ensuring that the information you have and/or provide to someone is inaccessible to unauthorized people. While security is very important (and may be worthy of a future article), this article only covers privacy.

What Information Is Available?

Independent of the Internet, the first thing you should know is that there is almost assuredly a lot of information about you stored in commercial databases and available for sale. Types of information about you that may be available include:
  • Home address (available from the U.S. Postal Service)
  • Credit records (if you use credit cards)
  • Home ownership history
  • Purchase history
  • History of having children
  • Magazine subscription history
  • Anything you may have supplied in response to surveys and on registration forms
  • Legal records

A variety of companies gather information about individuals and compile it into databases. As mentioned above, the U.S. Postal Service maintains a database of consumers’ current addresses. Experian, TransUnion, and Equifax maintain large databases containing consumer information used for credit reporting. These companies, as well as many others, sell or “rent” consumer information to organizations that want to know more about you. Though old, an article in the Washington Post is an informative read.

SWIPE provides a page describing how you can get your personal records from several organizations.

So what do these companies do with their databases? They provide their clients with information about consumers whom those clients would find of interest. For example, an automotive magazine might want the names of people who buy certain types of cars so that it can send offers to them. Database companies also enable clients to learn more about their own customers by matching database records against the information the clients already hold. So, for example, you may provide an automotive magazine with only your name and address, but by using a database company’s services, the magazine publisher can determine your creditworthiness or your history of auto purchases.
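To make the matching step concrete, here is a minimal sketch in Python. It assumes the match is done on a normalized name and address; the record fields, the sample person, and the function names are all made up for illustration and do not describe any particular company's system.

```python
def normalize(name, address):
    """Lower-case and collapse whitespace so small differences still match."""
    return (" ".join(name.lower().split()), " ".join(address.lower().split()))


# Hypothetical database-company records, keyed by normalized name and address.
broker_records = {
    normalize("Pat Example", "12 Main St"): {
        "credit_tier": "good",
        "last_car_purchase": 2006,
    },
}


def enrich(subscriber):
    """Return the client's own record plus whatever the database adds."""
    key = normalize(subscriber["name"], subscriber["address"])
    extra = broker_records.get(key, {})
    return {**subscriber, **extra}


# The magazine only ever collected a name and an address.
subscriber = {"name": "pat  example", "address": "12 MAIN st"}
profile = enrich(subscriber)
```

After the match, `profile` carries the credit and purchase fields even though the subscriber never supplied them to the magazine.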

What does this have to do with the Web?

The salient point here is that if a web site is able to gather one or a few key pieces of information about you (such as name and address, or social security number, or credit card number), it can gain a lot of information about you.

But what if you haven’t provided any information about you to the web site? What can the web site owner learn about you? To discuss this, we must start with some basics.

The Basics

When you open your browser, click on a link, or type a url (web page address) and click “go”, your browser sends a request to a web server for the page you want. Along with the url requested, your browser sends other information to the web server:
  • Your ip address. An ip address is a set of 4 numbers separated by periods. An ip address is assigned to your computer when you connect to a network. Your computer’s ip address is different than everyone else’s on the Internet. But it’s not quite as informative as you’d think. You’ll learn why in the discussion below.

  • Browser information (usually type and version), and often the operating system you are using.

  • If you click on a link, the url of the page you were at when you clicked on the link. This is called the “referer” (yes, that is the official spelling, even though it is incorrect).

  • Cookies that might exist for that web site (more on this below).

Anonymizer.com will show you what information your browser sends.

It’s important to state that your browser does NOT send your name, email address, or other information to web sites - with a caveat about cookies (which, again, we will discuss further below).
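As a rough illustration of the list above, the following Python sketch pulls those items out of a request the way a web server might. The header text, the addresses, and the function names are invented for the example; a real server gets the ip address from the network connection rather than from the headers.

```python
def parse_headers(raw):
    """Turn the header lines of a request into a dictionary."""
    headers = {}
    for line in raw.splitlines():
        if not line.strip():
            break  # a blank line ends the headers
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def describe_request(client_ip, raw_headers):
    """Summarize what the server learns from a single page request."""
    headers = parse_headers(raw_headers)
    return {
        "ip_address": client_ip,
        "browser": headers.get("user-agent"),
        "referer": headers.get("referer"),
        "cookies": headers.get("cookie"),
    }


# Made-up request headers for illustration.
sample_headers = (
    "Host: abc.com\n"
    "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/3.0\n"
    "Referer: http://xyz.com/links.html\n"
    "Cookie: last_visit=2008-09-30\n"
    "\n"
)
summary = describe_request("192.0.2.17", sample_headers)
```

Note what is absent from `summary`: no name, no email address, nothing you did not already hand over in a cookie.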

IP Address

First let’s talk about the ip address. I stated that ip addresses are not as informative as you would think because your ip address may not always be the same. Every time you connect to your ISP (AOL, EarthLink, ...) using a modem, you are assigned a different ip address. If you have a broadband connection to the Internet (cable, dsl, ...), your ISP may assign your computer a different ip address when you re-connect. And the same may be true of your computer at work. Every time you restart your computer at work, your company’s network may assign you a new ip address.

So, bottom line, your computer’s ip address is not a good vehicle for enabling web site operators to identify you.

With that said, your ip address can be used to determine 1) what ISP you use and 2) where you are (in rough terms - not down to your exact address, but sometimes down to the city level).

This Wired News article discusses ip geolocation capabilities.
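In rough terms, geolocation amounts to looking up which known block of addresses an ip address falls into. The Python sketch below assumes a tiny lookup table; the ISP names and cities are invented, and the address blocks are ones reserved for documentation. Real services rely on far larger tables.

```python
import ipaddress

# Hypothetical lookup table: address block -> (ISP, city).
RANGES = [
    (ipaddress.ip_network("192.0.2.0/24"), "Example ISP A", "Springfield"),
    (ipaddress.ip_network("198.51.100.0/24"), "Example ISP B", "Shelbyville"),
]


def locate(ip_text):
    """Return the ISP and city for an address, or None if it is unknown."""
    address = ipaddress.ip_address(ip_text)
    for network, isp, city in RANGES:
        if address in network:
            return {"isp": isp, "city": city}
    return None


known = locate("192.0.2.17")     # falls inside the first block
unknown = locate("203.0.113.5")  # not in the table
```

The lookup identifies a block of addresses, not a person, which is why the result stops at the ISP and an approximate location.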

Cookies

Your web browser allows web sites to place bits of information on your computer. And it allows web sites to retrieve these bits of information from your computer. For example, abc.com could drop a cookie on your computer containing the date and time you visited their site. The next time you visit abc.com, your browser will pass this information back to the site. So now abc.com knows when you last visited their site.
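The round trip can be sketched with Python's standard `http.cookies` module. The cookie name `last_visit` and its value are my own choices for the example, and the browser's part is simulated in a single line.

```python
from http.cookies import SimpleCookie

# First visit: abc.com builds a cookie holding the visit date and time.
outgoing = SimpleCookie()
outgoing["last_visit"] = "2008-10-01T09:30"
set_cookie_header = outgoing["last_visit"].OutputString()

# The browser stores the cookie and, on the next request to abc.com,
# sends the same name and value back.
cookie_header = "last_visit=" + outgoing["last_visit"].coded_value

# Next visit: abc.com reads the value back out of the request.
incoming = SimpleCookie()
incoming.load(cookie_header)
last_visit = incoming["last_visit"].value
```

The site never had to remember anything itself; your browser carried the value back to it.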

Web sites use cookies for a variety of purposes. Some examples include:
  • When you see a checkbox on a web site’s logon page that enables you to log onto that web site without providing your id and password every time, there’s a good chance that the web site is storing your id and password in a cookie.

  • Web sites may also drop “session” cookies on your computer when you visit them for reporting purposes. The session cookie exists until you close your browser or until a specified amount of time has passed since you last requested a page from the site (usually 20 or 30 minutes), and the web site uses it to review how long visitors stay, how many pages they look at, and how they move through the site.

  • Web sites may store information that makes personalization and form-filling easier. For example, sites that greet you with “Hi, Bill” very probably have your name stored in a cookie.

Now an important point must be made about cookies: cookies that one web site drops on your computer cannot be retrieved by another web site. So if you give your name to abc.com, and it drops a cookie on your computer, the web site xyz.com cannot get at that cookie.

So my privacy is assured, right?

Wrong! Forgetting about the Web for a second, let’s not forget that web site operators can sell your information. Legally - or illegally.

But back to the Web. A bit more on the basics. When you request a web page, your browser actually ends up making multiple requests. Every picture and graphic you see on the page is the result of a separate request. And different parts of a page can result from separate requests. So, even though you request the page from abc.com, some requests may have actually gone to xyz.com. Even worse, abc.com may place identifying data into the requests your browser makes to xyz.com. So you may have never provided xyz.com any information about you, but because you provided abc.com information about you and you requested a page from abc.com that resulted in requests to xyz.com, xyz.com now has information about you!

And note that this isn’t a theoretical scenario. Thousands of web sites don’t put up the advertisements you see on their sites - they allow companies like AOL, DoubleClick (now part of Google), 24/7 Realmedia, Atlas DMT, ValueClick, and others to control the advertising space on their sites. So, for example, when you go to the Wall Street Journal Online, the page you request will call up ads from DoubleClick. Now imagine that DoubleClick serves ads for thousands of web sites. If DoubleClick drops a cookie onto your pc when you visit the Wall Street Journal Online, and then you visit New York Times on the Web (which also contracts DoubleClick to serve ads on its site), DoubleClick now knows that a single individual visited both sites. And if you’ve provided personal information to one of these sites, and it passes identifying information to DoubleClick, it’s feasible that DoubleClick can provide the other site with that identifying information (note that I’m not saying DoubleClick actually does provide this service, nor that its customers provide it with identifying information - I’m just saying it is feasible).
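The mechanics of that scenario can be simulated in a few lines of Python. This is a toy model, not a description of how any real ad company works; the class names and the `.example` domains are invented.

```python
import itertools


class AdServer:
    """Toy ad network that tags each browser with its own cookie."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.sites_seen = {}  # visitor id -> sites where its ads were loaded

    def serve_ad(self, cookie, referer):
        # A browser without a cookie is new; otherwise reuse its id.
        visitor_id = cookie if cookie is not None else next(self._ids)
        self.sites_seen.setdefault(visitor_id, []).append(referer)
        return visitor_id  # handed back to the browser as its cookie


class Browser:
    def __init__(self):
        self.cookies = {}  # domain -> cookie value, one slot per domain

    def visit(self, site, ad_domain, ad_server):
        # Loading the page triggers a second request, to the ad domain.
        # Only the cookie stored for that domain travels with it.
        cookie = self.cookies.get(ad_domain)
        self.cookies[ad_domain] = ad_server.serve_ad(cookie, referer=site)


ads = AdServer()
browser = Browser()
browser.visit("news-one.example", "ads.example", ads)
browser.visit("news-two.example", "ads.example", ads)
```

Neither news site can read the other's cookies, yet `ads.sites_seen` now lists both sites under a single visitor id, because both pages pulled their ads from the same domain.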

A Quick Discussion about Email

Email can be sent to you in either plain text or HTML format (meaning formatted like a web page). If your email software is configured to allow the display of graphics and to allow JavaScript and/or VBScript, emails to you can be tracked. Emailers will be able to determine if and when you read their emails.
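One common way to do this with graphics is a tiny image whose address is unique to each recipient. The Python sketch below assumes that approach; the domain `mailer.example`, the parameter name `t`, and the function names are made up for illustration.

```python
import uuid
from urllib.parse import urlparse, parse_qs

sent = {}  # token -> recipient address


def build_html_email(recipient):
    """Return an HTML body containing an image address unique to the recipient."""
    token = uuid.uuid4().hex
    sent[token] = recipient
    pixel = (
        f'<img src="http://mailer.example/open.gif?t={token}" '
        'width="1" height="1">'
    )
    return f"<html><body><p>Our latest offers</p>{pixel}</body></html>", token


def record_open(requested_url):
    """Called when the image is requested: report who opened the email."""
    token = parse_qs(urlparse(requested_url).query).get("t", [None])[0]
    return sent.get(token)


body, token = build_html_email("reader@example.com")
# Displaying the email with graphics enabled makes the mail program request:
opened_by = record_open(f"http://mailer.example/open.gif?t={token}")
```

If graphics are turned off, the image is never requested and the sender learns nothing from it.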

Also, unless you encrypt the emails you send, they can easily be read as they travel over the Internet, just as postcards can be easily read during their travels to their destinations.

Sounds Hopeless - What Can You Do?

So even if you don’t provide personal information to abc.com, it might be able to get that information from some other organization. What can you do?

First, you must decide how important it is for you to control information about you, because the more you try to protect your privacy, the less useful you will find the Web. Given that you want to maintain some control, you can take the following steps (in order of increasing inconvenience to you):
  • Opt out of as many lists as you can. Start with the companies listed on SWIPE’s site.

  • Browse the Web using privacy software such as Tor or services such as Anonymizer, Anonymise.com, or MisterPrivacy.com.

  • Configure your browser (and email software) to turn off image loading. Images are often advertisements. If you turn off image loading, many advertisements will not be requested. Note that doing this does not preclude your browser from sending information to web sites via JavaScript.

  • Configure your browser to disallow pop-up windows. Since many pop-up windows are displayed for the purpose of displaying ads, this will serve to block requests for those ads.

  • Configure your browser (and email software) to turn off JavaScript and VBScript. This keeps scripts from sending information to web sites, which turning off image loading alone does not prevent. But it also means you will lose some functionality at many web sites.

  • Configure your browser to turn off cookies. Note that when you do this, many sites will no longer be able to log you in automatically, and many other sites won’t allow you to visit at all.

  • Encrypt your emails. You may need special software to do this, and your email recipients may have to have special software to decrypt them.

  • Don’t give out information about you in the first place. Note that this will preclude you from shopping online and from being able to visit many sites that require registration (of course you can provide untrue information in the latter case, but for legal reasons I can’t recommend that).

  • When you shop offline, use cash instead of credit cards, debit cards, or checks.

  • Move into the wilderness or buy an island and live off the land.

Bottom Line

Your browser doesn’t directly send personal information to web sites (beyond what they have already saved in cookies on your computer), but your privacy is far from assured as you surf the Web.
