I’m no fan of web analytics where multiple companies sneak a peek at what you are doing when visiting a website. However, I do not mind the website I’m visiting taking note of what I’m doing while on their site.

The uses for tracking

Almost every website monitors some data about its visitors, including the pages they go to, how long they stay on each page, roughly where in the world they are, and some generic details about their computers. In and of itself, this is pretty benign.

The collected data is used in aggregate to provide an overview of who the site’s visitors are and which pages they find most interesting. This data is in turn used to tailor the site, and possibly the content, to the audience. The data can also be shared with third parties, often to onboard advertisers by boasting about the size of the site’s audience.

The problem with tracking

Monitoring of audiences gets complicated when websites outsource their analytics to third‐party companies. The visitor is no longer sharing information about their visit with just the site they are visiting. Information is also sent directly to a third party. Larger websites more often than not use more than one external analytics solution.

The website visitor is of course unaware of this sharing of information. It happens entirely out of sight, and nothing visible tells the visitor which companies are informed about their visit. The visitor sees the address of the page they are on and expects a direct relationship with that site.

Google Analytics is the most popular of these external analytics solutions. It alone is found on 60 % of the web’s most popular sites. That means Google gets information about your visits to 60 % of those sites. The address field of your browser may not say so, but they sure know what goes on there anyway.

Google may share information about a website’s visitors with that website as payment for including their tracking code on its pages. However, what else does Google do with this data? It is very likely used as a signal in their search ranking algorithm to determine what is popular. The collected data is also likely used as a signal in their targeted advertisement system to show advertisements relevant to the visitor’s interests. As Google is already on 60 % of the most popular sites, they could corner the market by penalizing the rankings of sites that don’t include their tracking code or advertisements.

The two paragraphs above only concerned the worries over how Google may use the data they collect. Now consider how many companies are operating in the web analytics game. It quickly gets muddy and complicated.

These problems are not only found with outsourced analytics solutions, but with any embedded third‐party content. As discussed in that post, all these embedded elements also make the website slower to load.

First‐party tracking is okay

When visiting a website, you send a request to that site so that they may return the page you wanted. That request contains some information about your computer as well as your IP address. It can also contain information about the page that referred you to the site. The Internet would not work without this transaction. This information can be logged. The allure of hosted analytics solutions is that they can collect more information and present it better than plain log files.
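To give an idea of what a plain log file already holds, here is a minimal Python sketch that picks apart one line in the widely used "combined" access log format. The sample line, the addresses in it, and the function name are made up for illustration; a real server may be configured to log a different set of fields.

```python
import re

# Pattern for the "combined" access log format used by many web servers.
# Assumption: the server logs in this format; yours may be set up differently.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# Hypothetical log entry using documentation-only addresses.
sample = (
    '203.0.113.7 - - [10/Oct/2013:13:55:36 +0000] '
    '"GET /blog/tracking HTTP/1.1" 200 5120 '
    '"https://example.com/links" "Mozilla/5.0 (X11; Linux x86_64)"'
)

entry = parse_log_line(sample)
# The IP address, requested page, referring page, and browser details
# are all part of the ordinary request.
print(entry["ip"], entry["path"], entry["referrer"], entry["user_agent"])
```

Everything printed here arrives with the request itself; no script has to run in the visitor's browser to collect it.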

On this site, I’ve started using the self‐hosted (meaning all the data is transmitted to this server only) analytics solution Piwik. It gives me personally some insight into what visitors like and where they are from. Their IP addresses are anonymized for the first month of storage and deleted entirely after three months. More important to me is that there are no uninvited guests on the connection. The visitor visits my site and nothing is sent elsewhere.
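One common way to anonymize an IP address is to zero out its trailing bits, so that only the rough network remains. The Python sketch below shows that idea. It is not Piwik's own code, and the function name and prefix lengths are assumptions picked for illustration.

```python
import ipaddress


def anonymize_ip(address, v4_prefix=16, v6_prefix=48):
    """Zero out the host part of an address, keeping only the network prefix.

    The default prefix lengths are illustrative assumptions, not the
    settings of any particular analytics tool.
    """
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets ip_network accept an address with host bits set
    # and masks those bits away.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


# Documentation-only example addresses.
print(anonymize_ip("203.0.113.77"))         # 203.0.0.0
print(anonymize_ip("2001:db8:abcd:12::1"))  # 2001:db8:abcd::
```

The masked address still hints at roughly where a visitor is from, but it no longer points at a single connection.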

I believe that when the website I’m visiting tracks some data about me, then that is okay. The site gets some information it can use to improve itself, and I get the content they offer. Likewise, I believe it is okay that I track the site visitors while they are on this site. Leeches who add no value other than to extract data about visitors for their own ends, on the other hand … Until the industry gets its act together and removes third parties from their sites, I recommend you block those leeches using a tool like Ghostery.

With Ghostery, you can block analytics providers. With granular controls in Ghostery, you can choose to allow what they call “Local Analytics”, which is to say Piwik and a handful of other tools that operate only on the site where they are installed.

There are two sides to every coin. Ghostery itself is industry‐owned. They have an opt‐in (meaning you have to specifically turn it on) feature where “Ghostery collects anonymous data about the trackers you’ve encountered and the sites on which they were placed.” This information is then sold back to the industry.

Just to clarify, I’m not looking over your shoulder at the pages you specifically read. I’m looking at things like “those who read this page are very likely to visit two more pages on the site” and “”. If you are uncomfortable with this tracking, you could opt out before. (The fact that you opt out will be tracked.) However, the way this should work is that you do not visit this site if you do not want to share any data with it. Life would be much simpler if everyone showed some moderation in what data they gathered and played by some simple rules.

