MaxStocker.com

Google Analytics vs Reality
Posted : Thursday December 2nd, 2010

Like many others I am using Google Analytics on many sites to track traffic, and have generally been happy with it. Recently though I have encountered an issue that is making me seriously reconsider it.

In short the issue is that there appears to be a truly massive gulf between what Google Analytics is reporting and reality.

What is happening in my case is that I have a site that is running Analytics on all its pages. This site also has a number of "special" content pages. To give a general idea, the basic site consists of about 50 "regular" pages and about 750 special ones, and the traffic split between them is about 40 to 60. At any rate, for these special pages I am doing my own logging and tracking, because views, interactions and statistics on them are important for me to have close at hand. And here is where I found the discrepancy.

Now the basic Analytics report for November for the site claims about 25,000 page views. But my tracking and logging for the special pages alone for November shows 44,000 views! That's a huge difference, and it only looks worse when you remember I am not even tracking all the pages on the site. Based on what Analytics reports and what I suspect about the ratio of regular to special page views (special pages being roughly 60% of traffic), extrapolating my recorded numbers to the whole site gives a November total of about 73,000 page views.
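As a rough sketch, the extrapolation above works out as follows. The function name is made up for illustration, and the 60% share is only the approximate traffic split mentioned earlier, not a measured value.

```python
def extrapolate_total(subset_views: int, subset_share: float) -> int:
    """Estimate site-wide views from views logged on a subset of pages.

    subset_share is the assumed fraction of all traffic that goes to
    the logged pages (an estimate, not a measurement).
    """
    if not 0 < subset_share <= 1:
        raise ValueError("subset_share must be in (0, 1]")
    return round(subset_views / subset_share)


logged_special = 44_000  # November views from my own logs (special pages only)
special_share = 0.60     # assumed share of traffic going to special pages
reported = 25_000        # November page views reported by Analytics

estimated_total = extrapolate_total(logged_special, special_share)
missing = estimated_total - reported
print(estimated_total, missing)
```

With these inputs the estimate lands a little over 73,000, which puts the gap against the Analytics figure at just under 50,000 page views.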

When I first discovered this I was shocked. Google can't really be losing almost 50,000 page views, can it?

Well, following the first rule of good development (strange bugs are your fault), I assumed the problem was in my code and began re-examining exactly what is being logged and when. I reviewed exactly what my code is doing and the specifics of what I am logging. The problem is not at my end. I compared my results for a specific special page to Google's, and it's clear Google is simply not seeing (or recording) all the requests that I am. In one example I saw a special page that had 18 views by my count over a 2 day period. By Google's count that same page, over the same time period, had 6 views. It's a pattern that's fairly consistent across the board: in all cases Google has fewer page views than I do, sometimes by a small number but in many cases by a large gap.

So what is happening? Well, to cover off what it is not first:

  • I am filtering Analytics and not seeing all traffic. No, I checked, and I am not doing that.
  • I am looking at unique page views in Analytics and not totals. Again no, and I did check.
  • I am logging more views because users are refreshing in the middle of a page load (or similar behaviour leading to "false" views). Again, no. I log the time and other information about the requests, and when I look at the pattern it looks like normal usage. For the example listed above I can see 17 different "users" looked at that page a total of 18 times, and these requests are spread out throughout the day.

So that leaves me wondering about bots and users who have disabled either JavaScript or Analytics specifically. I do see some bot requests certainly; to return to the example above, there is one confirmed bot in the 18 and another two I think might be. But let's just say for argument's sake that those three are all bots and that I am missing another three as well (just to be on the safe side; this of course assumes that a full third of the traffic to my site is coming from bots, but we'll go with that for now). So that gives us the following.

My total logged requests                 18
Subtract user who viewed page twice      -1
Subtract known or suspected bots         -6
My adjusted total views                  11
Google Analytics                          6
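
The adjustment in the table can be reproduced in a few lines. The variable names here are only for illustration; the numbers are the ones from the example page.

```python
logged_requests = 18  # my total logged requests for the page
repeat_views = 1      # one user viewed the page twice
bots = 3 + 3          # three known or suspected bots, plus three assumed missed

adjusted = logged_requests - repeat_views - bots
analytics = 6         # views Analytics reported for the same page and period

capture_rate = analytics / adjusted
print(adjusted, f"{capture_rate:.0%}")  # prints: 11 55%
```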

As you can see, even with the adjustments, which I am fairly sure are too high, we are still left with Analytics tracking barely half the actual traffic. That leaves us with the possibilities that either half the site's users have JavaScript disabled or blocked, or Analytics is in some way pretty broken. I don't really know which it can be. The JavaScript or blocking explanation seems to be the most palatable alternative left, but I find it difficult to think that a full 50% of the audience for this site would fit into that category. I guess there is a possibility that because of the content there are really bots swarming the site at all times and that is the difference, but again the usage patterns I see don't look like that, and really the idea that a site with 25,000 page views is getting swamped by almost another 50,000 views from bots every month is a bit hard to believe.

The one interesting thing I have realized from this is that it may help explain some other odd behaviour I have noticed in terms of traffic as reported by Google. I have noticed for some time that the number of active sessions for the site at any one time sometimes exceeds the total number of visitors that Google reports on the site. Even when that doesn't happen, though, the number of visitors by session at one time for peak hours (from 8 to 5) will represent 1/4 to 1/3 of the total Google reports, which is also suspicious. I mean, it's not too likely that users are spending 2, 3 or more hours on the site at any one time. But if Analytics is truly missing 1/2 or more of the site traffic then those numbers start to make more sense.
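
The "2, 3 or more hours" figure can be sanity-checked with Little's law (average number in the system = arrival rate × average time in the system). This is a back-of-the-envelope sketch, and it rests on assumptions that are not in my data: that all of the reported visitors arrive during the 9-hour peak window, and that they arrive at a roughly steady rate. The function name is made up for illustration.

```python
def implied_visit_hours(concurrent_fraction: float, window_hours: float) -> float:
    """Average visit length implied by Little's law (L = rate * W).

    concurrent_fraction: sessions active at once / total visitors reported
    window_hours: length of the window the visitors are assumed to arrive in

    With rate = visitors / window_hours, W = L / rate reduces to
    concurrent_fraction * window_hours.
    """
    return concurrent_fraction * window_hours


window = 9.0  # peak hours, 8 to 5

for fraction in (1 / 4, 1 / 3):
    as_reported = implied_visit_hours(fraction, window)
    # If Analytics sees only half the visitors, the true fraction is halved.
    if_half_missed = implied_visit_hours(fraction / 2, window)
    print(f"{fraction:.2f}: {as_reported:.2f}h reported, {if_half_missed:.2f}h if half missed")
```

Under those assumptions a 1/4 to 1/3 ratio implies visits averaging a bit over 2 to about 3 hours, and roughly half that if Analytics is only counting half the visitors.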

I am now doing more extensive logging and tracking on my end so that I can better identify and eliminate bots as well as compare my results more closely with Google's. I am hoping by Monday to have enough data to perform a meaningful analysis and maybe get a better handle on what exactly is going on. We shall see. It really is... odd.
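
For what it's worth, here is a minimal sketch of the kind of comparison I have in mind, not my actual logging code. The log format (page, user agent), the marker list, the sample entries and the Analytics count are all made up for illustration, and matching on user-agent substrings is only a heuristic that will miss bots pretending to be browsers.

```python
from collections import Counter

# Heuristic substrings that commonly appear in crawler user agents.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def is_probable_bot(user_agent: str) -> bool:
    """Guess whether a request came from a bot, based on its user agent."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def human_views_per_page(entries):
    """Count views per page, skipping requests that look like bots."""
    counts = Counter()
    for page, user_agent in entries:
        if not is_probable_bot(user_agent):
            counts[page] += 1
    return counts


def compare(my_counts, analytics_counts):
    """Pair my count with the Analytics count for every page I logged."""
    return {
        page: (mine, analytics_counts.get(page, 0))
        for page, mine in my_counts.items()
    }


# Hypothetical sample data, for illustration only.
log = [
    ("/special/1", "Mozilla/5.0 (Windows NT 6.1) Firefox/3.6"),
    ("/special/1", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"),
    ("/special/1", "Mozilla/5.0 (Macintosh) Safari/533"),
    ("/special/2", "Mozilla/5.0 (Windows NT 6.1) Firefox/3.6"),
]
analytics = {"/special/1": 1}

result = compare(human_views_per_page(log), analytics)
print(result)
```

Each page maps to a pair of (my bot-filtered count, the Analytics count), so pages where the two disagree stand out straight away.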

Tags

Analytics  Google 

Categories

Internet 

 
   
   
  © 2008 Max Stocker