My Custom Stats Package
Tuesday, 29th March 2011
For a long time, I relied on web statistics from packages that trawl the HTTP access logs on my web server. They analyse the access logs and convert all of the requests into statistics. This is alright, but it isn't a very good way of analysing your traffic.
The first problem with your HTTP access logs is that they get overrun with information from web crawlers. If you have Google, Bing, Yahoo (especially Slurp) and countless other robots crawling your website (sometimes in disguise), they dramatically skew your web statistics. This is reason number 1 for HTTP access logs over-inflating your numbers.
The second problem is that these tools often report the statistic I hate the most: hits. A "hit" is exactly one request. Every page, image, JavaScript file and CSS file counts - so by visiting one page, you could easily register 90 hits for a single page view. (I'm not joking, I picked a website from my favourites bar at random and counted the number of requests.) So on this basis, anyone who tells you they get 5,000 "hits" a month might only be serving up 500 pages to 3 users (2 of them are robots and the third is them checking their own website).
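To make the gap between hits and page views concrete, here is a minimal Python sketch. It is only an illustration, not code from any real log analyser: the `ASSET_EXTENSIONS` list and the `count_hits_and_page_views` function are hypothetical, and treating "anything that isn't an asset file" as a page is a simplifying assumption.

```python
from urllib.parse import urlsplit

# Hypothetical list of extensions treated as supporting assets, not pages.
ASSET_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".ico")


def count_hits_and_page_views(request_paths):
    """Return (hits, page_views) for a list of requested URL paths."""
    # Every request is a hit, whatever it asked for.
    hits = len(request_paths)
    # Only requests that are not for an asset file count as a page view.
    page_views = sum(
        1
        for path in request_paths
        if not urlsplit(path).path.lower().endswith(ASSET_EXTENSIONS)
    )
    return hits, page_views


requests = [
    "/blog/stats",
    "/css/site.css",
    "/js/jquery.js",
    "/img/logo.png",
    "/img/photo.jpg",
]
print(count_hits_and_page_views(requests))  # (5, 1)
```

One visit to one page, five hits. Scale the asset count up to a typical page and the "hits" figure quickly stops meaning anything.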
The third problem is speed. Parsing a busy HTTP access log is just plain slow.
It was for all of these reasons that I originally wrote a simple statistics package, which was included in the Swift Point Content Management System. The package solves all of these problems.
Firstly, it uses a simple technique to eliminate most robot traffic - it uses JavaScript to collect the statistics and standard robots don't run JavaScript when they crawl websites. This cuts out all the noise and gives you numbers from the important people - your human visitors.
Secondly, it counts page-views and unique visits rather than "hits". This means you can see how many people came to your website and how many pages they all viewed.
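As a rough sketch of the difference between the two numbers (this is not the Swift Point code): assume each page view arrives as a pair of a visitor identifier and a page path. The visitor identifier is an assumption here, for example a value stored in a cookie, and `summarise` is a hypothetical name.

```python
def summarise(events):
    """Return (page_views, unique_visits) for (visitor_id, page) pairs."""
    events = list(events)
    # Every reported event is one page view.
    page_views = len(events)
    # Each distinct visitor identifier counts once, however many pages they saw.
    unique_visits = len({visitor_id for visitor_id, _page in events})
    return page_views, unique_visits


events = [
    ("visitor-a", "/"),
    ("visitor-a", "/blog/stats"),
    ("visitor-b", "/"),
]
print(summarise(events))  # (3, 2)
```

Two people, three pages viewed between them.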
Thirdly, because it processes the numbers quietly in the background, it doesn't need to perform a big parsing operation to display the numbers - so it's fast. So as a simple statistics package it is rather useful for seeing how much traffic you really have. It has the following breakdowns...
- Monthly Pages
- Monthly Visits
- Top Content (Most popular pages)
- Bottom Content (Least popular pages)
- Hourly Usage (Popular times of day)
- Weekday Usage (Popular days of the week)
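Breakdowns like these can be kept as running counters that are updated as each page view comes in, so showing the report is just a lookup rather than a parse of the logs. The sketch below is one way to do that under stated assumptions: the `RunningStats` class and its method names are hypothetical, it keeps everything in memory, and it only covers the page-based breakdowns (not visits).

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


class RunningStats:
    """Counters updated once per page view, so reports need no parsing."""

    def __init__(self):
        self.monthly_pages = Counter()  # "YYYY-MM" -> page views
        self.hourly = Counter()         # hour of day (0-23) -> page views
        self.weekday = Counter()        # weekday name -> page views
        self.pages = Counter()          # page path -> page views

    def record(self, page, when):
        """Register one page view for `page` at datetime `when`."""
        self.monthly_pages[when.strftime("%Y-%m")] += 1
        self.hourly[when.hour] += 1
        self.weekday[WEEKDAYS[when.weekday()]] += 1
        self.pages[page] += 1

    def top_content(self, n=10):
        """Most popular pages first."""
        return self.pages.most_common(n)

    def bottom_content(self, n=10):
        """Least popular pages first."""
        return sorted(self.pages.items(), key=lambda item: item[1])[:n]


stats = RunningStats()
stats.record("/", datetime(2011, 3, 29, 14, 5))
stats.record("/", datetime(2011, 3, 29, 14, 30))
stats.record("/about", datetime(2011, 3, 30, 9, 0))
print(stats.top_content(1))      # [('/', 2)]
print(stats.weekday["Tuesday"])  # 2
```

A real package would persist the counters (in a database, say) rather than hold them in memory, but the idea is the same: do a little work per page view and none at report time.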
Back in September last year, I decided to take things up a notch and wrote an extended version of the statistics package, with a view to handling larger traffic volumes and also with the intention of finding out more information about what people are interested in.
I wanted to find out more about the pages people were interested in and where people came from. In particular, if I spotted a spike in my traffic, I wanted to find out what caused the sudden increase and what pages were in the traffic party. To this end, I wrote an industrial version of the statistics package and plonked it on this website to try it out and see what I might learn. The new statistics view had some additional features:
- Daily pages
- Monthly pages
- Daily visits
- Monthly visits
- Popular pages (all visited pages from most to least popular)
- Page referrers (where traffic came from for a specific page)
- Referrers (general top referrers)
- plus some information on browsers and screen resolutions
The first lesson I learned from this was that browsers lie. A lot. Not a single browser wants to tell you who it is - so browser statistics are largely hit and miss. I wrote an article about this, so you can read "Why Do Browsers Have To Be Such Big Fat Liars" for more on that. The bottom line was that I decided to ditch browser statistics. Suffice to say, a lot of people are using Internet Explorer (6, 7 and 8), Firefox (3) and Chrome and a few people are using Safari and Opera.
The next lesson I took from this project was that knowing about who refers you traffic is a great idea. Instead of being told by a friend that you've been featured in an article on Smashing Magazine, you can see it appear in your list as a big source of traffic. You can also see more general information on the root domains that send you people as well as more specific information on the exact pages that send people to a particular page on your website.
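Both views (general referring domains and the exact referring pages for a given page) can be built from the same data. Here is a minimal sketch, assuming each page view is recorded with the referrer URL the browser reported. The `referrer_report` function is hypothetical, and stripping a leading "www." is only a crude stand-in for working out the real root domain.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def referrer_report(events, own_host):
    """Build referrer counts from (page, referrer_url) pairs.

    Returns (by_domain, by_page): overall counts per referring domain,
    and for each page a count of the exact referring URLs.
    """
    by_domain = Counter()
    by_page = defaultdict(Counter)
    for page, referrer in events:
        host = urlsplit(referrer).hostname
        if not host:
            continue  # no referrer: a direct visit
        if host.startswith("www."):
            host = host[4:]  # crude simplification, not a true root domain
        if host == own_host:
            continue  # internal navigation within your own site
        by_domain[host] += 1
        by_page[page][referrer] += 1
    return by_domain, by_page


events = [
    ("/jquery", "http://www.example.com/article"),
    ("/jquery", "http://example.com/other"),
    ("/jquery", ""),
    ("/about", "http://mysite.test/"),
]
by_domain, by_page = referrer_report(events, own_host="mysite.test")
print(by_domain.most_common())  # [('example.com', 2)]
```

`by_domain` answers "who sends me traffic?", while `by_page["/jquery"]` answers "which exact pages link to this one?".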
As a trend, I noticed that specific subjects on my website, such as JavaScript and jQuery, get a lot of traffic from other websites (i.e. not from search engines). In particular, jQuery and JS Plugins are a massive and constant source of traffic.
The search engines came into their own for some of the blogs I've posted on specific subjects, such as technical articles. They were finding specific answers for specific searches and giving a steady stream of traffic.
And finally, occasional mentions in online magazines give sudden spikes of traffic, easily doubling the normal daily traffic for the first day or two and then falling away over time to just a gradual flow.
The best bit about all of this is that I decided to install this package on The Mag to help analyse the traffic and it took all of 5 minutes to install. So not only is this the best web statistics tool I've ever installed, it's also one of the easiest! I don't know what the future holds for the "Swift Stats Pro" project. Maybe I'll open-source it once it has been running for a year or so!