Getting visitor statistics is always a problem, and especially so for mu, where you could have hundreds or thousands of blogs to track!
I am putting together a plugin, with supporting infrastructure, so that each blog admin in an mu farm can access their statistics in a secure and efficient manner. I am doing this using the basic core of awstats.
awstats is a very popular open source log analyzer package. It has several advantages:
- it is fairly widely used.
- it seems to be in active development.
- it indeed parses log files and collects statistics, with a variety of options.
- there are several plugins and enhancements available.
- lots of visitor data is presumably being already collected by the webserver (apache, usually), so getting information from the server logs makes sense. There is no need to add any special log-collecting javascript code in your pages, for example.
And awstats has several disadvantages:
- it is written in perl (to some, this is a disadvantage as far as using it directly in wordpress is concerned)
- parsing and processing logs is a hassle from many perspectives: it is slow, cumbersome, and subject to log rotation issues.
- parsed data should really be stored in a database for maintaining histories; awstats stores the data in flat files, which are somewhat inconvenient and inefficient.
- awstats output processing is rigid and suffers from some specific deficiencies. For instance, rather than showing a sliding window of months or days, it uses arbitrary boundaries - it is Jan 2, 2008 today, but for the months, it only shows Jan 2008, no Dec 2007, Nov, etc. This isn’t ideal.
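To make the sliding window idea concrete, here is a minimal sketch in Python of how the last few months could be listed, counting back from today across the year boundary. The function name last_months is made up for illustration; this is not something awstats provides.

```python
from datetime import date


def last_months(today, count):
    """Return (year, month) pairs for a sliding window ending at `today`.

    Hypothetical helper for illustration only; not part of awstats.
    The newest month comes first.
    """
    months = []
    year, month = today.year, today.month
    for _ in range(count):
        months.append((year, month))
        month -= 1
        if month == 0:
            # Step back over the year boundary: January -> previous December.
            year, month = year - 1, 12
    return months


# On Jan 2, 2008 a three-month window reaches back into 2007.
print(last_months(date(2008, 1, 2), 3))
# prints [(2008, 1), (2007, 12), (2007, 11)]
```

A fixed calendar-year view would show only the first of those three entries.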
Anyway, I chose to use awstats (6.7), and decided to leave it unpatched and use it as is. Any changes that I wanted, whether in output or in statistics windowing, would be done by auxiliary programs. Thus, should there be any updates to awstats, I can theoretically drop the new version into my system unchanged.
Processing both a) the log files and b) the resulting data files to display nice charts and graphs is slow. So, to scale to hundreds or thousands of mu blogs, I decided to run a batch process in the middle of the night, every night, to create the actual html pages (or, more accurately, page pieces). That way, when someone goes to view their stats, the runtime php processing is minimal.
So the way the system works is this (warning: lots of moving parts!):
- The system is designed to work with a mu system where each blog has a separate subdomain under a single master domain.
- Apache log format is modified to contain the actual host as the first item of each line (using the %{Host}i option). This will show the subdomain. Apache is set up so that the entries for all subdomains share a single log file, but from this first item, they can be separated out by subdomain.
- There are two parts to the system:
  - the batch program, run by cron each night
  - the wordpress plugin
- Once a night, the batch program is run by cron (under linux). This program happens to be written in Python. It reads the list of subdomains from the wp mu database and, for each one, sets up an environment to run the awstats.pl program and collect statistics for that particular subdomain, just as if the program were run standalone. Statistics for each subdomain are stored and maintained separately. Then, for each subdomain, the output portion of awstats.pl is run, generating html pages that contain the current day’s reports, segregated by subdomain (currently in separate directories). These pages are filtered somewhat: the html headers and some extraneous links are removed, and a style sheet is appended. They have the type “.html”, but are really pieces of the pages which wordpress will use to display the complete page under admin. One more thing: this program also creates special “summary” pages, so that if you are the “superuser”, you can view a summary sheet (paged if there are too many blogs) containing some important statistics (with some history) for each blog at a glance. (It is for these summary sheets that I use my own data windowing: you see today’s stats alongside the last seven days, and the current month’s stats alongside the last twelve months.)
- Then, within the plugin portion (using PHP) under the wordpress admin page there is an additional menu tab: “Stats”. Under “Stats”, there is then a menu tab for each of the 20 or so “standard” awstats pages. They are presented within the admin interface, under wordpress, but other than that, if you are familiar with awstats, these should look the same. And the displaying of a statistics report page is fast - the system merely needs to assemble the pre-generated html portion with a few of the standard dynamic wordpress admin page elements - no real processing occurs at display time.
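The log-splitting step described above (host as the first item on each line, one shared log file) can be sketched roughly as follows. This is only an illustration of the idea, not the actual batch program: the function name split_by_host, the example.com hostnames and the sample log lines are all made up, and it assumes the host is separated from the rest of the line by a single space.

```python
from collections import defaultdict


def split_by_host(lines):
    """Group shared-log lines by their first field (the Host header).

    Hypothetical helper for illustration. Returns a dict mapping each
    host to its log lines with the leading host field stripped off.
    """
    per_host = defaultdict(list)
    for line in lines:
        host, sep, rest = line.rstrip("\n").partition(" ")
        if not sep:
            continue  # blank or malformed line: no space after the first field
        # Host names are case-insensitive, so normalise before grouping.
        per_host[host.lower()].append(rest)
    return dict(per_host)


# Made-up sample lines in the assumed "host first" format.
sample = [
    'alice.example.com 10.0.0.1 - - [02/Jan/2008:00:00:01 +0000] "GET / HTTP/1.1" 200 512',
    'bob.example.com 10.0.0.2 - - [02/Jan/2008:00:00:02 +0000] "GET /feed HTTP/1.1" 200 2048',
    'Alice.example.com 10.0.0.3 - - [02/Jan/2008:00:00:03 +0000] "GET /about HTTP/1.1" 200 1024',
]

for host, entries in sorted(split_by_host(sample).items()):
    print(host, len(entries))
```

Each per-subdomain group then looks like an ordinary single-site log, which is the form a log analyzer run standalone would expect.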
Thus, each blog user in an mu farm has access to their own updated statistic details on a daily basis, and at the same time, their data is secure from prying eyes. Stats are important if you care at all how well your blog is doing.
Of course, the problems with this approach:
- Complicated (three high-level languages are used: php, perl, python).
- The historical statistics would probably be better stored in a database (easy enough to do, even using the current awstats.pl program)
- The reports would probably be better stored in a database too (easy enough to do….)
An alternative is to just rewrite the awstats program completely (in Python, of course), but that would remove all the advantages listed at the top. Do we really want yet another log analyzing program out there?
