I decided I wanted some stats. There are a few options: use a service (Google Analytics, etc.) or parse your own logs. Both have pros and cons, and this article isn't going to help you decide between them.

I just wanted simple stats based on logs: It's non-intrusive to visitors, doesn't send their browsing habits to third parties (other than what they send themselves), and uses the apache log data I've already got for the entire year.

I'm mainly interested in seeing how many people actually read these articles, as well as what search terms referred them here.

Fix your logs

I've got seven virtualhosts spread across four virtual machines. My first problem: all of them were logging to /var/log/httpd/access_log. After a lot of grep work, I managed to split those out into individual access logs: /var/log/httpd/access_log.chrisirwin.ca, for example.

My biggest problem was that a lot of log entries didn't actually indicate which virtualhost they were from. I ended up spending a few hours coming up with a bunch of rules to identify all queries for my non-main virtualhosts (yay static files). Then I dumped anything that didn't match those rules into my main virtualhost's log (including all the generic GET / entries).

All my logs are sorted into per-virtualhost logs, and all lines from the original are accounted for.
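The splitting pass boils down to grepping each rule into its own per-vhost file, then letting everything that matched no rule fall through to the main log. A minimal sketch, using a hypothetical "/static/ belongs to web.chrisirwin.ca" rule and two sample log lines in a temp directory:

```shell
#!/bin/bash
# Sketch of the log-splitting idea. The /static/ rule and the sample
# entries are stand-ins; real rules depend on what each vhost serves.
set -e
cd "$(mktemp -d)"

# Two sample combined-format entries standing in for the real access_log
cat > access_log <<'EOF'
203.0.113.5 - - [01/Jan/2015:00:00:00 -0500] "GET /static/site.css HTTP/1.1" 200 512 "-" "-"
203.0.113.5 - - [01/Jan/2015:00:00:01 -0500] "GET / HTTP/1.1" 200 2048 "-" "-"
EOF

# Lines matching a vhost rule go to that vhost's log...
grep '"GET /static/' access_log > access_log.web.chrisirwin.ca

# ...and everything that matched no rule lands in the main vhost's log,
# so every original line is accounted for.
grep -v '"GET /static/' access_log > access_log.chrisirwin.ca

wc -l access_log.*
```

With the real logs you'd chain one grep per vhost rule, then grep -v the combined pattern into the main log so nothing gets dropped.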

I renamed access_log to access_log.old, just so I don't mistakenly review its data again.

Fix your logging

Now that we've got separate access logs, we need to tell our virtualhosts to use them. In each virtualhost I added new CustomLog and ErrorLog definitions, using the domain name of the virtualhost.

CustomLog       "logs/access_log.chrisirwin.ca" combined
ErrorLog        "logs/error_log.chrisirwin.ca"
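For context, each pair of directives goes inside the matching vhost block; a sketch, with an assumed port and DocumentRoot:

```apache
<VirtualHost *:80>
    ServerName chrisirwin.ca
    # DocumentRoot here is a placeholder -- keep whatever yours already is
    DocumentRoot "/var/www/chrisirwin.ca"

    CustomLog "logs/access_log.chrisirwin.ca" combined
    ErrorLog  "logs/error_log.chrisirwin.ca"
</VirtualHost>
```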

Then restart httpd

$ sudo systemctl restart httpd

I also disabled logrotate and un-rotated my logs with zcat. I'll probably need to revisit this in the future, but one year's worth of logs is only 55MB.
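Un-rotating is just decompressing the rotated copies and gluing everything back into one file, oldest first. A sketch against stand-in files in a temp directory (real filenames will match whatever logrotate produced):

```shell
#!/bin/bash
# Sketch of un-rotating with zcat. The dated .gz names are stand-ins;
# date-stamped rotations happen to sort oldest-first in glob order.
set -e
cd "$(mktemp -d)"

# Stand-ins for the current log plus two rotated, compressed ones
echo "current entry" > access_log.chrisirwin.ca
echo "older entry"  | gzip > access_log.chrisirwin.ca-20150901.gz
echo "oldest entry" | gzip > access_log.chrisirwin.ca-20150801.gz

# Decompress the rotated logs, append the current log, swap into place
zcat access_log.chrisirwin.ca-*.gz > merged
cat access_log.chrisirwin.ca >> merged
mv merged access_log.chrisirwin.ca
rm access_log.chrisirwin.ca-*.gz

wc -l access_log.chrisirwin.ca
```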

Fetch logs

It goes without saying that awstats needs to be local to the logs. I have four virtual machines. Do I want to manage awstats on all of them? No.

So I wrote a bash script to pull in my logs to a local directory:

$ cat /opt/logs/update-logs
#!/bin/bash

cd "$(dirname "$(readlink -f "$0")")"

# Standard apache/httpd hosts
for host in chrisirwin.ca web.chrisirwin.ca; do
    mkdir -p $host
    rsync -avz $host:/var/log/httpd/*log* $host/
done

# Gitlab omnibus package is weird: its logs live under nginx's directory
host=git.chrisirwin.ca  # hypothetical name -- substitute your gitlab host
mkdir -p $host
rsync -avz $host:/var/log/gitlab/nginx/*log* $host/

Now I have a log store with a directory per server, and logs per virtualhost within them.

Configure cron + ssh-keys to acquire that data, or run it manually whenever.
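For the cron half, a nightly crontab entry pointing at the script above might look like:

```
# min hour dom mon dow  command -- pull the logs at 03:30 every night
30 3 * * * /opt/logs/update-logs
```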

Install awstats

Then I picked my internal web host and installed awstats. This was on Fedora 22; on CentOS/RHEL you'll need to enable EPEL first.

$ sudo dnf install awstats

And, uh, restart apache again

$ sudo systemctl restart httpd

Configure awstats

Now go to /etc/awstats, and make a copy of the config for each domain:

$ sudo cp awstats.model.conf awstats.chrisirwin.ca.conf

You'll probably want to read through all the options, but here's all the values I modified:

# DNSLookups is going to make log parsing take a *very* long time.
# My site is entirely https, so tell awstats that
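For reference, the relevant stock awstats directive names are below; the LogFile path is an assumption based on the /opt/logs layout from the fetch script:

```
SiteDomain="chrisirwin.ca"
LogFile="/opt/logs/chrisirwin.ca/access_log.chrisirwin.ca"
DNSLookup=0            # skip reverse lookups at parse time
UseHTTPSLinkForUrl="/" # everything is served over https
```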

Run the load script

Let's just piggy-back on provided functionality:

$ time sudo /etc/cron.hourly/awstats

Mine took >15 minutes. I think it was primarily DNS related.

Review your logs

By default, awstats figures out which config to use based on the domain name in the URL. However, I've aggregated my logs to a single location. Luckily, the awstats developers thought of this, and you can pass an alternate config in the URL:

https://yourhost/awstats/awstats.pl?config=chrisirwin.ca

Tweaks to awstats.conf

Unless you're running awstats on localhost, you'll be denied access. You'll likely have to edit /etc/httpd/conf.d/awstats.conf and add a Require ip line for whatever your local IP range is. Note that while you can add hostnames instead of IPs, reverse DNS needs to be configured for them to work.

While there, you could also add DirectoryIndex awstats.pl.
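Putting both tweaks together, the edited section might look something like this; the wwwroot path is what the Fedora package ships, and 192.168.1.0/24 is a placeholder for your own range:

```apache
# /etc/httpd/conf.d/awstats.conf (excerpt)
<Directory "/usr/share/awstats/wwwroot">
    DirectoryIndex awstats.pl
    # Apache 2.4 access control: allow localhost plus the LAN
    Require local
    Require ip 192.168.1.0/24
</Directory>
```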