Tuesday, August 12, 2008

A basic hit counter using PHP and SQLite 3

SQLite is a lightweight database engine that is bundled with PHP 5. Using PHP Data Objects (PDO) you can create and edit SQLite databases and tables. A SQLite databases is just a single file that is stored in a folder that has modify rights applied to it (otherwise you can't change it).

This sample shows how to create a database, add a table to it and insert and update a record in it. By using PDO, you can easily change it to use different database engines (like MySQL, IBM DB2 etc [others, like MSSQL and Oracle are in a experimental stage though, check the PDO site (linked above) for more information]).

You can copy this code into a separate file and include it within pages that you want a counter on <? include("counter.php"); ?>.

<?
// logging page hits
$dbfolder = $_SERVER["DOCUMENT_ROOT"]."/data/";
$dbname = $_SERVER["HTTP_HOST"]."_log.sq3";

// check if database file exists first
if(!file_exists($dbfolder.$dbname))
{
 $logdb = new PDO("sqlite:".$dbfolder.$dbname);
 $logdb->exec("CREATE TABLE hits(page VARCHAR(255) PRIMARY KEY, counter INTEGER)");
}
else
{
 $logdb = new PDO("sqlite:".$dbfolder.$dbname);
}

$page = $_SERVER["SCRIPT_URL"];

// check if page is already in the hits table
$statement = $logdb->query("SELECT counter FROM hits WHERE page='$page'");
$record = $statement->fetchAll();

// if a record is found
if(sizeof($record) != 0)
{
 $counter = $record[0]['counter']++;
 $logdb->exec("UPDATE hits SET counter=$counter WHERE page='$page'");
 echo "Counter: ".$counter;
}
else
{
 $logdb->exec("INSERT INTO hits(page, counter) VALUES ('$page', 1)");
 echo "Counter: 1";
}

// close connection
$logdb = null;
?>

In writing this code, LiteWebsite was a valuable resource, containing many SQLite examples.

6 comments:

NickPeirson said...

There are multiple race conditions in this snippet, which would lead to inaccurate counts on a busy site.

One running instance of the script could be checking for the existence of the file, not find it, causing it to start creating the file. A second instance could check for the file existence before the file was created by the first instance, but after the first instance had already checked, ending up with both instances trying to create the file.

A similar condition occurs when setting the initial hit for a page vs incrementing the hits for that page.

If accurate hit counts are critical, you should not use this snippet. Admittedly you'd only lose a couple of hits at most, however I've seen similar code used in conjunction with a date field, to get hits per day, which exacerbates the problem by making it possible to lose a couple of hits per day, rather than just on the initialisation of the page. These conditions become more likely to arise on a busy site, or slow/overloaded server.

I've not worked with SQLite much, but in MySQL you'd get round this by using INSERT ... ON DUPLICATE KEY UPDATE syntax, and make sure the table was already created prior to making the script live.

Sam said...

There may be race conditions, but at least for the creation of the database it will only happen if two people visit the site within milliseconds of each other (or however long it will take to create the database).

The update may cause an exception if the record already exists, but I expect the code could be tweaked to catch it and use update instead (although exception handling has an impact on performance).

As for the inaccuracy of hits, SQLite implements locking (File Locking And Concurrency In SQLite Version 3), so would that help prevent that?

Does anyone know of any (free or open source) tools that can be used to test a site (e.g. 'X' hits per second for a minute, where 'X' could be any value you define)? Then you would test the database and see if the hits it records is the same as the number of requests made.

NickPeirson said...

Unfortunately file locking doesn't help, because the file isn't locked between the query to get the number of records and the insert to update the records. A user could come in, get 10 hits back from the db, another user could come in in the mean time, also get 10 hits, and they'd both update to 11. The solution is to change the update to somethign along the lines of:

UPDATE hits SET counter=counter+1 WHERE page='$page';

In this case, the locking does prevent the race condition, as the incrementation is done within the db lock.

This is something I've come across in the past, highlighted by a comparison between web logs and in-page counters. Needless to say, marketing were pretty unhappy that their figures were out. This came to light on a website that was doing huge numbers of sales and was especially pertinent as we were paying an SEO company on conversion from none tracked/affiliate traffic (Low hits / high sales = great conversion!).

It works well as a quick and dirty solution, to give a rough idea of hits, but on a busy site it would definitely be inaccurate. However, if it was on a busy server, and the hits were critical, you'd want to be tracking things like unique hits (by cookie, ip, etc). It's possible to spend quite a lot of time trying to get an accurate hit count, in the above situation we decided it was more economical to use a log analyser (sawmill, in this case) to produce stats.

Sam said...

This isn't really intended for a busy site as it is just a basic hit counter. If accuracy is needed, you can query your server logs or use something like Google Analytics (plus you would get much more information).

MySQL or another database would be more appropriate for busy sites - the connection string would change and you would create the table/database before hand (PDO can still be used for MySQL).

Hit counters aren't used much now, but occasionally you may get requests for them (usually on a site that would not get that many hits anyway).

Dan said...

I think that
$counter = $record[0]['counter']++;
should be
$counter = ++$record[0]['counter'];

Nick said...

To be honest, the best solution is to use
$counter = $record[0]['counter'] + 1;

If you use either the pre or post increment operator you're relying on operator precedence, which is ill advised. This situation may be relatively trivial, but in general relying on operator precedence causes issues.

Either way, I wouldn't recommend using this snippet, as there is still the aforementioned race condition, caused by incrementing in a way that doesn't update the stored data in an atomic operation.