Disclaimer: This page is very much under development, much more so than most web pages.

Scaling mod_perl


Introduction

The purpose of this site is to discuss highly available, scalable, and practical mod_perl architectures, and to share the tools and techniques necessary to build those architectures.

First, though, I encourage everyone to read the mod_perl Guide, by Stas Bekman. In particular, the Scenario and Performance Tuning sections are required reading for anyone who is trying to scale a mod_perl site.

As a note, creating a highly scalable mod_perl architecture is not impossible; it is not even particularly rare. A number of well-known sites use mod_perl.


Scaling Principles

The goal of scaling is, very simply, to eliminate bottlenecks in the application that result from resource constraints: network bandwidth, CPU, memory, disk, etc. This is typically done using one of four logical methods:

That's basically it. To anyone who has worked on large applications for a while, this is all common sense. However, it's useful as a basic logical framework.

As a rule of thumb, vertical scaling strategies are more expensive from a hardware standpoint but cost less to maintain, and horizontal scaling strategies are cheaper from a hardware standpoint but cost more to maintain. In general, most solutions will consist of some combination of all of these suggestions: scaling vertically in some places, scaling horizontally in others, etc.


Architectural Scaling Techniques

The general principles behind scaling mod_perl sites are the same as those for scaling any other type of site. Essentially, the trick is to partition, cache, and queue state information in a way that preserves application integrity.
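
As a minimal sketch of the partitioning idea, the following Perl maps a key (a user name or session id) to one of several back-end servers by hashing it. The host names and the shard_for function are hypothetical, and the sketch assumes a fixed server list: adding or removing a server remaps most keys.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical back-end hosts; substitute your own.
my @SERVERS = qw(db1.example.com db2.example.com db3.example.com);

# Map a key (e.g. a user name or session id) to one server.
# The same key always maps to the same server, so the state for
# that key lives in exactly one place.
sub shard_for {
    my ($key, @servers) = @_;
    die "no servers given\n" unless @servers;

    # Treat the first 8 hex digits (32 bits) of the MD5 digest as an integer.
    my $n = hex(substr(md5_hex($key), 0, 8));
    return $servers[ $n % @servers ];
}

print shard_for('epark', @SERVERS), "\n";
```

Hashing keeps the lookup stateless: any front-end server can compute where a given user's state lives without consulting a directory.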

Here are some scaling techniques that can be used.

Vertical Partitioning Techniques

Horizontal Partitioning Techniques

Horizontal Scaling Techniques

Cache Persistence Management via Apache::Session

Benchmark: This benchmark measures the time taken to do a create/read for 1000 sessions. It does not destroy sessions, i.e. it assumes a user base that browses around arbitrarily and may not log out.
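
The timing pattern that the benchmark scripts rely on can be sketched in isolation. This is only an illustration: the time_iterations helper is hypothetical, and a plain in-memory hash stands in for the session store, so the numbers it prints say nothing about Apache::Session itself.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Run $code $count times and return the total elapsed wall-clock seconds.
sub time_iterations {
    my ($count, $code) = @_;
    my $t0 = [gettimeofday];
    $code->($_) for 1 .. $count;
    return tv_interval($t0);
}

# Stand-in for a session store: a plain in-memory hash.
my %store;
my $n = 1000;

my $elapsed = time_iterations($n, sub {
    my ($i) = @_;
    $store{"session$i"} = { username => 'epark' };    # create
    my $username = $store{"session$i"}{username};     # read
});

printf "total: %.6f s, per session: %.6f ms\n",
    $elapsed, $elapsed * 1000 / $n;
```

Keeping the timer outside the loop measures the whole create/read cycle, including any setup and teardown that happens per session.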

RESULTS: I tested the following configurations:

::::::::::::::
session_mysql.pl2
::::::::::::::
use strict;
use warnings;
use DBI;
use Apache::Session::MySQL;
use Time::HiRes qw(gettimeofday tv_interval);

# Connect to the database.
my $dbh = DBI->connect("DBI:mysql:database=apachesession;host=localhost",
                       "root", "", { RaiseError => 1 });
my $opts = {
    Handle     => $dbh,
    LockHandle => $dbh,
};

my $t0 = [gettimeofday];
for (1..1000) {
    # Create a new session (undef id) and write to it.
    my %session;
    tie %session, 'Apache::Session::MySQL', undef, $opts;
    $session{username} = 'epark';
    my $session_id = $session{_session_id};
    untie %session;    # saves the session to the store

    # Read the same session back by its id.
    my %session2;
    tie %session2, 'Apache::Session::MySQL', $session_id, $opts;
    my $username = $session2{username};
    untie %session2;
}
print "Content-type: text/html\n\n";
print tv_interval($t0);

::::::::::::::
session_oracle.pl2
::::::::::::::
use strict;
use warnings;
use DBI;
use Apache::Session::Oracle;
use Time::HiRes qw(gettimeofday tv_interval);

# Connect to the database.
my $dbh = DBI->connect("dbi:Oracle:XXXX", "XXXX/XXXX");

my $opts = {
    Handle     => $dbh,
    LockHandle => $dbh,
    Commit     => 0,
};

my $t0 = [gettimeofday];
for (1..1000) {
    # Create a new session (undef id) and write to it.
    my %session;
    tie %session, 'Apache::Session::Oracle', undef, $opts;
    $session{username} = 'epark';
    my $session_id = $session{_session_id};
    untie %session;    # saves the session to the store

    # Read the same session back by its id.
    my %session2;
    tie %session2, 'Apache::Session::Oracle', $session_id, $opts;
    my $username = $session2{username};
    untie %session2;
}
print "Content-type: text/html\n\n";
print tv_interval($t0);

::::::::::::::
session_file.pl2
::::::::::::::
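use strict;
use warnings;
use Apache::Session::File;
use Time::HiRes qw(gettimeofday tv_interval);

# Both directories must already exist and be writable.
my $opts = {
    Directory     => '/tmp/session',
    LockDirectory => '/tmp/session',
    Transaction   => 1,
};

my $t0 = [gettimeofday];
for (1..1000) {
    # Create a new session (undef id) and write to it.
    my %session;
    tie %session, 'Apache::Session::File', undef, $opts;
    $session{username} = 'epark';
    my $session_id = $session{_session_id};
    untie %session;    # saves the session to disk

    # Read the same session back by its id.
    my %session2;
    tie %session2, 'Apache::Session::File', $session_id, $opts;
    my $username = $session2{username};
    untie %session2;
}
print "Content-type: text/html\n\n";
print tv_interval($t0);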

::::::::::::::
session_dbfile.pl2
::::::::::::::
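use strict;
use warnings;
use Apache::Session::DB_File;
use Time::HiRes qw(gettimeofday tv_interval);

# The lock directory must already exist and be writable.
my $opts = {
    FileName      => '/tmp/sessions/sessions.db',
    LockDirectory => '/tmp/sessions',
};

my $t0 = [gettimeofday];
for (1..1000) {
    # Create a new session (undef id) and write to it.
    my %session;
    tie %session, 'Apache::Session::DB_File', undef, $opts;
    $session{username} = 'epark';
    my $session_id = $session{_session_id};
    untie %session;    # saves the session to the DB file

    # Read the same session back by its id.
    my %session2;
    tie %session2, 'Apache::Session::DB_File', $session_id, $opts;
    my $username = $session2{username};
    untie %session2;
}
print "Content-type: text/html\n\n";
print tv_interval($t0);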

Oracle/mod_perl Techniques

Linux-Specific OS Techniques


Profiling Techniques

athenaNet Oracle Page-Level Profiler

If you have a complicated mod_perl + Oracle application and want an easy way to profile every SELECT statement executed on a given page, we have developed a tool for exactly that. Get DBD-Oracle-1.06-perfhack.tar.gz, which includes a hacked-up version of dbdimp.c that dumps SELECT statements to a trace file. You can then run explain_dbitracelog.pl which:

Apache Techniques

Logging Production Performance

This is a trivial modification of Doug's original Apache::TimeIt script that records, with sub-second precision, how long Apache takes to execute each page. This is particularly useful if you want to find out which pages of your site are worth optimizing.
package AccessTimer;

# USAGE:
# Just put the following line into your .conf file:
#
# PerlFixupHandler AccessTimer
#
# and use a custom Apache log (this logging piece is not at all
# mod_perl-based...
# see http://httpd.apache.org/docs/mod/mod_log_config.html)
#
# CustomLog /path/to/your/log "%h %l %u %t \"%r\" %>s %b %{ELAPSED}e"
#

use strict;
use Apache::Constants qw(:common);
use Time::HiRes qw(gettimeofday tv_interval);
use vars qw($begin);

sub handler {
    my $r = shift;

    $begin = [gettimeofday];
    $r->push_handlers(PerlLogHandler=>\&log);

    return OK;
}

sub log {
    my $r = shift;

    my $elapsed = tv_interval($begin);
    $r->subprocess_env('ELAPSED' => "$elapsed");
    return DECLINED;
}

1;
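
Once the ELAPSED field is in the log, finding the slow pages is a matter of aggregating it. The following is a rough sketch, assuming the CustomLog format shown in the usage comment above; the parse_line and summarize helpers and the sample log lines are hypothetical, made up for illustration.

```perl
use strict;
use warnings;

# Parse one line in the custom log format and return (path, elapsed seconds),
# or an empty list if the line does not match.
sub parse_line {
    my ($line) = @_;
    return unless $line =~ /"\S+ (\S+)[^"]*" \d{3} \S+ (\d[\d.eE+-]*)\s*$/;
    return ($1, $2);
}

# Return a hash ref: path => [request count, total elapsed seconds].
sub summarize {
    my (@lines) = @_;
    my %stats;
    for my $line (@lines) {
        my ($path, $elapsed) = parse_line($line);
        next unless defined $path;    # skip lines that do not match
        $stats{$path} ||= [0, 0];
        $stats{$path}[0]++;
        $stats{$path}[1] += $elapsed;
    }
    return \%stats;
}

# Hypothetical sample lines; in practice, read them from the log file.
my @sample = (
    '10.0.0.1 - - [01/Jan/2001:12:00:00 -0500] "GET /login HTTP/1.0" 200 512 0.25',
    '10.0.0.2 - - [01/Jan/2001:12:00:01 -0500] "GET /report HTTP/1.0" 200 9000 1.75',
    '10.0.0.1 - - [01/Jan/2001:12:00:02 -0500] "GET /login HTTP/1.0" 200 512 0.75',
);

my $stats = summarize(@sample);

# Print paths ordered by mean elapsed time, slowest first.
for my $path (sort { $stats->{$b}[1] / $stats->{$b}[0]
                 <=> $stats->{$a}[1] / $stats->{$a}[0] } keys %$stats) {
    my ($count, $total) = @{ $stats->{$path} };
    printf "%-20s %6d requests  mean %.4f s\n", $path, $count, $total / $count;
}
```

Sorting by mean rather than total time highlights pages that are slow per request; sorting by total instead would highlight pages that cost the most overall.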

Benchmarking