LinuxFest Northwest 2010: Apache Rewrites

Apache Rewrites — for Ninjas!

This session is an overview of using Apache mod_rewrite, driven mostly by your interests and questions.

Your Questions, Your Interests?

The simple difference between rewrites and redirects.

You can specify a redirection in a few ways, and they don’t need to be RewriteRules. You can use an Alias directive to map to the file system or a Redirect directive to bounce hosts:

# mod_alias
Alias /newspaper /home/newspaper
AliasMatch ^/(newspaper)/(.*\.htm)$ /home/$1/today/$2
# Redirect takes a plain URL-path prefix, not a regex
Redirect permanent /xml/ http://xml.news.com/
RedirectMatch permanent ^/xml/(.*)\.xml$ http://xml.news.com/$1.xml
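
To make the rewrite/redirect distinction concrete, here is a minimal sketch that does both with mod_rewrite. The /feeds/ and /old-xml/ paths are made up for illustration; only xml.news.com comes from the examples above.

```apache
RewriteEngine On

# Rewrite: internal. The client asks for /feeds/world.xml and never learns
# that the content was mapped from /xml/world.xml. [PT] hands the result
# back to URL mapping so Alias and friends still apply.
RewriteRule ^/feeds/(.*)$ /xml/$1 [PT,L]

# Redirect: external. The client gets a 301 and makes a second request
# to the new host, so the address bar changes.
RewriteRule ^/old-xml/(.*)$ http://xml.news.com/$1 [R=301,L]
```

The first costs one request and hides your layout; the second costs two requests but lets caches and search engines learn the new address.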

Why NOT use Alias and Location directives?

Access control features might be the first thing to consider. If you want to configure the security of a directory, then you probably want to use Directory or Location directives.

Otherwise, consider maintainability, really.

The order of operations:

  • <Directory>
  • <DirectoryMatch>
  • <Files>,<FilesMatch> << rewrites
  • <Location>,<LocationMatch> << aliases
  • <VirtualHost> << repeat the above order inside VH’s after global scope

See http://httpd.apache.org/docs/2.2/sections.html for this example, in which the directives are applied in the order A, B, C, D, E:

<Location />
E
</Location>

<Files f.html>
D
</Files>

<VirtualHost *>
<Directory /a/b>
B
</Directory>
</VirtualHost>

<DirectoryMatch "^.*b$">
C
</DirectoryMatch>

<Directory /a/b>
A
</Directory>

Aliases and Location directives come first and Files directives second. This means that mixing and matching Location and Files operations can introduce confusion or flaws, because a Location directive is out of sight of your rewrite rules, and you start pulling your hair out because you can’t figure out why your intended rewrite is not even being reached. Debugging rewrites and Location directives is not so easy: there’s no step-wise debugging.
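
The closest thing to a trace is mod_rewrite's own log. A minimal sketch for Apache 2.2 follows; the log path is an assumption, so adjust it for your layout.

```apache
# Apache 2.2: record each step mod_rewrite takes for every request.
# Levels run from 0 (off) to 9; anything above 2 slows the server a lot,
# so turn it on only while debugging.
RewriteLog "/var/log/apache2/rewrite.log"
RewriteLogLevel 3

# Apache 2.4 dropped the two directives above in favor of per-module log levels:
# LogLevel alert rewrite:trace3
```

This only covers mod_rewrite. It will show that a rule was never reached, but not which Location or Alias got to the request first.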

Usually, a rewrite is a transparent transformation of the URI into another path. You’re given a wonderful array of internal variables and re-entrant processing capabilities to use. For example, if you wanted to cache popular content that a cron job created, you could match it and look it up on the filesystem before going to your PHP script:

# Static Local Location
# the rule has a single capture group, so only $1 is available
RewriteCond %{DOCUMENT_ROOT}/$1 -f
RewriteRule ^/book/(.+)$ %{DOCUMENT_ROOT}/$1 [L]

# NFS Location
RewriteCond /home/books/$1 -f
RewriteRule ^/book/(.+)$ /mnt/webdata/ebooks/$1 [L]
# bounce this request to new.news.com if static version is not present
RewriteRule ^/ebook/(.+)$ http://new.news.com/prweb/ebook/$1? [R,L]

Skin That Cat Many Ways

Starting off with Alias and Location directives is fine. However, there are a surprising number of places where Apache features overlap. I presume this is done mostly because it’s difficult to switch between Apache modules to accomplish these goals.

For example, each of the following arranges for a request for /newsimage.gif to be answered by a script:

<Location /newsimage.gif>
ForceType  application/x-httpd-php
</Location>

RewriteRule ^/newsimage\.gif$ - [T=application/x-httpd-php,L]

ScriptAlias /newsimage.gif /website/newsimage.php

Apache Directive Processing is not a Programming Language

Apache processes data through the directives in a rather opaque fashion. Environment variables and pattern matches are not globally visible: a pattern matched in one directive cannot be referenced from another. For example, the (.*) captured by the LocationMatch below is not available to the RewriteRule inside it:

<LocationMatch /news/today-(.*)>
RewriteRule ^(.*)$  -   [L,F]
</LocationMatch>

This is because the regex matches captured by RewriteRules live in a different memory scope in the Apache process than those of the other core directives. You’re not given a programming language within the apache.conf files. The only way to convey data between directives is by setting environment variables, using SetEnvIf or the RewriteRule [E] flag.

RewriteRule ^/news/(today-.*)$ - [E=baddate:1]
<Directory /oldnews/* >
Order Allow,Deny
Allow from All
Deny from env=baddate
</Directory>
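
The same flag could be raised with SetEnvIf instead of a RewriteRule. This is a sketch under the assumption that mod_setenvif is loaded; baddate is the variable name from the example above.

```apache
# mod_setenvif: set baddate=1 when the request path starts with /news/today-
SetEnvIf Request_URI "^/news/today-" baddate=1
```

SetEnvIf reads more plainly when all you need is a flag and no URL rewriting, and the existing "Deny from env=baddate" line does not need to change.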

Introducing RewriteCond

We can incorporate the host name, query parameters, other parts of the request headers, and environment variables into the rewrite using RewriteCond directives.

RewriteCond %{HTTP_HOST} ^(.*)\.news\.org$
RewriteRule ^/(.*)$ http://%1.news.com/$1? [R=301,L]
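
Query parameters work the same way, since a RewriteRule pattern never sees the query string on its own. Here is a small sketch; the /story and /news/story-N.html paths and the id parameter are hypothetical.

```apache
# /story?id=1234  ->  /news/story-1234.html
# %1 is the capture from the RewriteCond; the trailing ? drops the
# original query string from the result.
RewriteCond %{QUERY_STRING} (?:^|&)id=([0-9]+)(?:&|$)
RewriteRule ^/story$ /news/story-%1.html? [PT,L]
```

Note the two kinds of backreference: $N refers to groups in the RewriteRule pattern, %N to groups in the last matched RewriteCond.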

Chaining Rules

Rewrite conditions are important because they are the way you can pull in data from the headers and query string, which is pretty common. However, if you have a series of rewrites that don’t require things like the query string, it is often easier to chain rules. Chained rules are easier to read and are processed in order.

RewriteCond %{HTTP_HOST} ^(.*)\.news\.co\.uk$ [OR]
RewriteCond %{HTTP_HOST} ^(.*)\.news\.co\.ch$ [OR]
RewriteCond %{HTTP_HOST} ^(.*)\.news\.co\.nz$
RewriteRule .* - [E=server:ww1.news.com]

# only redirect when the rule above set the variable
RewriteCond %{ENV:server} !^$
RewriteRule ^/(.*)$      http://%{ENV:server}/$1 [R,L]

Making Things Forbidden

There are a few ways to guard against bots and the like, often by checking for suspicious user agents or referrers. These lists can get very long.

RewriteCond %{HTTP_USER_AGENT} !google [NC]
RewriteCond %{REQUEST_URI} ^/sitemap
RewriteRule .* - [F,L]
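
A referrer check looks much the same. In this sketch the two blocked domains are placeholders, not real offenders.

```apache
# Refuse requests that arrive with a referrer from a listed domain.
# [NC] makes the match case-insensitive; [F] answers 403 and stops processing.
RewriteCond %{HTTP_REFERER} ^https?://([^/]+\.)?(spam-one\.example|spam-two\.example)/ [NC]
RewriteRule .* - [F]
```

Both headers are trivially forged, so treat rules like these as noise reduction rather than access control.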

Homebrew Cached Output

Consider a batch process that pre-generates static html for frequently hit content that’s originally generated by a PHP script.

# %1 and %2 from a RewriteCond are only available to the RewriteRule that follows it
RewriteCond %{REQUEST_URI} ^/today/(business|world|entertainment)/(.*)$
RewriteRule .* %{DOCUMENT_ROOT}/%1.php?q=%2 [T=application/x-httpd-php,L]

Otherwise, you could use LocationMatch and ForceType to execute the PHP script. You would be using fall-through from Files directives to LocationMatch directives to do this.

Proxying Internal and External Content

We’ve seen how easy it is to redirect. We can also proxy those redirections (and rewrites) using the [P] flag.
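
A minimal sketch, reusing the new.news.com backend from the earlier ebook example and assuming mod_proxy and mod_proxy_http are loaded:

```apache
# Fetch the content from the backend on the client's behalf.
# The client never sees new.news.com.
RewriteRule ^/ebook/(.+)$ http://new.news.com/prweb/ebook/$1 [P,L]

# Rewrite Location headers in any redirects the backend sends,
# so the client stays on this host.
ProxyPassReverse /ebook/ http://new.news.com/prweb/ebook/
```

Keep the proxied pattern narrow and the backend host fixed. A [P] rule that builds its target host from request data can turn your server into an open proxy.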

Little Performance Tip

Turning ETags off to mask backend servers can make the results last longer in cache.
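
One way to do that, sketched here on the assumption that mod_headers is available for the second line:

```apache
# Core directive: stop generating ETags for static files.
FileETag None

# mod_headers: also strip any ETag that a backend or script supplied.
Header unset ETag
```

By default the ETag for a static file is built partly from its inode, which differs from one backend server to the next, so identical files end up looking different to caches.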

Check AskApache.com for other speedup tips.

Don’t Forget to Back Up!

Who am I?

Jed Reynolds has been an IT pro since 1996. He recently completed his first year of car-free commuting — traveling 2500 miles on his bicycle. He also loves his Pentax K10D.
