Ben's web programming pages: HTTP headers

Page index

HTTP Headers
Last-Modified header
Expires header
- Cacheing
- Not Cacheing
Content-Length header
Content-Encoding header
Status header
Content-Type header
Location header

HTTP Headers

HTTP headers are just lines of text sent from the server to the browser before the page content. They are terminated with a blank line, after which the page content is deemed to begin.
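To make the layout concrete, here is a minimal Python sketch that splits a raw HTTP response into its status line, header lines and body at the first blank line. The function name `split_response` and the sample response are illustrative only. The sketch assumes CRLF (`\r\n`) line endings, which is what HTTP uses on the wire, and it keeps only the last value if a header is repeated.

```python
def split_response(raw: bytes):
    """Split a raw HTTP response into (status line, headers dict, body)."""
    # The header block ends at the first empty line, i.e. CRLF CRLF.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line, header_lines = lines[0], lines[1:]
    headers = {}
    for line in header_lines:
        # Split on the first colon only: values such as dates contain colons.
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise them to lower case.
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Last-Modified: Tue, 01 Jun 2004 12:00:00 GMT\r\n"
    b"\r\n"
    b"<html><body>Hello</body></html>"
)
status, headers, body = split_response(raw)
print(status)                    # HTTP/1.1 200 OK
print(headers["last-modified"])  # Tue, 01 Jun 2004 12:00:00 GMT
print(body)                      # b'<html><body>Hello</body></html>'
```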

For static content such as plain HTML pages the web server itself will generate HTTP headers automatically, such as the "Last-Modified: " header.

Scripts that produce dynamic content, however, ought to generate these headers themselves. This is not compulsory, but generating the correct headers for a page has real advantages and generally contributes to the smoother running of the internet. Sending the right HTTP headers will improve visitors' experience of your website, for example by letting browsers reuse cached pages instead of downloading them again.

In PHP the header() function can be used to ease the generation of HTTP headers. In Perl you just output them as plain text before any other output, terminating each header with a newline, and leaving a blank line after the last header.
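The same plain-text approach works from a Python CGI script. The sketch below is illustrative: `write_cgi_response` is a hypothetical helper, not part of any library. It writes each header on its own line, then the blank line, then the body, and the destination stream is a parameter so the output can be inspected.

```python
import io
import sys


def write_cgi_response(headers, body, out=None):
    """Write CGI-style output: header lines, a blank line, then the body."""
    if out is None:
        out = sys.stdout
    for name, value in headers:
        out.write(f"{name}: {value}\n")  # one header per line
    out.write("\n")  # the blank line that ends the headers
    out.write(body)


buf = io.StringIO()
write_cgi_response([("Content-Type", "text/html")], "<p>Hello</p>\n", out=buf)
print(repr(buf.getvalue()))
# 'Content-Type: text/html\n\n<p>Hello</p>\n'
```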

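For reference, the standard HTTP/1.1 headers were originally described in RFC 2616, which defined HTTP/1.1; the relevant part is section 14. RFC 2616 has since been superseded, most recently by RFC 9110 (HTTP semantics) and RFC 9111 (HTTP caching).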

Last-Modified header

Consider a normal static web page on this site. If a browser has seen the page before and has a copy of it in its browser-cache, then when it requests the page again later it does so with an "If-Modified-Since:" header. The web-server compares the date supplied by the browser with the date stamp of the local page to determine whether the browser's copy is still current. If it is, the server sends only the response "304 Not Modified", which is very fast and lightweight, and the browser uses its cached copy of the page. If instead the server finds that the browser's copy is out of date (i.e. the "Last-Modified" time of the page is more recent than the "If-Modified-Since" time provided by the browser), then the entire page is sent as normal.

For a static page the "Last-Modified" time is just the timestamp of the HTML file. But for a dynamically generated page the server cannot assign a last modified time: it has no way of knowing what kind of data you are generating, and how it might change from access to access, so the server doesn't generate the header at all. It is the responsibility of the programmer to generate a "Last-Modified:" header for dynamic pages.

My sermon pages are generated on the fly by a PHP script. I want to give these an accurate "Last-Modified:" header for the reasons discussed above. Two factors might affect the freshness of the pages generated by the script: the last time I edited the script itself, and the last time I added a new sermon (which can be done without editing the script). So the PHP script generates a last modified time as follows:

# The Last-Modified time is the newer of this script's modification time
# and the sermon contents file modification time
$this_mtime = filemtime($_SERVER['SCRIPT_FILENAME']);
$serm_mtime = filemtime("sermons/$file.sermon");
$mtime = ($this_mtime > $serm_mtime) ? $this_mtime : $serm_mtime;
$gmt_mtime = gmdate('D, d M Y H:i:s', $mtime) . ' GMT';
header("Last-Modified: " . $gmt_mtime);
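The same "newer of the two modification times" idea can be sketched in Python. This is an illustration, not the author's code: the helper `last_modified_header` and the file names are hypothetical, and `email.utils.formatdate(..., usegmt=True)` is used to produce the HTTP date format ending in "GMT".

```python
import os
import tempfile
from email.utils import formatdate


def last_modified_header(*paths):
    """Return a Last-Modified value for the newest of the given files."""
    # Take the newest (maximum) modification time, as the PHP above does.
    mtime = max(os.path.getmtime(p) for p in paths)
    # usegmt=True gives e.g. "Tue, 01 Jun 2004 12:00:00 GMT".
    return formatdate(mtime, usegmt=True)


# Example: two hypothetical files; the newer one is 2004-06-01 12:00:00 UTC.
with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "sermons.php")
    data = os.path.join(tmp, "example.sermon")
    for path, stamp in ((script, 1086091200), (data, 0)):
        open(path, "w").close()
        os.utime(path, (stamp, stamp))  # set access and modification times
    header = "Last-Modified: " + last_modified_header(script, data)

print(header)  # Last-Modified: Tue, 01 Jun 2004 12:00:00 GMT
```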

My ISP's webserver intercepts this header, and if it is the same as the browser's "If-Modified-Since" header then the page is not sent. (Incidentally, since the browser's "If-Modified-Since" time is simply the "Last-Modified" time it was given earlier, the "Last-Modified" time your script generates should never be older than the "If-Modified-Since" time.) It is possible that your web server will not do this automatically, in which case your script should handle the "If-Modified-Since" header itself and return either the page or the "304 Not Modified" response.
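If your script has to handle "If-Modified-Since" itself, the decision can look something like the following Python sketch. The function `conditional_response` is hypothetical. It treats the page as unchanged when its modification time is not later than the browser's date, which is the usual interpretation, and it ignores a date it cannot parse. A 304 response should be sent without a page body.

```python
from email.utils import formatdate, parsedate_to_datetime


def conditional_response(last_modified, if_modified_since=None):
    """Return (status, headers) for a page last modified at a Unix time.

    if_modified_since is the raw header value sent by the browser,
    or None if it did not send one.
    """
    headers = [("Last-Modified", formatdate(last_modified, usegmt=True))]
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since).timestamp()
        except (TypeError, ValueError):
            since = None  # unparseable date: behave as if it was absent
        # HTTP dates have one-second resolution, so compare whole seconds.
        if since is not None and int(last_modified) <= since:
            return "304 Not Modified", headers  # send no body with this
    return "200 OK", headers


status, headers = conditional_response(1086091200, "Tue, 01 Jun 2004 12:00:00 GMT")
print(status)  # 304 Not Modified
```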

Expires header

Cacheing

If you want to ensure that a page is cached by the browser and/or intermediate web-caches then it should have a valid "Expires:" header. When a browser accesses a page in its cache with an expires time in the future, it won't even bother contacting the web-server; it will just use its own copy, because it is allowed to assume that the content hasn't changed. Of course, you need to take care with this: if you set an expires time a year into the future and then change your website, your visitors might not see your changes until the year is up.
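The browser's side of this decision can be modelled roughly as below. This is a simplified sketch (`cached_copy_is_fresh` is a hypothetical name); real browsers and caches apply further rules beyond the "Expires" date. It reuses a cached copy only while the current time is before the expiry date, and treats an unparseable value as already expired.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def cached_copy_is_fresh(expires_header, now=None):
    """Simplified check: may a cached page be used without asking the server?"""
    if now is None:
        now = datetime.now(timezone.utc)
    try:
        expires = parsedate_to_datetime(expires_header)
    except (TypeError, ValueError):
        return False  # invalid dates such as "0" count as already expired
    if expires.tzinfo is None:
        expires = expires.replace(tzinfo=timezone.utc)  # assume GMT
    return now < expires


now = datetime(2004, 6, 1, 12, 0, tzinfo=timezone.utc)
print(cached_copy_is_fresh("Tue, 01 Jun 2004 23:59:59 GMT", now))  # True
print(cached_copy_is_fresh("0", now))                              # False
```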

For example, my M'Cheyne Bible calendar gives daily Bible readings. These change at midnight (local time) and are valid for 24 hours. To save people from downloading the whole thing again when they visit the calendar several times in a day, I make use of both the "Last-Modified" and the "Expires" headers. (The former is probably unnecessary, but I send it for completeness.)

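# $tz is the calendar's offset from GMT in hours and $t holds today's
# date ('mon', 'mday', 'year'); both are set earlier in the script.
# gmmktime()/gmdate() work in GMT, so the ' GMT' suffix is always correct.

# Last modified time is previous midnight (local time)
$mtime = gmmktime(0-$tz,0,0,$t['mon'],$t['mday'],$t['year']);

header('Last-Modified: '.gmdate('D, d M Y H:i:s', $mtime).' GMT');

# Expires time is 1s before next midnight (local time)
$etime = gmmktime(23-$tz,59,59,$t['mon'],$t['mday'],$t['year']);

header('Expires: '.gmdate('D, d M Y H:i:s', $etime).' GMT');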

This strategy seems to work quite effectively: it has significantly reduced the download traffic from my website, which also makes the calendar faster for users.
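The same midnight-based scheme can be sketched in Python with timezone-aware datetimes. The function `daily_cache_headers` and its `utc_offset_hours` parameter are hypothetical; like `$tz` in the PHP, a fixed offset is assumed, so daylight-saving changes are not handled. Both times are converted to UTC before formatting, since HTTP dates are always in GMT.

```python
from datetime import datetime, time, timedelta, timezone
from email.utils import format_datetime


def daily_cache_headers(now, utc_offset_hours):
    """Last-Modified = previous local midnight; Expires = 1s before the next."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    local_now = now.astimezone(tz)
    midnight = datetime.combine(local_now.date(), time(0, 0), tzinfo=tz)
    expires = midnight + timedelta(days=1, seconds=-1)  # 23:59:59 local
    return [
        ("Last-Modified", format_datetime(midnight.astimezone(timezone.utc), usegmt=True)),
        ("Expires", format_datetime(expires.astimezone(timezone.utc), usegmt=True)),
    ]


for name, value in daily_cache_headers(datetime(2004, 6, 1, 15, 30, tzinfo=timezone.utc), 0):
    print(f"{name}: {value}")
# Last-Modified: Tue, 01 Jun 2004 00:00:00 GMT
# Expires: Tue, 01 Jun 2004 23:59:59 GMT
```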

Not Cacheing

It is sometimes useful to send an "Expires" header with a date in the past, or an invalid value such as 0. Caches are required to treat either as meaning the page has already expired, so the browser and intermediate web-caches should not reuse a stored copy without checking with the server first. This is what you need for pages that could change with every access, such as the output of forms.
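A minimal Python sketch of the "date in the past" option, using the Unix epoch as the past date (an arbitrary but common choice); `no_cache_headers` is a hypothetical helper name.

```python
from email.utils import formatdate


def no_cache_headers():
    """Headers that mark a response as already expired."""
    # Any date in the past works; time 0 is the Unix epoch, 1 Jan 1970.
    return [("Expires", formatdate(0, usegmt=True))]


print(no_cache_headers())  # [('Expires', 'Thu, 01 Jan 1970 00:00:00 GMT')]
```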

For more information on cacheing see the CacheNow! web page. Also, there is a useful Cacheability Tester. It will examine your Last-Modified header and Expires header among others and report on how cache-friendly your pages are.

Content-Length header

The "Content-Length" header is basically a way of being kind to browsers. For very small image files, such as the little pictures generated by my GIF server, apparently the web-server can combine several images into one IP packet to be sent to the browser if it knows their lengths. This is an efficient use of bandwidth. (Actually, my Apache server seems to manage this without the header, but it can't do any harm.)

In addition, browsers using HTTP/1.0 to communicate with the server can only use "Keep-Alive" connections if they have the "Content-Length" header. These are desirable as they can considerably reduce page-loading latency.

Finally, "Content-Length" headers will further make your dynamically-generated pages look more like static pages, which always have one.

The difficulty with generating this header is that all HTTP headers must be sent before the page content, but you don't know the length of the content until after it has been generated. The solution, in PHP, is to use Output Buffering. The page is constructed in a buffer before it is sent. The content-length is then the size of that buffer, and the header can be sent before the buffer is flushed. This is very simple to do.

<?php
ob_start();
?>

...output page contents as usual...

<?php
# Make a Content-Length: header
header('Content-Length: ' . ob_get_length());

ob_end_flush();
?>
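The same buffer-then-measure idea carries over to other languages. In this Python sketch, `render_page` is a hypothetical page generator that writes into an in-memory buffer; the page is encoded before it is measured, because "Content-Length" counts bytes, not characters, and the two differ as soon as the page contains non-ASCII text.

```python
import io


def render_page():
    """Hypothetical page generator: build the page in a buffer first."""
    buf = io.StringIO()
    buf.write("<html><body>\n")
    buf.write("<p>Caf\u00e9 opening hours</p>\n")  # contains one non-ASCII character
    buf.write("</body></html>\n")
    return buf.getvalue()


body = render_page().encode("utf-8")
# Content-Length must be the number of bytes sent, so measure the encoded body.
headers = [
    ("Content-Type", "text/html; charset=utf-8"),
    ("Content-Length", str(len(body))),
]
print(len(render_page()), len(body))  # 54 55
```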

Content-Encoding header

Since we're already using output buffering to provide a "Content-Length" header, we can also make use of it to compress our pages on the fly for faster downloading.

All that's needed is to specify the predefined "ob_gzhandler" in the call to "ob_start". That's it! This handler looks at the Accept-Encoding: header sent by the browser to decide whether to compress the data or not, and it sends the correct Content-Encoding header in return. The Content-Length as returned by "ob_get_length()" is correct whether the buffer is compressed or uncompressed (but see note below).

ob_start('ob_gzhandler');

Server-side compression is a very effective way of reducing bandwidth usage and speeding the loading of large pages on slower links. It might not make so much difference on a dial-up connection, though, as modems usually incorporate some hardware-compression anyway.
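The decision that the handler makes from the browser's "Accept-Encoding:" header can be approximated as below. This is a simplified illustration, not how ob_gzhandler is actually implemented: the hypothetical `accepts_gzip` only looks for "gzip" or "x-gzip" and honours a quality value of 0 (meaning "not acceptable"); it ignores wildcards and other encodings.

```python
def accepts_gzip(accept_encoding):
    """Rough check of an Accept-Encoding header value for gzip support."""
    if not accept_encoding:
        return False
    for item in accept_encoding.split(","):
        coding, _, params = item.strip().partition(";")
        if coding.strip().lower() not in ("gzip", "x-gzip"):
            continue
        q = 1.0  # default quality when no q= parameter is given
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            return True
    return False


print(accepts_gzip("gzip, deflate"))  # True
print(accepts_gzip("gzip;q=0"))       # False
```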

Note: Somewhere along the line the behaviour of ob_get_length() has changed. It used to return the compressed length of the buffer, but now (at least in PHP 4.3.3) it always returns the uncompressed length. If you use this value in the Content-Length: header it can cause problems for browsers, for example causing them to hang temporarily. One workaround is to double-buffer your output: the inner buffer compresses, and the outer buffer deals with the headers and output:

ob_start();
ob_start('ob_gzhandler');

  ... output the page content...

ob_end_flush();  // The ob_gzhandler one

header('Content-Length: '.ob_get_length());

ob_end_flush();  // The main one
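The point of the note, that "Content-Length" must describe the bytes actually sent, can be sketched in Python using the standard `gzip` module. The helper `compressed_response` is hypothetical, and the sample page is made repetitive so that it compresses well.

```python
import gzip


def compressed_response(body: bytes, client_accepts_gzip: bool):
    """Return (headers, payload) with a Content-Length matching the payload."""
    headers = [("Content-Type", "text/html; charset=utf-8")]
    if client_accepts_gzip:
        payload = gzip.compress(body)
        headers.append(("Content-Encoding", "gzip"))
    else:
        payload = body
    # Measure after compression: this is the length the browser will receive.
    headers.append(("Content-Length", str(len(payload))))
    return headers, payload


page = b"<p>Hello, world!</p>\n" * 200
headers, payload = compressed_response(page, client_accepts_gzip=True)
print(dict(headers)["Content-Length"], "compressed bytes vs", len(page), "uncompressed")
```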

Status header

It might happen that you want to return a status code other than "200 OK" from a script, such as "304 Not Modified" or "404 Not Found". In PHP this is a straightforward application of the header() function. In a Perl CGI script, however, you need to use the "Status:" HTTP header. For an error such as 404 you also need to supply the HTML for the error page that will be displayed to the end user. I do this in my file viewer script:

print <<EOT;
Status: 404 Not Found
Content-Type: text/html

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>404 Not Found</TITLE>
</HEAD><BODY>
<H1>Not Found</H1>
<P>The requested file is not available: either it does not exist, 
or you do not have permission to view the source code.</P>
</BODY></HTML>
EOT
exit 0;
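For comparison, a Python CGI script can do the same. In this sketch `cgi_error` is a hypothetical helper; it uses the standard `http.HTTPStatus` enumeration to look up the reason phrase, so the "Status:" line and the page title always match the code, and `html.escape` to keep the message safe inside the page.

```python
from html import escape
from http import HTTPStatus


def cgi_error(code, message):
    """Build CGI output for an error page with a Status: header."""
    status = HTTPStatus(code)
    title = f"{status.value} {status.phrase}"  # e.g. "404 Not Found"
    return (
        f"Status: {title}\n"
        "Content-Type: text/html\n"
        "\n"  # blank line: end of headers
        f"<!DOCTYPE html>\n<html><head><title>{escape(title)}</title></head>\n"
        f"<body><h1>{escape(status.phrase)}</h1>\n"
        f"<p>{escape(message)}</p></body></html>\n"
    )


print(cgi_error(404, "The requested file is not available."))
```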

Content-Type header

Sometimes it is necessary to supply a "Content-Type" header from the script to help the browser work out what to do with the output. Unless the script explicitly sets the content-type, the server or the browser will make a guess based on the file extension. Since most of your scripts will have extensions like .php, .pl or .cgi, that guess is unlikely to be accurate.

For example, I set the content-type explicitly in the M'Cheyne Server PHP script to indicate that the contents are XML, and browsers that understand XML should render it as such.

header('Content-Type: text/xml');

Another example is the GIF server, which is a Perl script. A GIF file should be served with content-type image/gif, so to make sure of it I set it explicitly:

print "Content-Type: image/gif\n";
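The difference between guessing from the extension and setting the type explicitly can be illustrated with Python's standard `mimetypes` module. The helper `content_type_for` and the file names are hypothetical. A fresh `MimeTypes()` instance starts from Python's built-in extension table rather than the system's mime.types files, and anything it cannot guess falls back to the generic application/octet-stream.

```python
import mimetypes

# A fresh MimeTypes() instance uses Python's built-in extension table.
guesser = mimetypes.MimeTypes()


def content_type_for(filename, explicit=None):
    """Prefer an explicitly chosen type; otherwise guess from the extension."""
    if explicit:
        return explicit
    guessed, _encoding = guesser.guess_type(filename)
    return guessed or "application/octet-stream"


print(content_type_for("dot.gif"))                              # image/gif
print(content_type_for("gifserver.cgi", explicit="image/gif"))  # image/gif
```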

Location header

A "Location" header performs a so-called server-side redirect. If a client accesses a page X.html and the server responds with Location: Y.html the client browser will automatically open page Y.html instead.

I use this feature in a Python CGI script for sending email. If the email was sent successfully it redirects to one page; if there was a problem it redirects to another. So the only output from the script is generated at the end by the following line:

print("Location: http://%s/%s\n" % (host, 'email_sent.html' if success else 'email_fail.html'))  # print adds a 2nd newline: the blank line ending the headers

So, a user fills in an email form on the website. This form specifies that an email CGI script will handle the data, so when the form is submitted the browser accesses the script. The script then tries to send an email according to the contents of the form. If sending the email was successful, a Location: http://example.com/email_sent.html header is returned and the browser opens that page. If sending the email was unsuccessful, a Location: http://example.com/email_fail.html header is returned and the browser opens that page instead.

Note that, strictly, the value of a Location header should be an absolute URI (i.e. one starting with http://), although a relative URI might work.
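One way to guarantee an absolute URI is to resolve the target against the script's own URL with the standard `urllib.parse.urljoin`. In this sketch, `cgi_redirect`, the script URL and the `sent` flag are hypothetical stand-ins for the email script described above.

```python
from urllib.parse import urljoin


def cgi_redirect(script_url, target):
    """Return CGI output that redirects the browser to target.

    A relative target is resolved against the script's own URL, so the
    Location header always carries an absolute URI.
    """
    return f"Location: {urljoin(script_url, target)}\n\n"  # header + blank line


script_url = "http://example.com/cgi-bin/email.py"  # hypothetical script URL
sent = True  # stands in for "the email was sent successfully"
print(cgi_redirect(script_url, "/email_sent.html" if sent else "/email_fail.html"), end="")
# Location: http://example.com/email_sent.html
```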
