[SOLVED] Last-Modified and ETag implementation

Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

Does anyone know, in simple terms, what a client sends in its HTTP request when making use of the ETag header, and then what the server does with that value? Is it something like this?

1. Client sends first request for the file
2. Server gets a "checksum" for that file and sends it in the ETag field, followed by the file
3. Client caches that ETag
4. Client requests the file at a later date and sends the ETag in the request
5. Server sees that the ETag matches, so it doesn't send the entire file back again

The Last-Modified header is simple as far as I know:

1. Client requests file for first time
2. Server sends the Last-Modified time with the response
3. Client caches the Last-Modified time
4. Client requests the file at a later date but sends an If-Modified-Since header
5. Server compares dates and only sends the full file if the file's modification date is newer than the date the client sent
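
The five steps above can be sketched roughly as follows. This is a minimal Python sketch rather than the servlet itself; the function name `conditional_get` and its return shape are made up for illustration, and it assumes the server already knows the file's modification time as a UTC datetime.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional, Tuple


def conditional_get(last_modified: datetime,
                    if_modified_since: Optional[str]) -> Tuple[int, bool]:
    """Return (status code, whether to send the body). Hypothetical helper."""
    # HTTP dates only have one-second resolution, so drop microseconds.
    last_modified = last_modified.replace(microsecond=0)
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: ignore it and send the file
        if since is not None:
            if since.tzinfo is None:
                # Treat a date without a usable zone as UTC.
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                return 304, False  # client's copy is still current
    return 200, True  # first request, or the file changed since then
```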

This is basically for a little part of a Java servlet I'm writing which intercepts genuine requests for files.

The main thing that I'm not clear about is what header the client sends when it wants to compare with an ETag.
Last edited by Chris Corbyn on Mon Aug 20, 2007 11:01 am, edited 1 time in total.
The Phoenix
Forum Contributor
Posts: 294
Joined: Fri Oct 06, 2006 8:12 pm

Re: Last-Modified and ETag implementation

Post by The Phoenix »

d11wtq wrote: Does anyone know, in simple terms, what a client sends in its HTTP request when making use of the ETag header, and then what the server does with that value? Is it something like this?
Close.
d11wtq wrote: 1. Client sends first request for file
2. Server gets "checksum" for that file, sends it in the ETag field followed by the file
#2 can vary. Sometimes it is based on a checksum. Sometimes it is based on the file modification date. Sometimes it's whatever the maniac coding their own ETag solution in PHP comes up with. The key is repeatability. As long as the function gives the same value for the same content and a different value when the content changes (which is why hashing functions like md5 are often used), it's fine.
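
To make the "repeatability" point concrete, here is a small Python sketch of two common ways to derive an ETag. Both function names are hypothetical, and the modification-time/size format is just one possible convention, not something any particular server is claimed to use.

```python
import hashlib


def content_etag(body: bytes) -> str:
    """Checksum-based tag: identical bytes always give the identical tag."""
    return '"' + hashlib.md5(body).hexdigest() + '"'


def mtime_etag(mtime_seconds: int, size: int) -> str:
    """Cheaper tag built from file metadata, so the file is never read.

    It changes whenever the modification time or size changes, even if
    the bytes end up the same.
    """
    return '"%x-%x"' % (mtime_seconds, size)
```

The checksum version costs a full read of the file on every request unless the result is cached; the metadata version is nearly free but is only as trustworthy as the modification time.
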
d11wtq wrote: 3. Client caches that ETag
4. Client request file at a later date and sends the ETag in the request
5. Server sees that the ETag matches so doesnt send entire file back again
#5 also sometimes varies. That's the ideal state, but sometimes it's a proxy between the server and client that does that. The coder creating the ETag solution could also just serve the content regardless (there are a few rare cases where that can be useful).
d11wtq wrote: The Last-modified header is simple as far as I know
The only trick to Last-Modified is that the *source* of the last modified date can vary, and can be relative to server time OR client time, depending on the configuration. That has bitten me many times.
d11wtq wrote: The main thing that I'm not clear about is what header the client sends when it wants to compare with an ETag.
Firefox plus live headers will give you a better view into that portion of the process.

Another thing to look at is Yahoo's YSlow (a plugin for Firebug, which in turn is a plugin for Firefox). It talks at length about why ETags can actually be a negative thing in a large number of cases. Yahoo actually recommends against using ETags in general.

Post by Chris Corbyn »

I figured this out just about 3 minutes before opening this post. I'd seen the header name plenty of times but never actually made the link between it and ETag. The header the client sends is "If-None-Match" and it should contain the ETag *exactly* as it was sent by the server, including the double quotes :)
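
A rough Python sketch of that server-side check, for illustration only. The function name `status_for` is hypothetical. It assumes the client may send several quoted tags in one header and compares them exactly, quotes included; it does not implement weak (`W/`) comparison or the `*` wildcard.

```python
import re
from typing import Optional

# Matches one quoted entity tag, optionally with a W/ prefix. Parsing the
# quoted strings (rather than splitting on commas) matters because a tag
# may itself contain commas.
_ETAG_RE = re.compile(r'(?:W/)?"[^"]*"')


def status_for(if_none_match: Optional[str], current_etag: str) -> int:
    """Return 304 if the client already holds the current tag, else 200."""
    if if_none_match is None:
        return 200  # unconditional request: send the file
    candidates = _ETAG_RE.findall(if_none_match)
    return 304 if current_etag in candidates else 200
```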

I'm just going to use the Last-Modified date with some quotes around it, I think. I may take a hash of the source too, but it would just be a pretty weak hash (not an md5 or anything). Something 10-15 characters long :)
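
The "date with quotes around it" idea might look something like this in Python. It is a sketch, not the servlet code: the function name `date_etag` is hypothetical, and stripping the spaces is an assumption chosen to mirror the tag format in the transcript further down the thread.

```python
import calendar
from email.utils import formatdate


def date_etag(mtime_seconds: float) -> str:
    """Build an ETag from the HTTP date of the file's modification time."""
    http_date = formatdate(mtime_seconds, usegmt=True)
    # Remove the spaces and wrap the result in double quotes.
    return '"' + http_date.replace(" ", "") + '"'


# Example: a file last modified on 20 Aug 2007 at 15:08:15 UTC.
example_mtime = calendar.timegm((2007, 8, 20, 15, 8, 15, 0, 0, 0))
example_tag = date_etag(example_mtime)  # '"Mon,20Aug200715:08:15GMT"'
```

A tag like this carries no more information than Last-Modified does, so two edits within the same second would get the same tag.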

It's actually quite interesting playing around with this stuff. I may implement this for dynamically generated content at some point. Take this forum, for example: if you know that nobody has posted anything since the last request, why send back the entire page? It's a trivial task too.
Oren
DevNet Resident
Posts: 1640
Joined: Fri Apr 07, 2006 5:13 am
Location: Israel

Post by Oren »

It doesn't just send the entire page back again... It probably executes all the PHP and all the DB queries again :?

Post by Chris Corbyn »

Oren wrote: It doesn't just send back again the entire page... It probably executes all the PHP and all the DB queries again :?
Yeah sorry, I'm referring to bandwidth usage here. If your content hasn't changed, then why waste valuable bandwidth by sending the markup for a page they have already seen? ;)

Here's my implementation (just requesting a small CSS file here... my ETag validation is weak, but it will do for now).

First page request, content served up (200 OK):

chris-corbyns-computer:~/Java/webapps/personal/trunk d11wtq$ telnet localhost 8080
Trying ::1...
Connected to localhost.
Escape character is '^]'.
GET /css/main.css HTTP/1.1
HOST: personal
Connection: close

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Last-Modified: Mon, 20 Aug 2007 15:08:15 GMT
ETag: "Mon,20Aug200715:08:15GMT"
Content-Type: text/css
Content-Length: 189
Date: Mon, 20 Aug 2007 15:20:05 GMT
Connection: close

/** General **/
/*-----------*/

ul,ol,li,h1,h2,h3,h4,h5,h6,pre,form,body,html,p,blockquote,fieldset,input {
  margin: 0; padding: 0;
}

p {
  margin-bottom: 1em;
}

body {
  color: red;
}
Connection closed by foreign host.
chris-corbyns-computer:~/Java/webapps/personal/trunk d11wtq$

Second page request (304 Not Modified). This time the client already has the content cached, along with its ETag:

chris-corbyns-computer:~/Java/webapps/personal/trunk d11wtq$ telnet localhost 8080
Trying ::1...
Connected to localhost.
Escape character is '^]'.
GET /css/main.css HTTP/1.1
HOST: personal
If-None-Match: "Mon,20Aug200715:08:15GMT"
Connection: close

HTTP/1.1 304 Not Modified
Server: Apache-Coyote/1.1
Last-Modified: Mon, 20 Aug 2007 15:08:15 GMT
ETag: "Mon,20Aug200715:08:15GMT"
Date: Mon, 20 Aug 2007 15:21:37 GMT
Connection: close

Connection closed by foreign host.
chris-corbyns-computer:~/Java/webapps/personal/trunk d11wtq$

If that were an image or a large HTML file rather than just a small base CSS file, and 10,000 people requested it in one day, 9,000 of whom already had the latest version, you'd save a lot of bandwidth ;)
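
As a back-of-the-envelope check on those numbers, here is a tiny Python calculation. It uses the 189-byte CSS body from the transcript as the file size, counts body bytes only (a 304 response still carries headers), and the function name `body_bytes_sent` is hypothetical.

```python
def body_bytes_sent(total_requests: int, already_cached: int, body_size: int):
    """Return (bytes sent with no validation, bytes sent with validation)."""
    without_validation = total_requests * body_size
    # Only clients without a current copy receive the body again.
    with_validation = (total_requests - already_cached) * body_size
    return without_validation, with_validation


without_validation, with_validation = body_bytes_sent(10_000, 9_000, 189)
saved = without_validation - with_validation  # 1,701,000 body bytes, i.e. 90%
```

For a 189-byte stylesheet that is small change, but the 90% ratio holds for any body size, which is where a large image or HTML page makes it worthwhile.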

Post by Oren »

Yep 8)