Whence the page size variation?
-
RobBroekhuis
- Forum Newbie
- Posts: 5
- Joined: Tue Nov 02, 2004 6:32 am
- Location: Allentown, PA, USA
I've noticed that page sizes for .php pages, as reported by the number of bytes in my server logs, always move up and down a little bit. But the issue got a little more of my attention when I noticed this morning that one of my main database-generated pages varies by as much as 6000 bytes (between 35000 and 41000). The database isn't changing, and the page as presented to the user isn't changing. There is no querystring, nothing fancy. So where do the size variations come from?
Please set me straight - thanks!
Rob
-
kettle_drum
- DevNet Resident
- Posts: 1150
- Joined: Sun Jul 20, 2003 9:25 pm
- Location: West Yorkshire, England
maybe it's throwing error messages? check which user agent corresponds to the oddly sized pages, and maybe try viewing the site with that browser.
you could also temporarily use output buffering to capture the output sent to the user and log it, so you can look at it:
Code:
<?php
ob_start(); // put this at the very beginning of the script, before any includes
// entire script goes here
$output = ob_get_contents(); // put this at the very end of the script, after all includes
// write $output to a file named after the current timestamp
$filename = time() . '.html';
$fp = fopen($filename, 'w');
if ($fp) {
    fwrite($fp, $output);
    fclose($fp);
}
echo $output; // still gotta show it to the user or they won't like your website much
?>
-
RobBroekhuis
partial answers...
To partly answer your followup questions...
No - the html content of the page is not changing (no username, date, or other variations). When I load the page twice into IE, "view source", and save as text file, the two text files have identical size. The two log entries show a different number of bytes.
So I get the size variations for my own requests - but the most extreme variations I've seen come when Googlebot requests the same page twice. Does that help in understanding the issue?
Will the output buffering approach capture more than just the page html? I think I'll give it a try, anyway - thanks for the suggestion.
no, it will only capture the html.
oh, and small size variations could easily be due to setting cookies.
if using php5, i know how to get the headers, but not on php4...
what you could do is use a local script to request the page in question from your site, and spoof the different user agents that google is using (since you know it's causing different page sizes).
if you do that, getting the headers is a snap (and works in php4):
Code:
<?php
// paste google's user agent string from your log file here;
// run the script once for each of the different user agents google uses
$ua = 'googles uagent from your log file';
// google sometimes requests the same page more than once while pretending to be a
// different browser, to see if you're sniffing for google and trying to feed it keywords
ini_set('user_agent', $ua);
$url = 'http://example.org'; // your page in question
$fp = fopen($url, 'r');
if (!$fp) {
    die('could not open ' . $url);
}
$meta_data = stream_get_meta_data($fp); // the response headers show up under 'wrapper_data'
fclose($fp);
echo '<pre>';
print_r($meta_data);
echo '</pre>';
?>
Last edited by rehfeld on Tue Nov 02, 2004 8:01 pm, edited 2 times in total.
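To put the cookie remark in rough numbers, here is a minimal sketch that adds up the bytes a header block occupies on the wire. The helper name `header_block_bytes` and the sample headers (including the session ID) are made up for illustration, and whether a server log's byte count includes headers at all depends on the server and its log format, so treat this as an estimate of scale only.

```php
<?php
// Hypothetical helper: bytes taken up by a response header block on the wire.
// Each header line ends in CRLF, and one empty CRLF line ends the block.
function header_block_bytes(array $lines)
{
    $bytes = 2; // the blank line that terminates the headers
    foreach ($lines as $line) {
        $bytes += strlen($line) + 2; // header text plus its CRLF
    }
    return $bytes;
}

// Made-up example headers, not taken from any real server.
$plain = array(
    'HTTP/1.1 200 OK',
    'Content-Type: text/html',
);
$with_cookie = $plain;
$with_cookie[] = 'Set-Cookie: PHPSESSID=0123456789abcdef0123456789abcdef; path=/';

// Difference between the two blocks is the cost of the cookie header.
$extra = header_block_bytes($with_cookie) - header_block_bytes($plain);
echo "A session cookie adds $extra bytes\n";
```

A single cookie header of this kind costs tens of bytes, so it could account for small fluctuations but not for a swing of several thousand bytes.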
-
RobBroekhuis
Rehfeld,
I implemented the output buffering (by the way, the echo statement at the end is not necessary - php will clear its buffer when it runs its course). With very curious results: with the output buffering in place, I no longer get varying load sizes - they are all exactly the same, at the low end of the previously noted range. The sizes of the files written are also all the same (a few bytes smaller than the load size), and the "view source" saved version is somewhat larger (I suspect because CRs are converted to CRLFs). So the buffering suppresses the number of "extra bytes" sent. I wonder if, during processing, php or Apache sends "keep-alive" protocol bytes to let the client's browser know it's still working? Just a random guess...
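The line-ending suspicion can be checked directly. The sketch below assumes the server emits bare LF line endings and that the saved "view source" copy uses CRLF (both assumptions, since the post only says "CRs"); the helper names `to_crlf` and `crlf_growth` and the sample string are hypothetical.

```php
<?php
// Hypothetical helper: rewrite every line ending as CRLF.
// Existing CRLF pairs are collapsed to LF first so they are not doubled.
// A lone CR (old Mac style) is left untouched.
function to_crlf($text)
{
    $lf_only = str_replace("\r\n", "\n", $text);
    return str_replace("\n", "\r\n", $lf_only);
}

// Bytes gained by the conversion: one per bare LF in the input.
function crlf_growth($text)
{
    return strlen(to_crlf($text)) - strlen($text);
}

// Made-up sample with two bare LF line endings and one existing CRLF.
$sample = "<html>\n<body>\n</body>\r\n";
echo crlf_growth($sample), " extra bytes after conversion\n";
```

Running `crlf_growth()` on one of the files written by the output-buffering script and comparing the result with the size difference of the saved "view source" copy would confirm or rule out line endings as the cause of that particular gap.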
that's a good theory. definitely could be happening.
but a few thousand bytes seems more than what a header or two would cause.
i'm thinking php is throwing errors.
if it was throwing 'headers already sent' errors, like when you try to start a session after sending some html, output buffering would eliminate those errors and mask the problem.
i've been wanting to make something like this anyway, just hadn't got around to it.
this should help you out.
Code:
<?php
// use the submitted user agent, or fall back to the browser's own
if (isset($_POST['user_agent'])) {
    $user_agent = $_POST['user_agent'];
} else {
    $user_agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
}
ini_set('user_agent', $user_agent);
$url = isset($_POST['url']) ? $_POST['url'] : '';
$html = '';
// only fetch http(s) urls, so the form can't be used to read local files
if (preg_match('#^https?://#i', $url)) {
    $fp = @fopen($url, 'r');
    if ($fp) {
        while (!feof($fp)) {
            $html .= fread($fp, 1024);
        }
        fclose($fp);
    }
}
?>
<form method="post" action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>">
<p>URL, must be of the form http://example.org<br>
<input type="text" size="100" name="url" value="<?php echo htmlspecialchars($url); ?>"></p>
<p>The user agent you want to pretend to be (defaults to the user agent of your browser)<br>
<input type="text" size="100" name="user_agent" value="<?php echo htmlspecialchars($user_agent); ?>"></p>
<p><input type="submit"></p>
</form>
<hr>
<pre>
<?php if (isset($http_response_header)) echo htmlspecialchars(print_r($http_response_header, true)); ?>
<hr>
<?php echo htmlentities($html); ?>
</pre>
-
RobBroekhuis
Rehfeld,
That's a useful little script - I put it in place as a testing playground. The html bit comes through just as it would in a browser (i.e., what I see with "view source"). The header bit is given as:
Array
(
[0] => HTTP/1.1 200 OK
[1] => Date: Wed, 03 Nov 2004 12:47:10 GMT
[2] => Server: Apache/1.3.29 (Unix)
[3] => X-Powered-By: PHP/4.3.8
[4] => Connection: close
[5] => Content-Type: text/html
)
Nothing unexpected, I believe. I didn't try to pose as a different useragent, but I don't think my server tries to cloak anything based on UA or IP - and my script certainly doesn't. I got a suggestion elsewhere to try a "packet sniffer". Never used one of those, but I may look into it.
Thanks for helping me think through this!
Rob
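For anyone repeating this check, here is a minimal sketch of how the header array reported above could be inspected programmatically rather than by eye. The helper name `parse_header_lines` is hypothetical; the header lines are the ones quoted in the post, and the three header names looked up at the end are simply ones that came up in this thread or that would declare a body size.

```php
<?php
// Hypothetical helper: split header lines of the kind shown above into
// a status line and a map of lower-cased header names to values.
function parse_header_lines(array $lines)
{
    $parsed = array('status' => null, 'headers' => array());
    foreach ($lines as $line) {
        if (strpos($line, 'HTTP/') === 0) {
            $parsed['status'] = $line; // e.g. "HTTP/1.1 200 OK"
            continue;
        }
        // Split on the first colon only, so values such as dates stay intact.
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            $parsed['headers'][strtolower(trim($parts[0]))] = trim($parts[1]);
        }
    }
    return $parsed;
}

// The header lines reported in the post above.
$lines = array(
    'HTTP/1.1 200 OK',
    'Date: Wed, 03 Nov 2004 12:47:10 GMT',
    'Server: Apache/1.3.29 (Unix)',
    'X-Powered-By: PHP/4.3.8',
    'Connection: close',
    'Content-Type: text/html',
);
$parsed = parse_header_lines($lines);

// Report whether headers relevant to the size question are present.
foreach (array('content-length', 'transfer-encoding', 'set-cookie') as $name) {
    $value = isset($parsed['headers'][$name]) ? $parsed['headers'][$name] : '(absent)';
    echo $name, ': ', $value, "\n";
}
```

In this particular response none of the three is present, so the response neither sets a cookie nor declares its own body length; that is an observation about this one request only, not about what Googlebot receives.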