file_get_contents() gives slightly altered file contents
Posted: Fri Dec 26, 2008 12:11 am
I have a simple echo of a website's source:
Code: Select all
echo file_get_contents('http://www.threadless.com/blogs/blogs');
A small portion of the output is a little fishy:
Code: Select all
<a class="pagea selected" href="/blogs/blogs?token=ccaea4f99cbadd8262c148c86e1d8b06&uuid=5abf8a35510975e77f4618b544f7fe65/page,1">1</a>
When viewing the page source of the actual page in a browser, that same area is:
Code: Select all
<a class="pagea selected" href="/blogs/blogs/page,1">1</a>

So file_get_contents() returns slightly different content from what the browser shows! Something seems to have added "?token=randomString" (plus a "&uuid=..." parameter) to all of the pagination URLs. I'm working on a web crawler, and these altered links are breaking the crawl.

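For what it's worth, file_get_contents() does not rewrite the response body itself, so the extra parameters were presumably already in the HTML the server sent to the script. One guess (an assumption, nothing here confirms it) is that the site appends session identifiers to links for clients that do not send cookies. As a stopgap for the crawler, a rough sketch like the one below could strip them before the links are parsed. The helper name strip_session_params() is made up for illustration, and the pattern only covers the exact "?token=<hex>&uuid=<hex>" shape seen in the output above.

```php
<?php
// Hypothetical helper (not part of PHP or of the site): removes the
// "?token=<hex>&uuid=<hex>" segment seen in the fetched markup.
function strip_session_params(string $html): string
{
    // Accept either a raw "&" or an escaped "&amp;" between the two parameters.
    $pattern = '/\?token=[0-9a-f]+&(?:amp;)?uuid=[0-9a-f]+/i';

    // preg_replace() returns null on a regex error; fall back to the input.
    return preg_replace($pattern, '', $html) ?? $html;
}

// Sample taken from the fetched output quoted in this post.
$fetched = '<a class="pagea selected" href="/blogs/blogs?token=ccaea4f99cbadd8262c148c86e1d8b06&uuid=5abf8a35510975e77f4618b544f7fe65/page,1">1</a>';

echo strip_session_params($fetched), "\n";
// prints: <a class="pagea selected" href="/blogs/blogs/page,1">1</a>
```

In the crawler this would be applied to the string returned by file_get_contents() before extracting hrefs. If the site ever uses token or uuid as real query parameters, the pattern would need to be narrowed.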