Saving a cached copy of a link

mrhoopz
Forum Newbie
Posts: 11
Joined: Tue Feb 06, 2007 1:35 pm

Post by mrhoopz »

OK, I have a website where users post summary information about an article along with a link to the article itself. Anyone can search these posts. Some of the links inevitably go dead, so I would like to save a copy of each article on the server. That way, if the link is dead, a user can click something like 'View Archived Content' and see the original article.

PDFs are no problem. HTML is. I tried using PHP to download the HTML file that the link points to, but that doesn't bring the images along. What I'd like is something similar to what a browser does when you save a web page together with all its images, but I obviously need to do it in PHP.

Any suggestions would be greatly appreciated, thanks!
Burrito
Spockulator
Posts: 4715
Joined: Wed Feb 04, 2004 8:15 pm
Location: Eden, Utah

Post by Burrito »

You could use file_get_contents() to get the HTML, then parse the string and look for images. You can then call file_get_contents() again for each image and save the results to your server.
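Here is a rough sketch of that idea, not a finished implementation. The function names archive_page() and resolve_url(), the md5-based file names, and the index.html layout are all made up for illustration. It assumes allow_url_fopen is enabled so file_get_contents() can read URLs, and it only handles quoted src values and simple relative paths (no "../", no <base> tag, no CSS backgrounds).

```php
<?php
// Sketch only: names and archive layout are hypothetical, not from the thread.

/** Turn a possibly relative image path into an absolute URL (simplified; ignores "../"). */
function resolve_url(string $base, string $src): string
{
    if (preg_match('#^https?://#i', $src)) {
        return $src;                                   // already absolute
    }
    $p = parse_url($base);
    $scheme = $p['scheme'] ?? 'http';
    $origin = $scheme . '://' . ($p['host'] ?? '') . (isset($p['port']) ? ':' . $p['port'] : '');
    if (strpos($src, '//') === 0) {
        return $scheme . ':' . $src;                   // protocol-relative
    }
    if (strpos($src, '/') === 0) {
        return $origin . $src;                         // root-relative
    }
    $path = $p['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);  // directory part of the page path
    return $origin . $dir . $src;
}

/**
 * Save a page and its images into $dir. $fetch defaults to file_get_contents();
 * passing another callable makes it possible to test without network access.
 */
function archive_page(string $url, string $dir, ?callable $fetch = null): bool
{
    $fetch = $fetch ?? 'file_get_contents';
    $html = $fetch($url);
    if ($html === false) {
        return false;                                  // page could not be fetched
    }
    if (!is_dir($dir) && !mkdir($dir, 0755, true)) {
        return false;
    }
    // Naive match for quoted src values inside <img> tags.
    preg_match_all('/<img\b[^>]*\bsrc\s*=\s*["\']([^"\']+)["\']/i', $html, $m);
    foreach (array_unique($m[1]) as $src) {
        $data = $fetch(resolve_url($url, $src));
        if ($data === false) {
            continue;                                  // keep the original reference
        }
        $ext = pathinfo((string) parse_url($src, PHP_URL_PATH), PATHINFO_EXTENSION);
        $local = md5($src) . ($ext !== '' ? '.' . $ext : '');
        file_put_contents($dir . '/' . $local, $data);
        // Point the archived HTML at the local copy (quoted match avoids partial hits).
        $html = str_replace(['"' . $src . '"', "'" . $src . "'"], '"' . $local . '"', $html);
    }
    return file_put_contents($dir . '/index.html', $html) !== false;
}
```

Calling archive_page($articleUrl, '/path/to/archive/123') at the time the link is posted would leave an index.html plus the image files in that directory, which the 'View Archived Content' link could then serve.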

Keep in mind, if you're getting this stuff from an external site, you need to make sure you have permission to save the data (images) on your server, so as to avoid any copyright violations.
mrhoopz
Forum Newbie
Posts: 11
Joined: Tue Feb 06, 2007 1:35 pm

Post by mrhoopz »

I'm aware of the copyright issues here. Viewing the cached content will only be available to users on my local intranet, not to anyone outside of it.

Thanks for the tip, though. It looks like it should work, although I'm not sure of the best way to parse the string to look for images.
Burrito
Spockulator
Posts: 4715
Joined: Wed Feb 04, 2004 8:15 pm
Location: Eden, Utah

Post by Burrito »

Use a regular expression to find each <img> tag and pull out the value of its src attribute.
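Something along these lines might do it, as a minimal sketch. The helper name extract_image_sources() is hypothetical, and the pattern assumes the src value is wrapped in single or double quotes, so unquoted attributes are missed. For messy real-world markup, PHP's DOMDocument class is a sturdier alternative to a regex.

```php
<?php
// Illustrative helper; the name extract_image_sources() is not from the thread.
function extract_image_sources(string $html): array
{
    // Match <img ... src="value"> or src='value', case-insensitively.
    // The lookbehind skips attributes that merely end in "src", such as data-src.
    $pattern = '/<img\b[^>]*?(?<![\w-])src\s*=\s*(["\'])(.*?)\1/is';
    if (!preg_match_all($pattern, $html, $matches)) {
        return [];
    }
    // Decode entities such as &amp; and drop duplicate URLs.
    $sources = array_map('html_entity_decode', $matches[2]);
    return array_values(array_unique($sources));
}

$html = '<p><img src="a.png" alt="A"> <IMG class="x" SRC=\'b.jpg\'> '
      . '<img data-src="lazy.png" src="a.png"></p>';
print_r(extract_image_sources($html));
```

Each returned value is whatever the page author wrote in src, so relative paths still need to be resolved against the page URL before they can be downloaded.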