get remote images from websites

Posted: Tue Sep 25, 2007 10:50 pm
by GeXus
I'm trying to write a script where basically someone with a picture gallery can enter their url and it will grab all the images off of their gallery...

It uses regex to get the image urls...

The issue is that in order to do a file_get_contents on the image, I need to know the full URL...

Some are relative, some are absolute, some use '../', etc... What would you suggest is the best way to handle this?

I know the original URL... so if for example it is ../, it's easy: I can just strip the last directory off the URL and concatenate the image path to it. If it's the full URL, again easy. But when it's root-relative like /image.jpg, it gets kind of tricky...

Anyone have suggestions for the best way to handle all possible types?

Posted: Tue Sep 25, 2007 11:10 pm
by EricS
This is a pseudocode representation of what would work.

First strip the scheme ('http://') from the URL of the page that you're getting your images from, then explode the rest by '/'. This will give you the domain first and then the directory structure in an array. We'll call this $pageElements.

You need to check the end of the array for a file name. You aren't going to want the file name in this array; it should just be the domain name followed by each directory as a separate element, leading up to the file name but not including it.
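A minimal PHP sketch of that first step, assuming PHP 7 or later. The function name pageElementsFromUrl and the example.com URLs are made up for illustration. It treats whatever follows the last '/' as the file name, which is how browsers resolve relative links, so a directory URL should end in '/'.

```php
<?php
// Hypothetical helper: turn a page URL into [host, dir, dir, ...].
function pageElementsFromUrl(string $pageUrl): array
{
    // Strip the scheme so the host lands at index 0.
    $rest = preg_replace('#^[a-z][a-z0-9+.-]*://#i', '', $pageUrl);
    // Drop any query string or fragment; they are not part of the directory path.
    $rest = preg_replace('/[?#].*$/', '', $rest);

    $elements = explode('/', $rest);

    // The last element is the file name (or '' when the URL ends in '/').
    // Keep it only when it is the host itself, e.g. 'http://example.com'.
    if (count($elements) > 1) {
        array_pop($elements);
    }
    return $elements;
}
```

For example, pageElementsFromUrl('http://example.com/gallery/2007/index.php') gives the host followed by 'gallery' and '2007'.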

Now for relative links such as '../../images/filename.jpg', meaning anything that starts with '.'.

Split this link by '/'. We'll call the result $imageElements. Count the number of times '..' shows at the beginning of the array. Drop those elements and drop the corresponding number of elements off the end of $pageElements. Find any instances of '.' alone and remove them from $imageElements; they are not needed. Now merge $pageElements with $imageElements and then implode the merged array with '/'. Prepend 'http://' if the result does not already include it, and you have the URL to the image you want.
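The relative-link step could be sketched in PHP roughly like this, assuming PHP 7 or later. The function name resolveDotRelative is hypothetical, and $pageElements is assumed to hold the host at index 0 followed by the directories, with no file name. Instead of counting the leading '..' entries first, this version walks the link one piece at a time, which also copes with a '..' in the middle of the path.

```php
<?php
// Hypothetical helper: resolve a link such as '../../images/filename.jpg'
// against $pageElements = [host, dir, dir, ...].
function resolveDotRelative(array $pageElements, string $link): string
{
    $imageElements = explode('/', $link);

    foreach ($imageElements as $part) {
        if ($part === '..') {
            // Go up one directory, but never drop the host at index 0.
            if (count($pageElements) > 1) {
                array_pop($pageElements);
            }
        } elseif ($part !== '.' && $part !== '') {
            // '.' and empty pieces add nothing; everything else is appended.
            $pageElements[] = $part;
        }
    }

    // The result is host/path without a scheme; the caller prepends 'http://'.
    return implode('/', $pageElements);
}
```

Calling 'http://' . resolveDotRelative(array('example.com', 'gallery', '2007'), '../../images/filename.jpg') walks up two directories and yields http://example.com/images/filename.jpg.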

Root-relative links, ones that start with '/', are easy. Prepend the scheme and domain name ('http://' . $pageElements[0]) to the link and it will resolve.

Finally, absolute links, ones that start with 'http' can obviously be used as is.
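Putting the three cases together, one possible PHP sketch looks like this, assuming PHP 7 or later. The function name absoluteImageUrl is made up, and it uses PHP's built-in parse_url() to pull the page URL apart rather than a manual explode. It assumes the page URL has a scheme and host, ignores ports, and does not cover protocol-relative links such as '//host/img.jpg'.

```php
<?php
// Hypothetical helper: turn any image link found on $pageUrl into a full URL.
function absoluteImageUrl(string $pageUrl, string $link): string
{
    // Absolute links can be used as is.
    if (preg_match('#^https?://#i', $link)) {
        return $link;
    }

    $page = parse_url($pageUrl);
    $root = $page['scheme'] . '://' . $page['host'];

    // Root-relative links only need the scheme and host in front.
    if (substr($link, 0, 1) === '/') {
        return $root . $link;
    }

    // Directory part of the page path, without the file name.
    $path = $page['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/'));
    $dirs = array_values(array_filter(explode('/', $dir), 'strlen'));

    // Walk the relative link: '..' goes up, '.' is ignored, the rest is appended.
    foreach (explode('/', $link) as $part) {
        if ($part === '..') {
            array_pop($dirs);
        } elseif ($part !== '.' && $part !== '') {
            $dirs[] = $part;
        }
    }

    return $root . '/' . implode('/', $dirs);
}
```

The returned string is what would then be handed to file_get_contents() to fetch the image.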

Hope this helps.