Open html file, download images
Posted: Mon Feb 18, 2013 5:51 am
by JKM
Hi there!
I want to open an HTML file (http://xx.se/album123.html), parse it, find all links that contain "/pix.php?source=", and download (wget-style) the URL that follows source=, for example:
"/pix.php?source=http://xx.se/albumfiles/Jfnb83njHfm2kJ.jpg"
- There might be multiple image links in the HTML file.
Re: Open html file, download images
Posted: Mon Feb 18, 2013 6:53 am
by s.dot
Is this legal behavior?
Anyways, you would open the file with file_get_contents() (among other ways).
Parse the file for links using a regular expression with preg_match_all().
Loop through the matched links and download each linked file, again using file_get_contents() or a similar function.
What have you tried?
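As a rough sketch of the parsing step, assuming the links look like the one in the first post (the image URL directly after "pix.php?source=" inside a quoted href), something like this could pull the URLs out of the page source. The function name extractSourceLinks and the sample markup are made up for illustration, and the pattern is only one guess at what the real pages contain.

```php
<?php
// Hypothetical helper: returns every URL that follows "pix.php?source=" in $html.
function extractSourceLinks(string $html): array
{
    // Capture up to the next quote, whitespace, or ">" so the match
    // stops at the end of the href attribute.
    preg_match_all('/pix\.php\?source=([^"\'\s>]+)/i', $html, $matches);

    // With the default PREG_PATTERN_ORDER, $matches[1] holds the first
    // capture group of every match (an empty array if nothing matched).
    return $matches[1];
}

// Made-up sample markup, modelled on the link from the first post.
$sample = '<a href="/pix.php?source=http://xx.se/albumfiles/Jfnb83njHfm2kJ.jpg">one</a>'
        . '<a href="/other.php?id=1">two</a>';

print_r(extractSourceLinks($sample));
```

Anchoring the pattern on "pix.php" rather than only "?source=" keeps unrelated links that happen to carry a source parameter out of the result.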
Re: Open html file, download images
Posted: Mon Feb 18, 2013 4:50 pm
by JKM
s.dot wrote: Is this legal behavior?
Anyways, you would open the file with file_get_contents() (among other ways).
Parse the file for links using a regular expression with preg_match_all().
Loop through the matched links and download each linked file, again using file_get_contents() or a similar function.
What have you tried?
I see why you might think it's illegal behaviour, but these are my own images that I want to download.
I haven't coded anything for almost two years, and I've always been terrible with RegEx, so I might need some help with that. :p (I just need href="pix.php?source=X")
Thanks

Re: Open html file, download images
Posted: Tue Feb 19, 2013 5:24 am
by s.dot
Well, some pseudo code might go a little bit like this:
Code:
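<?php
// HTML file you want to open (fetching a URL requires allow_url_fopen to be enabled)
$htmlFile = 'http://www.example.com/page.html';
$htmlFileContents = file_get_contents($htmlFile);
if ($htmlFileContents !== false)
{
    // echo $htmlFileContents; should show the source of the HTML file
    // Attempt to match links: capture everything after "pix.php?source=" up to the closing quote
    preg_match_all('/pix\.php\?source=([^"\'\s>]+)/i', $htmlFileContents, $matches, PREG_SET_ORDER);
    // print_r($matches); see what you have here
    foreach ($matches as $match)
    {
        // $match[1] holds the captured link; decode entities such as &amp; in case the page escaped them
        $imageUrl = html_entity_decode($match[1]);
        $fileName = basename((string) parse_url($imageUrl, PHP_URL_PATH));
        $imageData = file_get_contents($imageUrl);
        if ($fileName !== '' && $imageData !== false)
        {
            // Write the file to the server under its original name;
            // use header() instead if you want to send it to the client
            file_put_contents($fileName, $imageData);
        }
    }
}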
It would be something like that. That is the basic structure for what you want. The regular expression may be wrong and I don't know how you want to save the files.
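Since how to save the files is left open, here is one possible sketch for writing each image to the server. It assumes the file name can be taken from the last path segment of the image URL; localFileName, saveImage, and the 'download.bin' fallback are hypothetical names chosen for illustration, not anything from the original page.

```php
<?php
// Hypothetical helper: derive a local file name from an image URL.
function localFileName(string $url): string
{
    // parse_url() with PHP_URL_PATH returns only the path, so a query string is dropped.
    $path = (string) parse_url($url, PHP_URL_PATH);

    // basename() strips the directories; fall back to a placeholder if nothing is left.
    $name = basename($path);
    return $name !== '' ? $name : 'download.bin';
}

// Hypothetical helper: fetch $url and write it into $targetDir.
// Returns true on success, false if the download or the write failed.
function saveImage(string $url, string $targetDir): bool
{
    $data = file_get_contents($url); // needs allow_url_fopen for remote URLs
    if ($data === false) {
        return false;
    }
    return file_put_contents($targetDir . '/' . localFileName($url), $data) !== false;
}

echo localFileName('http://xx.se/albumfiles/Jfnb83njHfm2kJ.jpg'), "\n"; // prints Jfnb83njHfm2kJ.jpg
```

Inside the foreach loop this could be called as saveImage($match[1], __DIR__), assuming the script's own directory is writable. Taking only the basename also keeps a URL from steering the write into another directory.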