Hi there!
I want to open an HTML file (http://xx.se/album123.html), parse it, find all links that contain "/pix.php?source=", and download (wget) the URL that follows source=, for example:
"/pix.php?source=http://xx.se/albumfiles/Jfnb83njHfm2kJ.jpg"
- There might be multiple image links in the HTML file.
Open html file, download images
Re: Open html file, download images
Is this legal behavior?
Anyway, you would open the file with file_get_contents() (among other ways).
Parse the file for links using a regular expression with preg_match_all().
Loop through the matched links and download each matched file, again using file_get_contents() or a similar function.
What have you tried?
Set Search Time - A Google Chrome extension. When you search, only results from the past year (or a set time period) are displayed. Helps tremendously when using new technologies to avoid outdated results.
Re: Open html file, download images
s.dot wrote:
Is this legal behavior?
Anyway, you would open the file with file_get_contents() (among other ways).
Parse the file for links using a regular expression with preg_match_all().
Loop through the matched links and download each matched file, again using file_get_contents() or a similar function.
What have you tried?
I see why you think it's illegal behaviour, but these are my own images that I want to download.
I haven't coded anything in almost two years, and I've always been terrible with regex, so I might need some help with that. :p (I just need to match href="pix.php?source=X".)
Thanks
Last edited by JKM on Tue Feb 19, 2013 5:40 am, edited 1 time in total.
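For the href="pix.php?source=X" case, a pattern along these lines may be a reasonable starting point. It is only a sketch: the function name extractSourceLinks and the sample markup are made up for illustration, and it assumes the href values are quoted and that the image URL after source= is not URL-encoded.

```php
<?php
// Hypothetical helper: returns every URL that follows "pix.php?source="
// inside a quoted href attribute.
function extractSourceLinks(string $html): array
{
    // href= plus an opening quote, anything up to "pix.php?source=",
    // then capture everything up to the matching closing quote (\1).
    $pattern = '/href=(["\'])[^"\']*pix\.php\?source=([^"\']+)\1/i';
    preg_match_all($pattern, $html, $matches);

    // Group 2 holds the captured URLs (an empty array when nothing matches).
    return $matches[2];
}

// Sample markup: an assumption about what the album page looks like.
$sample = '<a href="/pix.php?source=http://xx.se/albumfiles/Jfnb83njHfm2kJ.jpg">one</a>'
        . '<a href="/other.php?id=3">skip</a>';

print_r(extractSourceLinks($sample));
```

For the sample string this should give a single entry, the .jpg URL; the second link is skipped because it does not point at pix.php.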
Re: Open html file, download images
Well, some pseudo code might go a little bit like this. That is the basic structure for what you want. The regular expression may be wrong, and I don't know how you want to save the files.
Code:
<?php
// HTML file you want to open (fetching a URL requires allow_url_fopen)
$htmlFile = 'http://www.example.com/page.html';
$htmlFileContents = file_get_contents($htmlFile);
if ($htmlFileContents !== false)
{
    // echo $htmlFileContents; should show the source of the HTML file
    // Attempt to match everything between "pix.php?source=" and the closing quote
    preg_match_all('/pix\.php\?source=([^"\']+)/i', $htmlFileContents, $matches, PREG_SET_ORDER);
    // print_r($matches); see what you have here
    foreach ($matches as $match)
    {
        // $match[1] holds the captured link
        $imageUrl = $match[1];
        // Assumption: save each image into the current directory under its
        // original file name; use header() instead to send it to the client
        $imageData = file_get_contents($imageUrl);
        if ($imageData !== false)
        {
            file_put_contents(basename($imageUrl), $imageData);
        }
    }
}
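Since the regular expression may turn out to be fragile against real markup, one alternative is to let PHP's DOM extension find the links and then read the source parameter from each href. This swaps the regex for DOMDocument, which is not what was suggested above, so treat it as a sketch under assumptions: the function name extractSourceLinksDom is hypothetical, the DOM extension must be installed, and the images are assumed to be linked through <a> tags.

```php
<?php
// Hypothetical helper: collects the "source" query parameter from every
// <a href> that points at pix.php.
function extractSourceLinksDom(string $html): array
{
    // Real-world HTML is rarely valid, so collect parser warnings
    // internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $urls = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        if (strpos($href, 'pix.php?') === false) {
            continue; // not one of the image links
        }
        // Everything after the first "?" is the query string.
        $queryString = explode('?', $href, 2)[1];
        // parse_str() URL-decodes values, so encoded sources also work.
        parse_str($queryString, $query);
        if (!empty($query['source'])) {
            $urls[] = $query['source'];
        }
    }
    return $urls;
}
```

Each returned URL could then be passed to file_get_contents() and file_put_contents() in the same way as in the loop above.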