How can I fetch just some text from other sites?
Posted: Sun Nov 19, 2006 10:30 pm
by Maluendaster
For example :
Welcome to XXXX , here are your download and delete links :
Download Link :
http://download
Delete Link :
http://delete
And I only want to show on my site the links that are in red...
Posted: Mon Nov 20, 2006 12:10 am
by dibyendrah
This requires the following steps:
1. Open the URL and read its content.
2. Scan for the link with font color=red using a regular expression.
If the page uses a style sheet instead, a different regular expression will be needed.
Please post a sample of the page you want to extract this information from.
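Until the real page is posted, step 2 can only be sketched. The example below assumes the red text is wrapped in plain `<font color=red>` tags; the sample markup and the function name `red_font_text()` are made up for illustration and will likely need adjusting to the actual page.

```php
<?php
// Return the text found inside every <font color=red>...</font> element.
// Assumption: the colour is set with a font tag, not with a style sheet.
function red_font_text(string $html): array
{
    $pattern = '/<font\b[^>]*\bcolor\s*=\s*["\']?red\b["\']?[^>]*>(.*?)<\/font>/is';
    preg_match_all($pattern, $html, $m);
    // $m[1] holds the inner HTML of each red block; drop any nested tags.
    return array_map('trim', array_map('strip_tags', $m[1]));
}

// Hypothetical markup modelled on the example in the first post.
$sample = 'Download Link : <font color=red>http://download</font><br>'
        . 'Delete Link : <font color="red"><a href="http://delete">http://delete</a></font>';

print_r(red_font_text($sample));
```

With the sample above this yields the two strings http://download and http://delete.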
Dibyendra
Posted: Mon Nov 20, 2006 12:16 am
by John Cartwright
dibyendrah wrote:
1. Open the URL and read its content
Note that this can be done with file_get_contents() or one of its sibling functions.
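A minimal sketch of that first step, assuming allow_url_fopen is enabled on the server. The wrapper name `fetch_page()`, the timeout and the user agent string are illustrative choices, not something the thread prescribes.

```php
<?php
// Read a page into a string, or return null if it could not be read.
function fetch_page(string $url, int $timeout = 10): ?string
{
    // The 'http' options only apply to http:// and https:// URLs.
    $context = stream_context_create([
        'http' => ['timeout' => $timeout, 'user_agent' => 'link-fetcher-sketch'],
    ]);
    // file_get_contents() returns false on failure; @ hides the warning.
    $html = @file_get_contents($url, false, $context);
    return $html === false ? null : $html;
}
```

It would be called as `$html = fetch_page('http://www.fastshare.org/');` followed by a check for null before running any regular expression on the result.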
Posted: Mon Nov 20, 2006 4:00 pm
by Maluendaster
dibyendrah wrote: This requires the following steps:
1. Open the URL and read its content.
2. Scan for the link with font color=red using a regular expression.
If the page uses a style sheet instead, a different regular expression will be needed.
Please post a sample of the page you want to extract this information from.
Dibyendra
For example, http://www.fastshare.org/ : once you upload a file, I want to get the download and delete paths.
Posted: Tue Nov 21, 2006 1:07 am
by dibyendrah
Taking a piece of the page that contains the links:
<?php
$links = "Deine Datei wurde erfolgreich hochgeladen:\"te.t.c\".<br> <a href=\"http://www.FastShare.org/download/te.t.c\">http://www.FastShare.org/download/te.t.c</a><br>Löschlink: <a href=delete.php?id=te.t.c&md5=4f2727be507da0f29556c510d9f6746f>http://www.FastShare.org/delete.php?id=te.t.c&md5=4f2727be507da0f29556c510d9f6746f</a><p>";
preg_match_all('/<a\s+[^>]*href=(\")?([^"]+)(\")?[^>]*>/is', $links, $match);
print_r($match);
?>
Output
Array
(
    [0] => Array
        (
            [0] => <a href="http://www.FastShare.org/download/te.t.c">
            [1] => <a href=delete.php?id=te.t.c&md5=4f2727be507da0f29556c510d9f6746f>http://www.FastShare.org/delete.php?id=te.t.c&md5=4f2727be507da0f29556c510d9f6746f</a><p>
        )

    [1] => Array
        (
            [0] => "
            [1] =>
        )

    [2] => Array
        (
            [0] => http://www.FastShare.org/download/te.t.c
            [1] => delete.php?id=te.t.c&md5=4f2727be507da0f29556c510d9f6746f>http://www.FastShare.org/delete.php?id=te.t.c&md5=4f2727be507da0f29556c510d9f6746f</a><p
        )

    [3] => Array
        (
            [0] => "
            [1] =>
        )

)
In your delete link, use the full URL rather than a relative path, and put the href value in double quotes so that the pattern does not have to allow for missing quotes. As the output shows, with the unquoted delete link the ([^"]+) group runs past the closing > and swallows the link text as well.
Another idea is to put markers in the page showing where to start and where to stop reading, like:
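If the markup of the remote page cannot be changed, the pattern can be made to tolerate unquoted values instead. The following is only a sketch: the helper names `extract_hrefs()` and `absolutize()` are hypothetical, and `absolutize()` handles only simple relative paths such as the one in the sample, not `../` segments or protocol-relative URLs.

```php
<?php
// Collect href values from <a> tags, whether double-quoted, single-quoted or bare.
function extract_hrefs(string $html): array
{
    // (?| ... ) is a branch-reset group: whichever alternative matches,
    // the value always lands in capture group 1.
    $pattern = '/<a\s[^>]*?href\s*=\s*(?|"([^"]*)"|\'([^\']*)\'|([^\s>"\']+))/is';
    preg_match_all($pattern, $html, $m);
    return $m[1];
}

// Prefix a base URL to a relative path; absolute URLs are returned unchanged.
function absolutize(string $href, string $base): string
{
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $href)) {
        return $href;
    }
    return rtrim($base, '/') . '/' . ltrim($href, '/');
}

$links = "<br> <a href=\"http://www.FastShare.org/download/te.t.c\">http://www.FastShare.org/download/te.t.c</a>"
       . "<br><a href=delete.php?id=te.t.c&md5=4f2727be507da0f29556c510d9f6746f>delete</a><p>";

foreach (extract_hrefs($links) as $href) {
    echo absolutize($href, 'http://www.FastShare.org'), "\n";
}
```

Here the bare delete link stops at the closing > and comes back as a full URL under the assumed base http://www.FastShare.org.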
<!--start-->
link goes here
<!--stop-->
Hope this gives you some idea of where to start.
Cheers,
Dibyendra
Posted: Tue Nov 21, 2006 1:29 am
by dibyendrah
Code sample for the marker approach I mentioned in my previous post:
<?php
$text = "<a href=\"http://www.FastShare.org/spenden.php\" target=\"_self\"><font color=green>Spenden</font></a>something which I will skip... <!--start-->";
$text .= "Deine Datei wurde erfolgreich hochgeladen:\"te.t.c\".". "<br> <a href=\"http://www.FastShare.org/download/te.t.c\"". ">http://www.FastShare.org/download/te.t.c</a><br>Löschlink: <a href=\"http://www.FastShare.org/delete.php?id=te.t.c&". "md5=4f2727be507da0f29556c510d9f6746f". "\">http://www.FastShare.org/delete.php?id=te.t.c&". "md5=4f2727be507da0f29556c510d9f6746f</a><p>";
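$text .= " <!--stop--> something which I don't need";
// Locate the two markers and keep only the text between them.
$start_marker = "<!--start-->";
$pos_start = strpos($text, $start_marker);
$pos_end = strpos($text, "<!--stop-->");
$required_text = '';
if ($pos_start !== false && $pos_end !== false && $pos_end > $pos_start) {
    $pos_start += strlen($start_marker);
    $required_text = substr($text, $pos_start, $pos_end - $pos_start);
}
// Capture the href value of every double-quoted link in that text.
preg_match_all('/<a\s+[^>]*href="([^"]+)"[^>]*>/is', $required_text, $match);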
print_r($match);
?>
Output :
Array
(
    [0] => Array
        (
            [0] => <a href="http://www.FastShare.org/download/te.t.c">
            [1] => <a href="http://www.FastShare.org/delete.php?id=te.t.c&md5=4f2727be507da0f29556c510d9f6746f">
        )

    [1] => Array
        (
            [0] => http://www.FastShare.org/download/te.t.c
            [1] => http://www.FastShare.org/delete.php?id=te.t.c&md5=4f2727be507da0f29556c510d9f6746f
        )

)
Cheers,
Dibyendra