
How can I fetch just some text from other sites?

Posted: Sun Nov 19, 2006 10:30 pm
by Maluendaster
For example :

Welcome to XXXX , here are your download and delete links :
Download Link : http://download
Delete Link : http://delete

I only want to show the links that are in red on my site.

Posted: Mon Nov 20, 2006 12:10 am
by dibyendrah
This requires the following steps:

1. Open the URL and read its content.
2. Scan for the links with font color=red using a regular expression.

If the page uses a style sheet, a different regular expression will be required.
Please post a sample of the page you want to extract this information from.
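
Until a sample page is available, here is a rough sketch of step 2. It assumes each wanted link sits directly inside a `<font color="red">` tag; that markup, the function name `extract_red_links()` and the sample string are all hypothetical, so the pattern will need adjusting to the real page.

```php
<?php
// Assumption: the wanted links look like <font color="red"><a href="...">...</a></font>.
// Returns the href value of every link that directly follows a red <font> tag.
function extract_red_links(string $html): array
{
    $pattern = '/<font\s+[^>]*color=["\']?red["\']?[^>]*>\s*<a\s+[^>]*href=["\']?([^"\'\s>]+)/i';
    preg_match_all($pattern, $html, $matches);
    return $matches[1]; // capture group 1 holds the href values
}

// Made-up sample page: two red links and one ordinary link.
$sample = 'Download Link : <font color="red"><a href="http://download">http://download</a></font><br>'
        . 'Delete Link : <font color=red><a href="http://delete">http://delete</a></font><br>'
        . '<a href="http://other">not red</a>';

print_r(extract_red_links($sample));
```

With this sample, only the two red links are returned and the third link is ignored.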

Dibyendra

Posted: Mon Nov 20, 2006 12:16 am
by John Cartwright
dibyendrah wrote: 1. Open the URL and read its content
Note that this can be done with file_get_contents() or a similar function such as fopen().
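
A minimal sketch of that first step, assuming the allow_url_fopen setting is enabled so that file_get_contents() can open URLs. The wrapper name `fetch_page()` is made up for illustration, and the demo reads a temporary local file so it runs without network access.

```php
<?php
// Hypothetical helper: returns the page content, or null if it could not be read.
function fetch_page(string $url): ?string
{
    // file_get_contents() returns false on failure; @ hides the warning
    // so the caller can decide how to report the error.
    $html = @file_get_contents($url);
    return $html === false ? null : $html;
}

// Demo with a local file standing in for the remote page.
$tmp = tempnam(sys_get_temp_dir(), 'page');
file_put_contents($tmp, '<a href="http://download">http://download</a>');

$html = fetch_page($tmp);
echo $html === null ? "Could not read the page\n" : $html . "\n";

unlink($tmp);
```

For a real page, pass the URL instead of the temporary file name.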

Posted: Mon Nov 20, 2006 4:00 pm
by Maluendaster
dibyendrah wrote: This requires the following steps:

1. Open the URL and read its content.
2. Scan for the links with font color=red using a regular expression.

If the page uses a style sheet, a different regular expression will be required.
Please post a sample of the page you want to extract this information from.

Dibyendra
For example, http://www.fastshare.org/: once you upload a file, I want to get the download and delete paths.

Posted: Tue Nov 21, 2006 1:07 am
by dibyendrah
Taking the piece of the page that contains the links:

Code:

<?php
$links = "Deine Datei wurde erfolgreich hochgeladen:\"te.t.c\".<br> <a href=\"http://www.FastShare.org/download/te.t.c\">http://www.FastShare.org/download/te.t.c</a><br>Löschlink: <a href=delete.php?id=te.t.c&md5=4f2727be507da0f29556c510d9f6746f>http://www.FastShare.org/delete.php?id=te.t.c&md5=4f2727be507da0f29556c510d9f6746f</a><p>";


// Capture the href value of every <a> tag; the quotes around the value are treated as optional.
preg_match_all('/<a\s+[^>]*href=(\")?([^"]+)(\")?[^>]*>/is', $links, $match);

print_r($match);
?>
Output:

Code:

Array
(
    [0] => Array
        (
            [0] => <a href="http://www.FastShare.org/download/te.t.c">
            [1] => <a href=delete.php?id=te.t.c&md5=4f2727be507da0f29556c510d9f6746f>http://www.FastShare.org/delete.php?id=te.t.c&md5=4f2727be507da0f29556c510d9f6746f</a><p>
        )

    [1] => Array
        (
            [0] => "
            [1] => 
        )

    [2] => Array
        (
            [0] => http://www.FastShare.org/download/te.t.c
            [1] => delete.php?id=te.t.c&md5=4f2727be507da0f29556c510d9f6746f>http://www.FastShare.org/delete.php?id=te.t.c&md5=4f2727be507da0f29556c510d9f6746f</a><p
        )

    [3] => Array
        (
            [0] => "
            [1] => 
        )

)
Note that the second match runs past the end of the delete link, because its href value is not quoted. In your delete link, use the full URL rather than a relative path, and put double quotation marks around it so that we don't have to check for missing quotes in preg_match_all().
Another idea is to put markers in the page showing where to start and stop reading the text, like this:

Code:

<!--start-->

link goes here

<!--stop-->

I hope this gives you an idea of how to get started.
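
If the markup of the remote page cannot be changed, the pattern can be loosened instead. This is only a sketch: the function name `extract_links()` and the `$base` parameter are made up for illustration, and prefixing a site root is a simplification of real relative-URL resolution.

```php
<?php
// Returns every href value in $html, whether the attribute is double-quoted,
// single-quoted or unquoted. Values without a scheme are prefixed with $base.
function extract_links(string $html, string $base): array
{
    // (?| ... ) is a PCRE branch-reset group: whichever alternative matches,
    // the href value always ends up in capture group 1.
    $pattern = '/<a\s+[^>]*?href\s*=\s*(?|"([^"]*)"|\'([^\']*)\'|([^\s>]+))/i';
    preg_match_all($pattern, $html, $matches);

    $links = [];
    foreach ($matches[1] as $href) {
        // Treat anything that does not start with "scheme://" as relative.
        if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $href)) {
            $href = rtrim($base, '/') . '/' . ltrim($href, '/');
        }
        $links[] = $href;
    }
    return $links;
}

// Shortened version of the sample above, with the unquoted relative delete link.
$sample = '<a href="http://www.FastShare.org/download/te.t.c">download</a><br>'
        . 'Löschlink: <a href=delete.php?id=te.t.c&md5=4f2727be507da0f29556c510d9f6746f>delete</a><p>';

print_r(extract_links($sample, 'http://www.FastShare.org/'));
```

Here the delete link comes back as http://www.FastShare.org/delete.php?id=te.t.c&md5=4f2727be507da0f29556c510d9f6746f, without the trailing markup seen in the output above.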


Cheers,
Dibyendra

Posted: Tue Nov 21, 2006 1:29 am
by dibyendrah
Here is a code sample for the marker approach I mentioned in my previous post:

Code:

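<?php

// Sample page: unwanted markup first, then the wanted links between the two comment markers.
// The German text reads "Your file was uploaded successfully"; "Löschlink" means "delete link".
$text = "<a href=\"http://www.FastShare.org/spenden.php\" target=\"_self\"><font color=green>Spenden</font></a>something which I will skip... <!--start-->";
$text .= "Deine Datei wurde erfolgreich hochgeladen:\"te.t.c\".<br> "
	. "<a href=\"http://www.FastShare.org/download/te.t.c\">http://www.FastShare.org/download/te.t.c</a><br>"
	. "Löschlink: <a href=\"http://www.FastShare.org/delete.php?id=te.t.c&md5=4f2727be507da0f29556c510d9f6746f\">"
	. "http://www.FastShare.org/delete.php?id=te.t.c&md5=4f2727be507da0f29556c510d9f6746f</a><p>";
$text .= " <!--stop--> something which I don't need";

$start_marker = "<!--start-->";
$pos_start = strpos($text, $start_marker);
$pos_end = strpos($text, "<!--stop-->");

// Keep only the text between the two markers (strpos() returns false if a marker is missing).
$required_text = '';
if ($pos_start !== false && $pos_end !== false) {
	$pos_start += strlen($start_marker);
	$required_text = substr($text, $pos_start, $pos_end - $pos_start);
}

// Capture the value of every double-quoted href inside the marked section.
preg_match_all('/<a\s+[^>]*href="([^"]+)"[^>]*>/is', $required_text, $match);

print_r($match);
?>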

Output:

Code:

Array
(
    [0] => Array
        (
            [0] => <a href="http://www.FastShare.org/download/te.t.c">
            [1] => <a href="http://www.FastShare.org/delete.php?id=te.t.c&md5=4f2727be507da0f29556c510d9f6746f">
        )

    [1] => Array
        (
            [0] => http://www.FastShare.org/download/te.t.c
            [1] => http://www.FastShare.org/delete.php?id=te.t.c&md5=4f2727be507da0f29556c510d9f6746f
        )

)

Cheers,
Dibyendra