How can I fetch just some text from other sites?

Maluendaster
Forum Contributor
Posts: 124
Joined: Fri Feb 25, 2005 1:14 pm

Post by Maluendaster »

For example :

Welcome to XXXX , here are your download and delete links :
Download Link : http://download
Delete Link : http://delete

I only want to show on my site the links that are in red.
dibyendrah
Forum Contributor
Posts: 491
Joined: Wed Oct 19, 2005 5:14 am
Location: Nepal

Post by dibyendrah »

This requires the following steps:

1. Open the URL and read its content.
2. Scan for the link with font color=red using a regular expression.

If the page uses a style sheet, a different regular expression will be required.
Please post a sample of the page you want to extract this information from.
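
The two steps above could be sketched roughly as follows. This is only a minimal sketch: the function name `extract_red_text` is made up for illustration, and it assumes the page marks the links with an old-style `<font color=red>` tag (quoted or unquoted), which has not been confirmed for the actual site. The HTML is passed in as a string so the scanning step can be tried without a network request.

```php
<?php
// Hypothetical helper: returns the text found inside <font color=red> tags.
// Assumes the page uses the font tag for colouring, not a style sheet.
function extract_red_text(string $html): array
{
    $pattern = '/<font\b[^>]*\bcolor\s*=\s*["\']?red["\']?[^>]*>(.*?)<\/font>/is';
    preg_match_all($pattern, $html, $matches);

    // Drop any nested tags (such as <a>) and surrounding whitespace.
    return array_map(function ($fragment) {
        return trim(strip_tags($fragment));
    }, $matches[1]);
}

// Step 1 would normally be: $html = file_get_contents($url);
// Here a made-up sample string stands in for the fetched page.
$html = 'Download Link : <font color=red>http://download</font><br>'
      . 'Delete Link : <font color="red">http://delete</font>';

print_r(extract_red_text($html));
```

If the site colours the links through a style sheet instead, the pattern would have to target whatever class or inline style it uses.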

Dibyendra
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto

Post by John Cartwright »

dibyendrah wrote: 1. Open the URL and read its content
Note that this would be done using file_get_contents() or one of its sibling functions.
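
A minimal sketch of that first step, assuming `allow_url_fopen` is enabled on the server so that `file_get_contents()` can open `http://` URLs. The wrapper name `fetch_page` is hypothetical, and the URL in the usage comment is a placeholder. The wrapper only adds a check for the `false` that `file_get_contents()` returns on failure.

```php
<?php
// Hypothetical wrapper around file_get_contents().
// Returns the content as a string, or null when the read fails.
function fetch_page(string $location): ?string
{
    // @ suppresses the warning; failure is reported through the return value.
    $content = @file_get_contents($location);

    return $content === false ? null : $content;
}

// Usage (placeholder URL):
// $html = fetch_page('http://example.com/');
// if ($html === null) {
//     die('Could not read the page');
// }
```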
Maluendaster
Forum Contributor
Posts: 124
Joined: Fri Feb 25, 2005 1:14 pm

Post by Maluendaster »

dibyendrah wrote: This requires the following steps:

1. Open the URL and read its content.
2. Scan for the link with font color=red using a regular expression.

If the page uses a style sheet, a different regular expression will be required.
Please post a sample of the page you want to extract this information from.

Dibyendra

For example, http://www.fastshare.org/ : once you upload a file, I want to get the download and delete paths.
dibyendrah
Forum Contributor
Posts: 491
Joined: Wed Oct 19, 2005 5:14 am
Location: Nepal

Post by dibyendrah »

Extracting the links from a piece of the page:

Code: Select all

<?php
$links = "Deine Datei wurde erfolgreich hochgeladen:\"te.t.c\".<br> <a href=\"http://www.FastShare.org/download/te.t.c\">http://www.FastShare.org/download/te.t.c</a><br>Löschlink: <a href=delete.php?id=te.t.c&md5=4f2727be507da0f29556c510d9f6746f>http://www.FastShare.org/delete.php?id=te.t.c&md5=4f2727be507da0f29556c510d9f6746f</a><p>";


preg_match_all('/<a\s+[^>]*href=(\")?([^"]+)(\")?[^>]*>/is', $links, $match);

print_r($match);
?>
Output

Code: Select all

Array
(
    [0] => Array
        (
            [0] => <a href="http://www.FastShare.org/download/te.t.c">
            [1] => <a href=delete.php?id=te.t.c&md5=4f2727be507da0f29556c510d9f6746f>http://www.FastShare.org/delete.php?id=te.t.c&md5=4f2727be507da0f29556c510d9f6746f</a><p>
        )

    [1] => Array
        (
            [0] => "
            [1] => 
        )

    [2] => Array
        (
            [0] => http://www.FastShare.org/download/te.t.c
            [1] => delete.php?id=te.t.c&md5=4f2727be507da0f29556c510d9f6746f>http://www.FastShare.org/delete.php?id=te.t.c&md5=4f2727be507da0f29556c510d9f6746f</a><p
        )

    [3] => Array
        (
            [0] => "
            [1] => 
        )

)
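Because the delete link is not quoted, the pattern runs on past the end of the href value, as the second match in the output shows. In your delete link, use the full URL rather than a relative path, and put double quotes around it, so that the pattern does not have to check whether the quotes are present.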
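
If the markup cannot be changed (for example because it comes from someone else's site), a pattern that accepts both quoted and unquoted values is one alternative. The sketch below is an illustration under assumptions, not a full HTML parser: `extract_hrefs` and `make_absolute` are made-up names, the base URL is taken from the sample above, and the relative-path handling only covers the simple case of a path relative to the site root.

```php
<?php
// Hypothetical helper: collects href values from <a> tags, whether the
// value is double-quoted, single-quoted, or not quoted at all.
function extract_hrefs(string $html): array
{
    // (?| ... ) is a branch-reset group, so the value is always group 1.
    $pattern = '/<a\s+[^>]*?\bhref\s*=\s*(?|"([^"]*)"|\'([^\']*)\'|([^\s>]+))/i';
    preg_match_all($pattern, $html, $matches);

    return $matches[1];
}

// Hypothetical helper: prefixes relative paths with the site's base URL.
// Simplification: ignores "../" segments and protocol-relative URLs.
function make_absolute(string $url, string $base): string
{
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        return $url; // already absolute
    }
    return rtrim($base, '/') . '/' . ltrim($url, '/');
}

// Shortened version of the sample markup, with one quoted and one unquoted link.
$links = '<a href="http://www.FastShare.org/download/te.t.c">download</a><br>'
       . '<a href=delete.php?id=te.t.c&md5=4f2727be507da0f29556c510d9f6746f>delete</a>';

foreach (extract_hrefs($links) as $href) {
    echo make_absolute($href, 'http://www.FastShare.org/'), "\n";
}
```

With the sample above, both links come out as full URLs, and the unquoted value stops at the closing `>` of the tag instead of running on into the link text.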
Another idea is to put markers around the text to show where to start and stop reading, like this:

Code: Select all

<!--start-->

link goes here

<!--stop-->

I hope this gives you an idea of how to get started.


Cheers,
Dibyendra
dibyendrah
Forum Contributor
Posts: 491
Joined: Wed Oct 19, 2005 5:14 am
Location: Nepal

Post by dibyendrah »

Here is a code sample for the marker approach I mentioned in my previous post:

Code: Select all

<?php

// Sample page content; only the part between the two markers is of interest.
$text = "<a href=\"http://www.FastShare.org/spenden.php\" target=\"_self\"><font color=green>Spenden</font></a>something which I will skip... <!--start-->";
$text .= "Deine Datei wurde erfolgreich hochgeladen:\"te.t.c\".". "<br> <a href=\"http://www.FastShare.org/download/te.t.c\"". ">http://www.FastShare.org/download/te.t.c</a><br>Löschlink: <a href=\"http://www.FastShare.org/delete.php?id=te.t.c&". "md5=4f2727be507da0f29556c510d9f6746f". "\">http://www.FastShare.org/delete.php?id=te.t.c&". "md5=4f2727be507da0f29556c510d9f6746f</a><p>";

$text .= " <!--stop--> something which I don't need";

$pos_start = strpos($text, "<!--start-->");
$pos_end = strpos($text, "<!--stop-->");

// Copy everything from the start marker up to the stop marker.
// strpos() returns false when a marker is missing, so check before slicing.
$required_text = '';
if ($pos_start !== false && $pos_end !== false && $pos_end > $pos_start) {
	$required_text = substr($text, $pos_start, $pos_end - $pos_start);
}

// Capture the value of every double-quoted href inside the marked section.
preg_match_all('/<a\s+[^>]*href="([^"]+)"?[^>]*>/is', $required_text, $match);

print_r($match);
?>

Output :

Code: Select all

Array
(
    [0] => Array
        (
            [0] => <a href="http://www.FastShare.org/download/te.t.c">
            [1] => <a href="http://www.FastShare.org/delete.php?id=te.t.c&md5=4f2727be507da0f29556c510d9f6746f">
        )

    [1] => Array
        (
            [0] => http://www.FastShare.org/download/te.t.c
            [1] => http://www.FastShare.org/delete.php?id=te.t.c&md5=4f2727be507da0f29556c510d9f6746f
        )

)

Cheers,
Dibyendra