Php link exporter

The_L
Forum Commoner
Posts: 64
Joined: Sun Nov 22, 2009 6:53 pm

Php link exporter

Post by The_L »

This is my code:

taggrab.class.php

Code: Select all

<?php
 
class tagSpider
{
 
public $ch; // this will hold our cURL handle (fetchPage() uses $this->ch)
public $html; // this is where we dump the HTML we get
public $binary; // set for binary type transfer
public $url; // this is the URL we are going to do a pass on
 
 
 
function __construct() // a method named after the class is no longer treated as a constructor in PHP 8
{
    $this->html = "";
    $this->binary = 0;
    $this->url = "";
}
 
 
function fetchPage($url)
{
    $this->url = $url;
    if (!empty($this->url)) { // isset() is always true right after the assignment, so test for an empty URL instead
        $this->ch = curl_init(); // start cURL instance
        curl_setopt($this->ch, CURLOPT_RETURNTRANSFER, 1); // tell cURL to return the data instead of printing it
        curl_setopt($this->ch, CURLOPT_URL, $this->url); // set the URL to download
        curl_setopt($this->ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects, if any
        curl_setopt($this->ch, CURLOPT_BINARYTRANSFER, $this->binary); // tell cURL whether the data is binary or not
        $this->html = curl_exec($this->ch); // grab the web page from the internet
        curl_close($this->ch); // close the connection
    }
}
 
 
function parse_array($beg_tag, $close_tag) // this function takes the grabbed HTML and picks out the pieces we want
{
    preg_match_all("($beg_tag.*$close_tag)siU", $this->html, $matching_data); // match data between the specified tags
    return $matching_data[0];
}
 
 
}
?>
tag-example.php

Code: Select all

<?php
 
// Include our tag grab class
require("taggrab.class.php"); // class for spider
 
// Enter the URL you want to run
$urlrun="some url";
 
// Specify the start and end tags you want to grab data between
$stag="<a href=";
$etag="</a>";
 
// Make a title spider
$tspider = new tagSpider();
 
// Pass URL to the fetch page function
$tspider->fetchPage($urlrun);
 
// Enter the tags into the parse array function
$linkarray = $tspider->parse_array($stag, $etag); 
 
echo "<h2>Links present on page: ".$urlrun."</h2><br />";
// Loop to pump out the results
foreach ($linkarray as $result) {
 
echo $result;
 
echo "<br/>";
}
 
?>
The script works just fine, but I need an input field that will define the $urlrun variable in the second file. I tried almost everything, but all I get is errors. Can someone help me with this?

Thanks.
The_L
Forum Commoner
Posts: 64
Joined: Sun Nov 22, 2009 6:53 pm

Re: Php link exporter

Post by The_L »

Can anyone take a look?? :/
Apollo
Forum Regular
Posts: 794
Joined: Wed Apr 30, 2008 2:34 am

Re: Php link exporter

Post by Apollo »

Exactly what kind of error message do you get?

At first glance, the only problem I see is missing begin and end delimiter chars (to separate expression from modifiers) in your regular expression:
The_L wrote:preg_match_all("($beg_tag.*$close_tag)siU",
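
A side note on the delimiter question: PHP's PCRE functions accept a matching bracket pair as delimiters, so in the quoted pattern the outer parentheses play that role and `siU` are read as modifiers. A minimal sketch comparing that form with explicit `#` delimiters, assuming a made-up `$html` sample; the `preg_quote()` calls are an added precaution and are not part of the original class:

```php
<?php
// Sample HTML standing in for a fetched page (assumption for illustration).
$html = '<p><a href="http://example.com/a">A</a> and <a href="http://example.com/b">B</a></p>';

$beg_tag = '<a href=';
$close_tag = '</a>';

// Original form: the outer ( ) act as the delimiters, siU are the modifiers.
preg_match_all("($beg_tag.*$close_tag)siU", $html, $with_parens);

// Same match with explicit # delimiters; preg_quote() escapes any regex
// metacharacters (and the delimiter itself) that the tags might contain.
$pattern = '#' . preg_quote($beg_tag, '#') . '.*' . preg_quote($close_tag, '#') . '#siU';
preg_match_all($pattern, $html, $with_hash);

// Both patterns should return the same list of anchors.
var_dump($with_parens[0] === $with_hash[0]);
```

Quoting matters once a tag contains characters such as `.` or `?`, which would otherwise be read as regex syntax.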
The_L
Forum Commoner
Posts: 64
Joined: Sun Nov 22, 2009 6:53 pm

Re: Php link exporter

Post by The_L »

Both files work fine; I just want to add an input field and a button for

Code: Select all

// Enter the URL you want to run
$urlrun="some url";
Apollo
Forum Regular
Posts: 794
Joined: Wed Apr 30, 2008 2:34 am

Re: Php link exporter

Post by Apollo »

Then what's the problem? Just a simple form would do I guess?

Code: Select all

$urlrun = isset($_POST['urlrun']) ? trim($_POST['urlrun']) : ''; // avoids an "undefined index" notice on the first page load
if (!$urlrun) die("<form method='post'><input type='text' name='urlrun'> <input type='submit'></form>");
timWebUK
Forum Contributor
Posts: 239
Joined: Thu Oct 29, 2009 6:48 am
Location: UK

Re: Php link exporter

Post by timWebUK »

Can you not create an HTML form then POST the URL, and process it using your PHP file? Or have I missed something...
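
The form-then-POST idea could be sketched roughly like this. It is only an illustration: `getSubmittedUrl` is a hypothetical helper name, and the URL validation is an addition that nobody in the thread asked for. The field name `urlrun` follows the snippet above.

```php
<?php
// Hypothetical helper (not from the thread): returns the submitted URL,
// or null when nothing usable was posted.
function getSubmittedUrl(array $post)
{
    $url = isset($post['urlrun']) ? trim($post['urlrun']) : '';
    // Reject empty input and anything that is not a well-formed URL.
    if ($url === '' || filter_var($url, FILTER_VALIDATE_URL) === false) {
        return null;
    }
    // Only let http:// and https:// through to the spider.
    if (!preg_match('#^https?://#i', $url)) {
        return null;
    }
    return $url;
}

$urlrun = getSubmittedUrl($_POST);
if ($urlrun === null) {
    // No valid URL yet: show the form.
    echo "<form method='post'><input type='text' name='urlrun'> <input type='submit'></form>";
} else {
    // A valid URL was posted: from here on, continue as in tag-example.php.
    echo "<h2>Links present on page: " . htmlspecialchars($urlrun) . "</h2>";
}
```

In tag-example.php this would replace the hard-coded `$urlrun="some url";` line, with the spider calls moved into the `else` branch.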
The_L
Forum Commoner
Posts: 64
Joined: Sun Nov 22, 2009 6:53 pm

Re: Php link exporter

Post by The_L »

Great, it's just perfect, THANKS! Instead of opening a new topic I'll ask here again...

this part:

Code: Select all

 
$stag="<a href=";
$etag="</a>";
 
How should I make it list only URLs that begin with http://youtube.com/ and http://google.com/ (for example)? I tried:

Code: Select all

 
$stag="<a href=http://google.com/";
$etag="</a>";
 
But it won't list anything...

And when I try this:

Code: Select all

$stag="<a href=";
$etag="" class="bbc_link new_win" target="_blank">";
Then I get this error:

Code: Select all

Parse error: syntax error, unexpected T_CLASS in *host path*/test/tag-example.php on line 12
iankent
Forum Contributor
Posts: 333
Joined: Mon Nov 16, 2009 4:23 pm
Location: Wales, United Kingdom

Re: Php link exporter

Post by iankent »

The_L wrote: How should I make it list only URLs that begin with http://youtube.com/ and http://google.com/ (for example)? I tried:
Even if your regex is correct (which I can't guarantee, as I'm no regex expert!), you'll probably find most Google/YouTube links won't start with http://google.com/ etc. but will instead be http://www.google.com/ (or google.com.au, google.co.uk, etc.). You may want to match http://*.google.* instead (no idea what that is as a regex, sorry - really must learn!)

edit:
Actually, if you want to match Google links and be sure that a link is definitely from Google, you'd need to match the TLD part against a list of valid ones, or better still against a list of Google-owned ones. Just matching http://*.google.* would also match http://something.google.anothersite.com/, which you may want to exclude.
The_L wrote: And when I try this:

Code: Select all

$stag="<a href=";
$etag="" class="bbc_link new_win" target="_blank">";
Then I get this error:

Code: Select all

Parse error: syntax error, unexpected T_CLASS in *host path*/test/tag-example.php on line 12
You can't put a " inside "" without escaping it. I.e., on the line:
$etag="" class="bbc_link new_win" target="_blank">";
you're opening the double quotes and then closing them straight away, so class= etc. is being treated as PHP. It should be this:

Code: Select all

$etag="\" class=\"bbc_link new_win\" target=\"_blank\">";
Alternatively, you could enclose it in single quotes, which would allow the double quotes to be included as normal.
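
A small sketch of the two quoting styles side by side, using the same `$etag` value; the variable names `$escaped` and `$single` are made up for the comparison:

```php
<?php
// Double-quoted string: every inner " must be escaped with a backslash.
$escaped = "\" class=\"bbc_link new_win\" target=\"_blank\">";

// Single-quoted string: inner double quotes need no escaping.
$single = '" class="bbc_link new_win" target="_blank">';

// Both spellings produce exactly the same string.
var_dump($escaped === $single); // bool(true)
```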

hth
The_L
Forum Commoner
Posts: 64
Joined: Sun Nov 22, 2009 6:53 pm

Re: Php link exporter

Post by The_L »

Even if your regex is correct (which I can't guarantee, as I'm no regex expert!), you'll probably find most Google/YouTube links won't start with http://google.com/ etc. but will instead be http://www.google.com/ (or google.com.au, google.co.uk, etc.). You may want to match http://*.google.* instead (no idea what that is as a regex, sorry - really must learn!)
Wouldn't it be easier to just insert all variations of the Google site? Like:
google.com/
http://google.com/
http://www.google.com/
http://www.google.com/

The problem is that I don't know how to write an "OR" command xD

As for the second problem... it works. Just to make it clear: before every -"- (which is not part of the code) I should put -\- ???
iankent
Forum Contributor
Posts: 333
Joined: Mon Nov 16, 2009 4:23 pm
Location: Wales, United Kingdom

Re: Php link exporter

Post by iankent »

The_L wrote: Wouldn't it be easier to just insert all variations of the Google site? Like:
google.com/
http://google.com/
http://www.google.com/
http://www.google.com/

The problem is that I don't know how to write an "OR" command xD
When you say an OR command, what do you mean? If you want to match against a list of possible items you can use an array, for example:

Code: Select all

 
$possibilities = array('http://google.com/', 'http://www.google.com/', 'http://www.google.co.uk/', 'http://google.co.uk/');
foreach($possibilities as $possibility) {
    // run your existing regexp here
}
 
But that's a bit of a messy solution, and it's almost guaranteed you won't account for every Google URL available. What if you come across the URL http://images.google.com/, should that match or not? You can do it either way, but a decent regex will be a lot more accurate and a lot faster, and it means you don't have to manually type out every possible Google URL variation you can think of. It's just a matter of learning regex well enough or finding somebody willing to help. Personally I don't have a clue lol.
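
For what it's worth, the "OR" in a regex is the `|` alternation inside a group. A rough sketch, assuming the anchors have already been collected (the `$links` array below is made-up sample data standing in for `$linkarray`) and assuming a short hand-picked TLD list rather than a complete one:

```php
<?php
// Sample anchors standing in for $linkarray from parse_array() (assumption).
$links = array(
    '<a href="http://www.google.com/search?q=php">search</a>',
    '<a href="http://youtube.com/watch?v=abc">video</a>',
    '<a href="http://images.google.co.uk/">images</a>',
    '<a href="http://something.google.anothersite.com/">not google</a>',
    '<a href="http://example.com/">other</a>',
);

// (?:[a-z0-9-]+\.)*   optional subdomains such as www. or images.
// (?:google|youtube)  the "OR": either name
// \.(?:com|co\.uk)    a short, hand-picked TLD list (extend as needed)
// (?=[/"'])           the host name must end here
$pattern = '#href=["\']?https?://(?:[a-z0-9-]+\.)*(?:google|youtube)\.(?:com|co\.uk)(?=[/"\'])#i';

// preg_grep() keeps only the array entries that match the pattern.
$wanted = preg_grep($pattern, $links);
print_r(array_values($wanted));
```

The lookahead at the end is what keeps http://something.google.anothersite.com/ out of the results.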
The_L wrote: As for the second problem... it works. Just to make it clear: before every -"- (which is not part of the code) I should put -\- ???
Correct - if you're putting a value in a string using double quotes, e.g. "blah", any 'special characters' inside it need to be escaped with a backslash. So \n is newline, \r is carriage return, \t is tab, \\ is a backslash, \" is ". There are others but I can't remember them :p
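
A quick way to see the difference between the two quote styles with these escape sequences (plain PHP, no assumptions beyond the sample strings):

```php
<?php
// In double quotes, \n is a single newline character...
var_dump(strlen("\n")); // int(1)

// ...in single quotes it stays a backslash followed by the letter n.
var_dump(strlen('\n')); // int(2)

// Inside single quotes only \\ and \' are treated as escapes.
var_dump('It\'s a \\ backslash' === "It's a \\ backslash"); // bool(true)
```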
The_L
Forum Commoner
Posts: 64
Joined: Sun Nov 22, 2009 6:53 pm

Re: Php link exporter

Post by The_L »

Hehe, you are really clearing up PHP for me...

But that's a bit of a messy solution, and it's almost guaranteed you won't account for every Google URL available. What if you come across the URL http://images.google.com/, should that match or not? You can do it either way, but a decent regex will be a lot more accurate and a lot faster, and it means you don't have to manually type out every possible Google URL variation you can think of. It's just a matter of learning regex well enough or finding somebody willing to help. Personally I don't have a clue lol.
I guess you got it wrong: when I said google.com in my first post I meant it as an example, so it doesn't really have to be Google. It could be any ordinary site, so I'm guessing that the "http://www...", "www...", "http://..." and "justurl.com" combinations are just fine...
iankent
Forum Contributor
Posts: 333
Joined: Mon Nov 16, 2009 4:23 pm
Location: Wales, United Kingdom

Re: Php link exporter

Post by iankent »

The_L wrote: I guess you got it wrong: when I said google.com in my first post I meant it as an example, so it doesn't really have to be Google. It could be any ordinary site, so I'm guessing that the "http://www...", "www...", "http://..." and "justurl.com" combinations are just fine...
Ah I see, you don't want to match Google/YouTube, you want to match any URL you come across as long as it's a URL?

Here's a good tip for you (but be careful around licensing etc. if you're going to sell/redistribute your code) - have a look in the status.net source code (http://status.net/ - it's like Twitter), and there's a handy function that uses a single regex to match almost all recognised URLs. You could also have a look in the phpBB source code, which I'm sure will contain similar useful regexes!
The_L
Forum Commoner
Posts: 64
Joined: Sun Nov 22, 2009 6:53 pm

Re: Php link exporter

Post by The_L »

Hehe, I'm not gonna sell anything. To make it all clear: I want to export forum post links. When someone posts a lot of links I want to get them without any text, simply just the links. So when someone posts tons of YouTube links I just want to copy them. I don't think this should be so confusing... it should be simple ;)
iankent
Forum Contributor
Posts: 333
Joined: Mon Nov 16, 2009 4:23 pm
Location: Wales, United Kingdom

Re: Php link exporter

Post by iankent »

The_L wrote: Hehe, I'm not gonna sell anything. To make it all clear: I want to export forum post links. When someone posts a lot of links I want to get them without any text, simply just the links. So when someone posts tons of YouTube links I just want to copy them. I don't think this should be so confusing... it should be simple ;)
If that's all you want to do then the status.net regex should do exactly what you need. Just looked it up, this regex should find all the matches you need :)

Code: Select all

 
$regex = '#'.
    '(?:^|[\s\(\)\[\]\{\}\\\'\\\";]+)(?![\@\!\#])'.
    '('.
        '(?:'.
            '(?:'. //Known protocols
                '(?:'.
                    '(?:(?:https?|ftps?|mms|rtsp|gopher|news|nntp|telnet|wais|file|prospero|webcal|irc)://)'.
                    '|'.
                    '(?:(?:mailto|aim|tel|xmpp):)'.
                ')'.
                '(?:[\pN\pL\-\_\+\%\~]+(?::[\pN\pL\-\_\+\%\~]+)?\@)?'. //user:pass@
                '(?:'.
                    '(?:'.
                        '\[[\pN\pL\-\_\:\.]+(?<![\.\:])\]'. //[dns]
                    ')|(?:'.
                        '[\pN\pL\-\_\:\.]+(?<![\.\:])'. //dns
                    ')'.
                ')'.
            ')'.
            '|(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'. //IPv4
            '|(?:'. //IPv6
                '\[?(?:(?:(?:[0-9A-Fa-f]{1,4}:){7}(?:(?:[0-9A-Fa-f]{1,4})|:))|(?:(?:[0-9A-Fa-f]{1,4}:){6}(?::|(?:(?:25[0-5]|2[0-4]\d|[01]?\d{1,2})(?:\.(?:25[0-5]|2[0-4]\d|[01]?\d{1,2})){3})|(?::[0-9A-Fa-f]{1,4})))|(?:(?:[0-9A-Fa-f]{1,4}:){5}(?:(?::(?:(?:25[0-5]|2[0-4]\d|[01]?\d{1,2})(?:\.(?:25[0-5]|2[0-4]\d|[01]?\d{1,2})){3})?)|(?:(?::[0-9A-Fa-f]{1,4}){1,2})))|(?:(?:[0-9A-Fa-f]{1,4}:){4}(?::[0-9A-Fa-f]{1,4}){0,1}(?:(?::(?:(?:25[0-5]|2[0-4]\d|[01]?\d{1,2})(?:\.(?:25[0-5]|2[0-4]\d|[01]?\d{1,2})){3})?)|(?:(?::[0-9A-Fa-f]{1,4}){1,2})))|(?:(?:[0-9A-Fa-f]{1,4}:){3}(?::[0-9A-Fa-f]{1,4}){0,2}(?:(?::(?:(?:25[0-5]|2[0-4]\d|[01]?\d{1,2})(?:\.(?:25[0-5]|2[0-4]\d|[01]?\d{1,2})){3})?)|(?:(?::[0-9A-Fa-f]{1,4}){1,2})))|(?:(?:[0-9A-Fa-f]{1,4}:){2}(?::[0-9A-Fa-f]{1,4}){0,3}(?:(?::(?:(?:25[0-5]|2[0-4]\d|[01]?\d{1,2})(?:\.(?:25[0-5]|2[0-4]\d|[01]?\d{1,2})){3})?)|(?:(?::[0-9A-Fa-f]{1,4}){1,2})))|(?:(?:[0-9A-Fa-f]{1,4}:)(?::[0-9A-Fa-f]{1,4}){0,4}(?:(?::(?:(?:25[0-5]|2[0-4]\d|[01]?\d{1,2})(?:\.(?:25[0-5]|2[0-4]\d|[01]?\d{1,2})){3})?)|(?:(?::[0-9A-Fa-f]{1,4}){1,2})))|(?::(?::[0-9A-Fa-f]{1,4}){0,5}(?:(?::(?:(?:25[0-5]|2[0-4]\d|[01]?\d{1,2})(?:\.(?:25[0-5]|2[0-4]\d|[01]?\d{1,2})){3})?)|(?:(?::[0-9A-Fa-f]{1,4}){1,2})))|(?:(?:(?:25[0-5]|2[0-4]\d|[01]?\d{1,2})(?:\.(?:25[0-5]|2[0-4]\d|[01]?\d{1,2})){3})))\]?(?<!:)'.
            ')|(?:'. //DNS
                '(?:[\pN\pL\-\_\+\%\~]+(?:\:[\pN\pL\-\_\+\%\~]+)?\@)?'. //user:pass@
                '[\pN\pL\-\_]+(?:\.[\pN\pL\-\_]+)*\.'.
                //tld list from http://data.iana.org/TLD/tlds-alpha-by-domain.txt, also added local, loc, and onion
                '(?:AC|AD|AE|AERO|AF|AG|AI|AL|AM|AN|AO|AQ|AR|ARPA|AS|ASIA|AT|AU|AW|AX|AZ|BA|BB|BD|BE|BF|BG|BH|BI|BIZ|BJ|BM|BN|BO|BR|BS|BT|BV|BW|BY|BZ|CA|CAT|CC|CD|CF|CG|CH|CI|CK|CL|CM|CN|CO|COM|COOP|CR|CU|CV|CX|CY|CZ|DE|DJ|DK|DM|DO|DZ|EC|EDU|EE|EG|ER|ES|ET|EU|FI|FJ|FK|FM|FO|FR|GA|GB|GD|GE|GF|GG|GH|GI|GL|GM|GN|GOV|GP|GQ|GR|GS|GT|GU|GW|GY|HK|HM|HN|HR|HT|HU|ID|IE|IL|IM|IN|INFO|INT|IO|IQ|IR|IS|IT|JE|JM|JO|JOBS|JP|KE|KG|KH|KI|KM|KN|KP|KR|KW|KY|KZ|LA|LB|LC|LI|LK|LR|LS|LT|LU|LV|LY|MA|MC|MD|ME|MG|MH|MIL|MK|ML|MM|MN|MO|MOBI|MP|MQ|MR|MS|MT|MU|MUSEUM|MV|MW|MX|MY|MZ|NA|NAME|NC|NE|NET|NF|NG|NI|NL|NO|NP|NR|NU|NZ|OM|ORG|PA|PE|PF|PG|PH|PK|PL|PM|PN|PR|PRO|PS|PT|PW|PY|QA|RE|RO|RS|RU|RW|SA|SB|SC|SD|SE|SG|SH|SI|SJ|SK|SL|SM|SN|SO|SR|ST|SU|SV|SY|SZ|TC|TD|TEL|TF|TG|TH|TJ|TK|TL|TM|TN|TO|TP|TR|TRAVEL|TT|TV|TW|TZ|UA|UG|UK|US|UY|UZ|VA|VC|VE|VG|VI|VN|VU|WF|WS|XN--0ZWM56D|测试|XN--11B5BS3A9AJ6G|परीक्षा|XN--80AKHBYKNJ4F|испытание|XN--9T4B11YI5A|테스트|XN--DEBA0AD|טעסט|XN--G6W251D|測試|XN--HGBK6AJ7F53BBA|آزمایشی|XN--HLCJ6AYA9ESC7A|பரிட்சை|XN--JXALPDLP|δοκιμή|XN--KGBECHTV|إختبار|XN--ZCKZAH|テスト|YE|YT|YU|ZA|ZM|ZW|local|loc|onion)'.
            ')(?![\pN\pL\-\_])'.
        ')'.
        '(?:'.
            '(?:\:\d+)?'. //:port
            '(?:/[\pN\pL$\[\]\,\!\(\)\.\:\-\_\+\/\=\&\;\%\~\*\$\+\'\"@]*)?'. // /path
            '(?:\?[\pN\pL\$\[\]\,\!\(\)\.\:\-\_\+\/\=\&\;\%\~\*\$\+\'\"@\/]*)?'. // ?query string
            '(?:\#[\pN\pL$\[\]\,\!\(\)\.\:\-\_\+\/\=\&\;\%\~\*\$\+\'\"\@/\?\#]*)?'. // #fragment
        ')(?<![\?\.\,\#\,])'.
    ')'.
    '#ixu';
 
courtesy of status.net :)
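
If the goal is only the link targets from a fetched page, a DOM parser is a possible alternative to a URL regex. A minimal sketch, assuming the page HTML is already in a string (as `$this->html` is after `fetchPage()`); `extractLinks` is a hypothetical helper name, and `DOMDocument` is swapped in here rather than being anything status.net uses:

```php
<?php
// Hypothetical helper: returns the href of every <a> element whose
// target starts with http:// or https://.
function extractLinks($html)
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = array();
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        if (preg_match('#^https?://#i', $href)) {
            $links[] = $href;
        }
    }
    return $links;
}

// Made-up sample markup in the style of the forum links from this thread.
$html = '<p>Text <a href="http://youtube.com/watch?v=1" class="bbc_link new_win" target="_blank">one</a> '
      . 'and <a href="/local/page.html">two</a></p>';

foreach (extractLinks($html) as $link) {
    echo $link, "\n";
}
```

This returns only the URLs, without the link text or the surrounding tag, and it does not depend on the exact attribute order in the markup.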
The_L
Forum Commoner
Posts: 64
Joined: Sun Nov 22, 2009 6:53 pm

Re: Php link exporter

Post by The_L »

Hmm... I tried it, but it does not work for me...

Can you please just show me how to add this:

Code: Select all

$possibilities = array('http://google.com/', 'http://www.google.com/', 'http://www.google.co.uk/', 'http://google.co.uk/');
foreach($possibilities as $possibility) {
    // run your existing regexp here
}
into

Code: Select all

// Specify the start and end tags you want to grab data between
$stag="<a href=";
$etag=".html\" class=\"bbc_link new_win\" target=\"_blank\">";
:/
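
One way the two snippets might be combined, as a sketch only: the `$html` sample is made up, and the opening `"` after `href=` is an assumption about how the forum writes its links (if the real markup quotes the URL, a start tag without that quote will never match). It runs the same kind of pattern as `parse_array()` once per entry in `$possibilities`:

```php
<?php
// Sample HTML standing in for the fetched page (assumption for illustration).
$html = '<a href="http://www.google.com/a.html" class="bbc_link new_win" target="_blank">A</a>'
      . '<a href="http://example.com/b.html" class="bbc_link new_win" target="_blank">B</a>'
      . '<a href="http://google.co.uk/c.html" class="bbc_link new_win" target="_blank">C</a>';

$possibilities = array('http://google.com/', 'http://www.google.com/', 'http://www.google.co.uk/', 'http://google.co.uk/');
$etag = ".html\" class=\"bbc_link new_win\" target=\"_blank\">";

$found = array();
foreach ($possibilities as $possibility) {
    // Build the start tag for this site, including the opening quote.
    $stag = '<a href="' . $possibility;
    // preg_quote() keeps the dots in the URL from acting as regex wildcards.
    $pattern = '#' . preg_quote($stag, '#') . '.*' . preg_quote($etag, '#') . '#siU';
    preg_match_all($pattern, $html, $matches);
    $found = array_merge($found, $matches[0]);
}

print_r($found);
```

With the class from the first post, the body of the loop could instead call `$tspider->parse_array($stag, $etag)` and merge the returned arrays in the same way.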