PHP Developers Network

A community of PHP developers offering assistance, advice, discussion, and friendship.
 

All times are UTC - 5 hours




Posted: Fri Dec 11, 2009 1:53 pm
Forum Commoner

Joined: Wed May 28, 2008 1:51 pm
Posts: 46
Location: Kolkata, India
I have a URL, http://url.com/some.htm
I want to log all the links in this HTML file to a text file.

Only the links.

How can I do that?


Posted: Fri Dec 11, 2009 3:03 pm
DevNet Master

Joined: Mon Feb 24, 2003 11:12 am
Posts: 2572
Location: The Republic of Texas
Read the file, use a regex to capture the links, then write them out to your text file. Alternatively, you could load the page with the DOM extension, loop through the anchors, extract the URLs, and write them out to a text file.

http://us2.php.net/manual/en/book.dom.php

_________________
mysql_function(): WARNING: This extension is deprecated as of PHP 5.5.0, and will be removed in the future. Instead, the MySQLi or PDO_MySQL extension should be used. See also MySQL: choosing an API guide and related FAQ for more information.


Posted: Fri Dec 11, 2009 3:08 pm
Forum Contributor

Joined: Thu May 11, 2006 8:58 pm
Posts: 305
Location: Utah, USA
Here is a quick and dirty way that I just tested:

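<?php
// Fetch the page; file_get_contents() returns false on failure.
$html = file_get_contents('http://forums.devnetwork.net/viewtopic.php?f=39&t=110110');
if ($html === false) {
    die('Could not fetch the page');
}
// Group 1 is the opening quote; group 2 is the href value up to the matching quote.
// The lazy (.*?) stops at the first closing quote instead of running on to a later attribute.
preg_match_all('/<a\s[^>]*?href\s*=\s*([\'"])(.*?)\1/si', $html, $matches, PREG_SET_ORDER);
$links = '';
foreach ($matches as $match) {
    $links .= $match[2] . "\n";
}
echo "<pre>" . $links;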


It echoes this string:
./index.php?sid=155659f42576cef48964d51018cf8222
./ucp.php?mode=login&sid=155659f42576cef48964d51018cf8222
./ucp.php?mode=register&sid=155659f42576cef48964d51018cf8222
<snip>
./viewtopic.php?p=582433&sid=155659f42576cef48964d51018cf8222#p582433
http://url.com/some.htm
#wrapheader
<snip>
./viewforum.php?f=59&sid=155659f42576cef48964d51018cf8222
./viewforum.php?f=39&sid=155659f42576cef48964d51018cf8222
http://www.phpbb.com/


You'd probably want to add some code that runs parse_url() on the original URL and resolves relative links against it to get absolute ones. You may also want to discard links that are just fragments, like #wrapheader.
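Something along these lines might do for the resolving and filtering. It is only a rough sketch: resolve_link() is a made-up helper name, not a built-in, it assumes the base URL is absolute (scheme and host present), and it does not try to cover every case in the URL spec.

```php
<?php
// Hypothetical helper (not a PHP built-in): returns an absolute URL,
// or null for links that should be skipped.
// Assumes $base is an absolute URL with a scheme and a host.
function resolve_link($base, $link)
{
    $link = trim($link);
    // Skip empty links and pure fragments such as "#wrapheader".
    if ($link === '' || $link[0] === '#') {
        return null;
    }
    // A link that already has a scheme (http:, mailto:, ...) is returned unchanged.
    if (is_string(parse_url($link, PHP_URL_SCHEME))) {
        return $link;
    }
    $parts = parse_url($base);
    // Protocol-relative link: //host/path
    if (substr($link, 0, 2) === '//') {
        return $parts['scheme'] . ':' . $link;
    }
    $root = $parts['scheme'] . '://' . $parts['host']
          . (isset($parts['port']) ? ':' . $parts['port'] : '');
    // Root-relative link: /path
    if ($link[0] === '/') {
        return $root . $link;
    }
    // Split off the query string and fragment so only the path is normalised.
    $cut = strcspn($link, '?#');
    $path = (string) substr($link, 0, $cut);
    $suffix = (string) substr($link, $cut);
    $basePath = isset($parts['path']) ? $parts['path'] : '/';
    // A query-only link keeps the base path; otherwise resolve against its directory.
    $full = ($path === '')
        ? $basePath
        : substr($basePath, 0, strrpos($basePath, '/') + 1) . $path;
    // Collapse "." and ".." segments.
    $out = array();
    foreach (explode('/', $full) as $segment) {
        if ($segment === '..') {
            array_pop($out);
        } elseif ($segment !== '.' && $segment !== '') {
            $out[] = $segment;
        }
    }
    $resolved = '/' . implode('/', $out);
    if ($out && substr($full, -1) === '/') {
        $resolved .= '/'; // keep a trailing slash
    }
    return $root . $resolved . $suffix;
}

// Usage with a few links of the kind listed above (shortened sid for readability).
$base = 'http://forums.devnetwork.net/viewtopic.php?f=39&t=110110';
$found = array('./index.php?sid=1', '#wrapheader', 'http://www.phpbb.com/');
$absolute = array();
foreach ($found as $link) {
    $url = resolve_link($base, $link);
    if ($url !== null) {
        $absolute[] = $url;
    }
}
// Drop duplicates before writing the list out, e.g. with file_put_contents().
$absolute = array_values(array_unique($absolute));
```

The query string is split off before the path is normalised so that a slash or dot inside a parameter value is left alone.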


Posted: Fri Dec 11, 2009 3:23 pm
DevNet Master

Joined: Mon Feb 24, 2003 11:12 am
Posts: 2572
Location: The Republic of Texas
I thought they could do their own research, but since we're posting code:

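// Collect libxml warnings internally; real-world HTML is rarely well formed.
libxml_use_internal_errors(true);
$html = new DOMDocument();
$html->loadHTMLFile('http://url.com/some.htm');
$tags = $html->getElementsByTagName('a');

$links = '';
foreach ($tags as $tag) {
    // Skip anchors without an href attribute (named anchors).
    if ($tag->hasAttribute('href')) {
        $links .= $tag->getAttribute('href') . "\n";
    }
}
file_put_contents('somefile.txt', $links);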

Posted: Fri Dec 11, 2009 4:00 pm
Forum Commoner

Joined: Wed May 28, 2008 1:51 pm
Posts: 46
Location: Kolkata, India
Hi Shawn,

It works with the URL viewtopic.php?f=39&t=110110,
but I need it to work with https://mobile.bet365.com/wap?task=upda ... w!&login=F

When I write

$html->loadHTMLFile('https://mobile.bet365.com/wap?task=update&id=2%3a17&outcome=&title=In+Play+Now!&login=F');

it shows these errors:

Notice: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Unable to find the wrapper "https" - did you forget to enable it when you configured PHP? in C:\wamp\www\paisa\web extractor.php on line 3

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: I/O warning : failed to load external entity "https://mobile.bet365.com/wap?task=update&id=2%3a17&outcome=&title=In+Play+Now!&login=F" in C:\wamp\www\paisa\web extractor.php on line 3


Posted: Fri Dec 11, 2009 4:40 pm
DevNet Master

Joined: Mon Feb 24, 2003 11:12 am
Posts: 2572
Location: The Republic of Texas
You probably need to enable extension=php_openssl.??? in php.ini.

Posted: Fri Dec 11, 2009 4:57 pm
Forum Commoner

Joined: Wed May 28, 2008 1:51 pm
Posts: 46
Location: Kolkata, India
But why does it work with some URLs and not with others?


Posted: Fri Dec 11, 2009 5:47 pm
DevNet Master

Joined: Mon Feb 24, 2003 11:12 am
Posts: 2572
Location: The Republic of Texas
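Because some are https and some are http. The http wrapper is always there; the https wrapper only exists once the OpenSSL extension is loaded, which is what the "Unable to find the wrapper" notice is telling you.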
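If you want to see this for yourself, a quick check like the one below should show whether your PHP build has a wrapper registered for a given URL's scheme. Treat it as a sketch: the URL is just a placeholder, and the variable names are mine.

```php
<?php
// Placeholder URL; substitute the page you are trying to load.
$url = 'https://example.com/';

// The scheme decides which stream wrapper PHP needs ("http", "https", ...).
$scheme = parse_url($url, PHP_URL_SCHEME);

// stream_get_wrappers() lists every wrapper registered in this PHP build.
$wrappers = stream_get_wrappers();

if (in_array($scheme, $wrappers, true)) {
    echo "The '$scheme' wrapper is available.\n";
} else {
    echo "No '$scheme' wrapper is registered, so loadHTMLFile() and file_get_contents() cannot open this URL.\n";
}
```

Run it through the web server rather than the command line, since the two can be configured differently.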

Posted: Fri Dec 11, 2009 5:52 pm
Forum Commoner

Joined: Wed May 28, 2008 1:51 pm
Posts: 46
Location: Kolkata, India
Does that mean I have to install OpenSSL for https?

Can anyone help me install OpenSSL on WAMP 2.0?


Posted: Fri Dec 11, 2009 6:35 pm
DevNet Master

Joined: Mon Feb 24, 2003 11:12 am
Posts: 2572
Location: The Republic of Texas
kapil1089theking wrote:
Does that mean I have to install OpenSSL for https?

Can anyone help me install OpenSSL on WAMP 2.0?

I think all you have to do is uncomment extension=php_openssl.dll in php.ini and restart Apache. Make sure that your PHP directory is in your PATH, or copy the OpenSSL DLLs that ship with PHP (libeay32.dll and ssleay32.dll) to the Windows system directory.

It's been years since I used Windows.
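Since an install can have more than one php.ini lying around, it may be worth checking which one is actually loaded before editing. A small sketch, assuming you run it from a page served by Apache so you see the web server's configuration:

```php
<?php
// php_ini_loaded_file() returns the path of the php.ini in use, or false if none was loaded.
$ini = php_ini_loaded_file();
echo 'Loaded php.ini: ' . ($ini === false ? '(none)' : $ini) . "\n";

// After uncommenting the extension line and restarting Apache, this should report "yes".
$opensslLoaded = extension_loaded('openssl');
echo 'openssl loaded: ' . ($opensslLoaded ? 'yes' : 'no') . "\n";
```

If it still says "no" after the restart, the line was probably changed in a different php.ini than the one reported here.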
