Collecting Info Off Another Website

leewad
Forum Commoner
Posts: 91
Joined: Tue May 11, 2004 8:32 am

Collecting Info Off Another Website

Post by leewad »

Hi

I have been given permission to use a table located on another website, and I want to extract its data and insert it into my database. The table is as follows:

Code:

<table>
<tr><th>Indicative Rates</th>
<th>Sell</th>
<th>Buy</th>
</tr>
<tr>
<td>GBPEUR</td>
<td>1.2316</td>
<td>1.2336</td>
</tr>
<tr>
<td>GBPUSD</td>
<td>1.6109</td>
<td>1.6129</td>
</tr>
<tr>
<td>EURUSD</td>
<td>1.3067</td>
<td>1.3087</td>
</tr>
<tr>
<td>GBPJPY</td>
<td>131.9240</td>
<td>132.3240</td>
</tr>
<tr>
<td>GBPAUD</td>
<td>1.5381</td>
<td>1.5401</td>
</tr>
<tr>
<td>GBPNZD</td>
<td>1.9539</td>
<td>1.9559</td>
</tr>
<tr>
<td>GBPCAD</td>
<td>1.6009</td>
<td>1.6029</td>
</tr>
<tr>
<td>NZDUSD</td>
<td>0.8238</td>
<td>0.8252</td>
</tr>
<tr>
<td>GBPZAR</td>
<td>14.2347</td>
<td>14.2747</td>
</tr>
<tr>
<td>USDZAR</td>
<td>8.8336</td>
<td>8.8536</td>
</tr>
<tr>
<td>GBPPLN</td>
<td>5.0818</td>
<td>5.1018</td>
</tr>
<tr>
<td>EURJPY</td>
<td>106.9922</td>
<td>107.3922</td>
</tr>
<tr>
<td colspan=3><nobr>Rates up-to-date at 4th December 2012 10:51:20am</nobr></td>
</tr>
</table>
Which method would be best for extracting the data?
social_experiment
DevNet Master
Posts: 2793
Joined: Sun Feb 15, 2009 11:08 am
Location: .za

Re: Collecting Info Off Another Website

Post by social_experiment »

http://php.net/manual/en/function.file-get-contents.php
You could read the page into a string and then search for the specific data inside that string.
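
As a rough illustration of the read-into-a-string idea, here is a minimal sketch that searches the markup with preg_match_all(). It assumes every data row looks exactly like the rows in the first post (three plain td cells, the first holding a six-letter currency pair). The function name extract_rates_regex and the inline sample are made up for this example; in real use $html would come from file_get_contents() on the page you have permission to read.

```php
<?php
// Shortened copy of the table from the first post (assumption: the live
// page keeps this exact layout).
$html = <<<'HTML'
<table>
<tr><th>Indicative Rates</th>
<th>Sell</th>
<th>Buy</th>
</tr>
<tr>
<td>GBPEUR</td>
<td>1.2316</td>
<td>1.2336</td>
</tr>
<tr>
<td>GBPUSD</td>
<td>1.6109</td>
<td>1.6129</td>
</tr>
<tr>
<td colspan=3><nobr>Rates up-to-date at 4th December 2012 10:51:20am</nobr></td>
</tr>
</table>
HTML;

// Hypothetical helper: returns [pair => ['sell' => float, 'buy' => float]].
function extract_rates_regex(string $html): array
{
    // One row = <tr>, then three plain <td> cells: pair, sell, buy.
    // The header row (th cells) and the footer row (colspan) do not match.
    $pattern = '~<tr>\s*<td>([A-Z]{6})</td>\s*<td>([\d.]+)</td>\s*<td>([\d.]+)</td>\s*</tr>~';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $rates = [];
    foreach ($matches as $m) {
        $rates[$m[1]] = ['sell' => (float) $m[2], 'buy' => (float) $m[3]];
    }
    return $rates;
}

print_r(extract_rates_regex($html));
```

This breaks as soon as the site adds attributes or changes whitespace inside the tags, which is the usual argument for a real parser.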
“Don’t worry if it doesn’t work right. If everything did, you’d be out of a job.” - Mosher’s Law of Software Engineering
Eric!
DevNet Resident
Posts: 1146
Joined: Sun Jun 14, 2009 3:13 pm

Re: Collecting Info Off Another Website

Post by Eric! »

The buzzword is scraping, and there is a wide variety of ways to accomplish this in PHP. You can just search through the text with regex patterns, as social_experiment suggests. However, I've found the most robust approach is to use the DOM (Document Object Model) and XPath.

The DOM and XPath can take a little time to learn, but there are tools that can help. For example, if you use Firefox with the Firebug add-on, you can instantly view the DOM of any webpage and extract the XPath for items you want to locate in it with the click of a button. It's a good way for beginners to learn this technique.

Here's a simple little example I found on the web for digg.com to get you started:

Code:

 
<?php
    // The URL you want to retrieve
    $my_url = 'http://www.digg.com';
    $html = file_get_contents($my_url);
    if ($html === false) {
        die("Could not fetch $my_url");
    }

    // Real-world HTML is rarely valid; collect parser warnings instead of printing them
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    // Put your XPath query here
    $my_xpath_query = "/html/body/div[@id='container']/div[@id='contents']/div[@class='list' and @id='wrapper']/div[@class='main' and position()=1]/div[contains(@class, 'news-summary')]/div[@class='news-body']/h3";
    $result_rows = $xpath->query($my_xpath_query);

    // Loop through the results (a DOMNodeList of the matching elements)
    foreach ($result_rows as $result_object) {
        echo $result_object->childNodes->item(0)->nodeValue;
    }
?>
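And here's an example using Firefox with Firebug to extract the XPath query for a table: http://edoism.orcutt.org/2010/08/gettin ... -with.html. Note that when working with tables, the TBODY element should be removed from your XPath query, as was shown in this example. Firebug displays the browser's DOM, in which the browser inserts TBODY even when the page source has none, so a query that includes it may match nothing in DOMDocument.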
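
Applied to the rates table from the first post, the DOM and XPath approach might look like the sketch below. It is only an illustration under a few assumptions: the sample is a shortened inline copy of the posted table, the function name extract_rates_xpath is invented here, and a data row is taken to be any tr with exactly three td cells. The query uses //tr rather than an absolute path, so it does not depend on whether a tbody is present.

```php
<?php
// Shortened copy of the table from the first post; in real use this string
// would come from file_get_contents() on the remote page.
$html = <<<'HTML'
<table>
<tr><th>Indicative Rates</th>
<th>Sell</th>
<th>Buy</th>
</tr>
<tr>
<td>GBPEUR</td>
<td>1.2316</td>
<td>1.2336</td>
</tr>
<tr>
<td>GBPUSD</td>
<td>1.6109</td>
<td>1.6129</td>
</tr>
<tr>
<td colspan=3><nobr>Rates up-to-date at 4th December 2012 10:51:20am</nobr></td>
</tr>
</table>
HTML;

// Hypothetical helper: returns [pair => ['sell' => float, 'buy' => float]].
function extract_rates_xpath(string $html): array
{
    // Tags such as <nobr> can trigger libxml warnings; keep them internal.
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $rates = [];

    // Rows with exactly three <td> cells: skips the <th> header row and
    // the single-cell "Rates up-to-date" footer row.
    foreach ($xpath->query('//tr[count(td) = 3]') as $row) {
        $cells = $xpath->query('td', $row);
        $pair = trim($cells->item(0)->nodeValue);
        $rates[$pair] = [
            'sell' => (float) $cells->item(1)->nodeValue,
            'buy'  => (float) $cells->item(2)->nodeValue,
        ];
    }
    return $rates;
}

print_r(extract_rates_xpath($html));
```

If the real page contains other tables with three-column rows, the query would need narrowing, for example by an id or class on the table.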
greyhoundcode
Forum Regular
Posts: 613
Joined: Mon Feb 11, 2008 4:22 am

Re: Collecting Info Off Another Website

Post by greyhoundcode »

If that table markup is going to be typical, you could also consider using SimpleXML.
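
A possible way to try the SimpleXML route is sketched below. One caveat: the posted markup is not well-formed XML (colspan=3 is unquoted), so this sketch does not pass it straight to simplexml_load_string(). Instead it assumes the HTML is first parsed with DOMDocument and then handed over with simplexml_import_dom(). The function name extract_rates_simplexml and the shortened sample are invented for the example.

```php
<?php
// Shortened copy of the table from the first post.
$html = <<<'HTML'
<table>
<tr><th>Indicative Rates</th>
<th>Sell</th>
<th>Buy</th>
</tr>
<tr>
<td>GBPEUR</td>
<td>1.2316</td>
<td>1.2336</td>
</tr>
<tr>
<td>GBPUSD</td>
<td>1.6109</td>
<td>1.6129</td>
</tr>
<tr>
<td colspan=3><nobr>Rates up-to-date at 4th December 2012 10:51:20am</nobr></td>
</tr>
</table>
HTML;

// Hypothetical helper: returns [pair => ['sell' => float, 'buy' => float]].
function extract_rates_simplexml(string $html): array
{
    // Let the forgiving HTML parser deal with the non-XML markup first.
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xml = simplexml_import_dom($dom);
    if (!$xml) {
        return [];
    }

    $rates = [];
    // Same row filter as an XPath query on the DOM: exactly three <td> cells.
    foreach ($xml->xpath('//tr[count(td) = 3]') as $row) {
        $rates[(string) $row->td[0]] = [
            'sell' => (float) $row->td[1],
            'buy'  => (float) $row->td[2],
        ];
    }
    return $rates;
}

print_r(extract_rates_simplexml($html));
```

The main attraction is the property-style access ($row->td[1]); the parsing work is still done by DOMDocument.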
social_experiment
DevNet Master
Posts: 2793
Joined: Sun Feb 15, 2009 11:08 am
Location: .za

Re: Collecting Info Off Another Website

Post by social_experiment »

Eric! wrote: text with regex patterns
I'm not sure if I used regex incorrectly the last time, but it was a bit of a pain finding specific data within the HTML code. It's great if you know specific id values to look for, let's say table id="mainTable", but without them I found it difficult to scrape the data I wanted.
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Re: Collecting Info Off Another Website

Post by Christopher »

If the file is as structured as the above example you could do:

Code:

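$line = trim($line); // drop the trailing newline so the length arithmetic holds
if (substr($line, 0, 4) == '<td>') {
     // skip the leading '<td>' (4 chars) and the trailing '</td>' (5 chars)
     $data = substr($line, 4, strlen($line)-9);
}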
I think I would try an HTML/XML parser first and fall back to PREG only if that turned out to be too much trouble.
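
To show how the substr() check could be turned into complete rows, here is a small sketch. It assumes the page keeps one td cell per line and three cells per data row, exactly as in the first post. The function name extract_rates_lines and the shortened sample are made up for the example.

```php
<?php
// Shortened copy of the table from the first post.
$html = <<<'HTML'
<table>
<tr><th>Indicative Rates</th>
<th>Sell</th>
<th>Buy</th>
</tr>
<tr>
<td>GBPEUR</td>
<td>1.2316</td>
<td>1.2336</td>
</tr>
<tr>
<td>GBPUSD</td>
<td>1.6109</td>
<td>1.6129</td>
</tr>
<tr>
<td colspan=3><nobr>Rates up-to-date at 4th December 2012 10:51:20am</nobr></td>
</tr>
</table>
HTML;

// Hypothetical helper: returns [pair => ['sell' => float, 'buy' => float]].
function extract_rates_lines(string $html): array
{
    $rates = [];
    $cells = [];

    foreach (explode("\n", $html) as $line) {
        $line = trim($line);

        // Only lines that start with a bare '<td>' are data cells; the
        // footer cell starts with '<td colspan=3>' and is skipped.
        if (substr($line, 0, 4) == '<td>') {
            $cells[] = substr($line, 4, strlen($line) - 9);
        }

        // Three collected cells make one row: pair, sell, buy.
        if (count($cells) === 3) {
            $rates[$cells[0]] = [
                'sell' => (float) $cells[1],
                'buy'  => (float) $cells[2],
            ];
            $cells = [];
        }
    }
    return $rates;
}

print_r(extract_rates_lines($html));
```

It is the fastest of the approaches to write, but also the first to break if the site reformats its markup.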
(#10850)