Help with remote site display

PHPeter
Forum Newbie
Posts: 7
Joined: Tue Dec 12, 2006 6:21 pm

Post by PHPeter »

Hi, I'm new to these forums and to most PHP functionality. I have only done some pretty simple things up until now, and I have taken a liking to the language.

Recently, I was asked to do something new, which involves RSS feeds. I've read some code and the concepts behind it, and I am beginning to understand how these things are used.

Basically, I have been asked to extract the contents of a table found on a remote page and display those contents in a table on the local page. The setup is the following: there is a table on the right with news and a link that says "Read whole article". When you click the link, you are supposed to get the whole article on the same page, but in the center table (which normally only has a "welcome" text).

I tried the following:

Code:

// The following code is inside a for loop.
// $rs is an array; $rs["items"] is an array whose entries each hold the title,
// the publish date, the overview of the article and the link to the whole article.
$cont = '<div class="feedstory"><h3>'.$rs["items"][$i]["title"]."</h3> \n";
$cont .= '<p align="justify"><font style="color:#ccc; font-size: 9px;">'.$rs["items"][$i]['pubDate']."</font></p>\n";
$cont .= "<p align=\"justify\">".html_entity_decode($rs["items"][$i]["description"])."</p>\n";
$cont .= "<p><a target=\"_self\" href=\"".$rs["items"][$i]["link"]."\">"._READ_WHOLE."</a></p>\n";
$cont .= "<div style=\"clear:both;\"></div>\n</div>\n\n";
This piece of code is used to get the overview of the articles, where $rs gives you the title, the overview and finally the link to the whole article. However, as it is right now, the link redirects you to the remote page where the whole article is, and that page also contains other info. I want to extract the article (found in a table) and display it on my page. What I tried to do is exactly the same thing:

Code:

// $i is the index of the article selected, so I'm trying to get to the link where the whole article resides.
$rs2 = $rss->get($rs["items"][$i]["link"]);
$cont .= '<div class="feedstory"><h3>'.$rs2["items"][0]["title"]."</h3> \n";
$cont .= '<p align="justify"><font style="color:#ccc; font-size: 9px;">'.$rs2["items"][0]['pubDate']."</font></p>\n";
$cont .= "<p align=\"justify\">".html_entity_decode($rs2["items"][0]["description"])."</p>\n";
$cont .= "<div style=\"clear:both;\"></div>\n</div>\n\n";
Here $rs2 is supposed to hold the article so that its info can be displayed. The method $rss->get is supposed to extract the contents of the selected page.

However, this does not work (nothing is displayed), and since I don't have much experience with RSS and feeds I'm quite lost now as to what to look for.

Any suggestions or comments? Thanks in advance.
RobertGonzalez
Site Administrator
Posts: 14293
Joined: Tue Sep 09, 2003 6:04 pm
Location: Fremont, CA, USA

Post by RobertGonzalez »

If you are scanning a remote page, you will more than likely need to tap into the PHP filesystem functions and a heap of regular expressions.
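For the first half of that, here is a minimal sketch of grabbing a page's source with file_get_contents(). The helper name fetchRemotePage and the 10-second timeout are my own choices for illustration, not something from this thread, and it assumes allow_url_fopen is enabled on your server.

```php
<?php
// Hypothetical helper (name and timeout are assumptions, not from the thread).
// Returns the raw source of the page, or null if it could not be read.
function fetchRemotePage(string $url, int $timeoutSeconds = 10): ?string
{
    // The 'http' options only apply to http:// and https:// URLs.
    $context = stream_context_create([
        'http' => ['timeout' => $timeoutSeconds],
    ]);

    // Suppress the warning on failure and report it through the return value.
    $html = @file_get_contents($url, false, $context);

    return $html === false ? null : $html;
}
```

You would pass it the article link, for example fetchRemotePage($rs["items"][$i]["link"]), and check for null before doing anything with the result.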
PHPeter
Forum Newbie
Posts: 7
Joined: Tue Dec 12, 2006 6:21 pm

Post by PHPeter »

Does that mean converting the page to a regular expression and scanning it? How is that done? In this case, the remote page also contains three tables, and the point is to extract the contents of one of them; how do you recognize which is the correct one?
Sloth
Forum Newbie
Posts: 18
Joined: Thu Dec 07, 2006 7:29 pm

Post by Sloth »

scraping is evil. :evil: :evil:

But to do what you want, the table has to have some identifying element, maybe a unique ID or class attribute (or even a unique <tr> <td> combo, whatever works).
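If the table does have a unique ID, one way to pick it out is PHP's DOM extension rather than string matching. This is only a sketch under that assumption: the helper name extractTableById and the id "article" are made up for illustration, and the real page may identify its table differently.

```php
<?php
// Hypothetical helper: returns the HTML of the <table> whose id attribute
// equals $tableId, or null if no such table exists.
function extractTableById(string $html, string $tableId): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();

    // Real-world pages are rarely valid HTML, so collect parser
    // warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Walk every table on the page and compare its id attribute.
    foreach ($doc->getElementsByTagName('table') as $table) {
        if ($table->getAttribute('id') === $tableId) {
            return $doc->saveHTML($table);
        }
    }

    return null;
}
```

With three tables on the remote page, extractTableById($html, 'article') would hand back only the one marked id="article", which you can then append to $cont.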
RobertGonzalez
Site Administrator
Posts: 14293
Joined: Tue Sep 09, 2003 6:04 pm
Location: Fremont, CA, USA

Post by RobertGonzalez »

Basically you need to grab the source of the page you want to scan, break it down into lines, look for the line that contains the opening table tag of the table you want, find the line with the closing tag of that table, then regex all the stuff in between to get at what you want.
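Those steps could be sketched roughly like this. It assumes the opening and closing table tags each sit on their own line and that the table has no nested tables; the helper name extractCellTexts and the marker string are hypothetical, so adjust them to whatever the remote page actually contains.

```php
<?php
// Hypothetical helper: returns the plain text of every <td> inside the
// first table whose opening line contains $openingMarker.
function extractCellTexts(string $html, string $openingMarker): array
{
    // Break the page source into lines.
    $lines = preg_split('/\R/', $html);

    $collecting = false;
    $tableLines = [];

    foreach ($lines as $line) {
        // Look for the line holding the opening tag of the wanted table.
        if (!$collecting && stripos($line, $openingMarker) !== false) {
            $collecting = true;
        }
        if ($collecting) {
            $tableLines[] = $line;
            // Stop at the line holding the closing tag of that table.
            if (stripos($line, '</table>') !== false) {
                break;
            }
        }
    }

    // Regex everything in between to pull out the cell contents.
    $tableHtml = implode("\n", $tableLines);
    preg_match_all('/<td[^>]*>(.*?)<\/td>/is', $tableHtml, $matches);

    // Drop any inline markup and surrounding whitespace from each cell.
    return array_map(function ($cell) {
        return trim(strip_tags($cell));
    }, $matches[1]);
}
```

A call such as extractCellTexts($source, '<table id="article"') returns an array of cell texts, or an empty array if the marker never shows up. Line-based matching like this breaks easily when the remote site changes its markup, so expect to adjust the marker from time to time.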
PHPeter
Forum Newbie
Posts: 7
Joined: Tue Dec 12, 2006 6:21 pm

Post by PHPeter »

I see. I'll do that, and if I find anything else I'll tell you. Thank you all for your help!