Fetching content

PHP programming forum. Ask questions or help people concerning PHP code. Don't understand a function? Need help implementing a class? Don't understand a class? Here is where to ask. Remember to do your homework!


jabbaonthedais
Forum Contributor
Posts: 127
Joined: Wed Aug 18, 2004 12:08 pm


Post by jabbaonthedais »

I need to fetch some content from another URL. I don't want their whole site, just small portions, with those portions linking back to their site. Alternatively, I could grab the links from their site and link to the same places from mine. They have already consented to this; I just need the code to do it.
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

[php_man]file_get_contents[/php_man] can retrieve the page. [php_man]preg_match[/php_man] and [php_man]preg_match_all[/php_man] can then be used to extract the information.
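A minimal sketch of that approach, working on an HTML string that file_get_contents() has already fetched. `extract_links` is a hypothetical helper name, and the pattern assumes each link has a quoted href attribute; regex matching is not a full HTML parser and will miss unusual markup.

```php
<?php
// Hypothetical helper: pulls every <a href="...">text</a> pair out of an
// HTML string. Assumes a quoted href; not a full HTML parser.
function extract_links(string $html): array
{
    // Group 1 = the quote character, group 2 = the URL,
    // group 3 = whatever sits between <a ...> and </a>.
    $pattern = '/<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1[^>]*>(.*?)<\/a>/is';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $links = [];
    foreach ($matches as $m) {
        $links[] = [
            // Turn entities such as &amp; back into plain characters.
            'url'  => html_entity_decode($m[2]),
            // Drop any tags inside the link text (e.g. <b>) and trim it.
            'desc' => trim(strip_tags($m[3])),
        ];
    }
    return $links;
}
```

On a live page you would pass it the fetched HTML, e.g. `extract_links(file_get_contents('http://www.example.com/'))` (placeholder URL). Note that file_get_contents() can only open URLs when `allow_url_fopen` is enabled.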
jabbaonthedais

Post by jabbaonthedais »

I've been trying tons of functions, but I can't get any of them to work for my particular need. I'm trying to grab links from a site, and they are not on separate lines. I just need the URL and description of each link saved in an array, so I can use the variables like this:

Code:

<a href="<?php print $url1; ?>" style="text-decoration:none" onMouseOver="window.status='<?php print $desc1; ?>';return true" onMouseOut="window.status='  '">Gallery 01</a> - <a href="<?php print $url6; ?>" style="text-decoration:none" onMouseOver="window.status='<?php print $desc6; ?>';return true" onMouseOut="window.status='  '">Gallery 06</a><br>

<a href="<?php print $url2; ?>" style="text-decoration:none" onMouseOver="window.status='<?php print $desc2; ?>';return true" onMouseOut="window.status='  '">Gallery 02</a> - <a href="<?php print $url7; ?>" style="text-decoration:none" onMouseOver="window.status='<?php print $desc7; ?>';return true" onMouseOut="window.status='  '">Gallery 07</a><br>
And so on, down to Gallery 05 and Gallery 10.

So basically I just need the first 10 links in a section of the page.
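Rather than ten separate $urlN/$descN variables, an array plus a loop can produce the same two-column layout (links 1-5 on the left, 6-10 on the right). A sketch, assuming `$links` holds `['url' => ..., 'desc' => ...]` entries; `render_gallery_rows` is a hypothetical name, not an existing function.

```php
<?php
// Hypothetical helper: renders the two-column "Gallery NN" list from an
// array of ['url' => ..., 'desc' => ...] entries. Left column = links 1..5,
// right column = links 6..10 (with the default $perColumn of 5).
function render_gallery_rows(array $links, int $perColumn = 5): string
{
    // Builds one <a> tag for the link at zero-based index $i.
    $cell = function (int $i) use ($links): string {
        $url = htmlspecialchars($links[$i]['url'], ENT_QUOTES);
        // The description lands inside a JavaScript string inside an HTML
        // attribute: json_encode() makes it a JS string literal, then
        // htmlspecialchars() makes that literal safe inside the attribute.
        $desc = htmlspecialchars(json_encode($links[$i]['desc']), ENT_QUOTES);
        $label = sprintf('Gallery %02d', $i + 1);
        return "<a href=\"$url\" style=\"text-decoration:none\""
            . " onmouseover=\"window.status=$desc;return true\""
            . " onmouseout=\"window.status=''\">$label</a>";
    };

    $html = '';
    for ($row = 0; $row < $perColumn && isset($links[$row]); $row++) {
        $html .= $cell($row);
        // Pair it with the link one column to the right, if there is one.
        if (isset($links[$row + $perColumn])) {
            $html .= ' - ' . $cell($row + $perColumn);
        }
        $html .= "<br>\n";
    }
    return $html;
}
```

Usage would be something like `echo render_gallery_rows(array_slice($links, 0, 10));`. The double escaping matters because a description containing an apostrophe would otherwise break the inline `window.status='...'` string.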
feyd

Post by feyd »

The functions I linked to can do this.
m3mn0n
PHP Evangelist
Posts: 3548
Joined: Tue Aug 13, 2002 3:35 pm
Location: Calgary, Canada

Post by m3mn0n »

What I do is fetch the HTML with [php_man]file_get_contents[/php_man](), [php_man]explode[/php_man]() out the part I want, and then parse that part to remove tags and such.

It has worked wonders for a ton of sites. As long as the text you split on stays the same, it keeps working even if they change much of the rest of the layout or generate the content dynamically. The same goes for [php_man]regex[/php_man] matching, as feyd mentioned.
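A rough sketch of the explode() approach described above. `extract_section()` and `section_text()` are hypothetical names, and the marker strings are placeholders: in practice you would pick text that appears just before and just after the part of the page you want.

```php
<?php
// Hypothetical helper: returns the HTML between the first occurrence of
// $start and the next occurrence of $end, or null if $start is not found.
// If $end is missing, everything after $start is returned.
function extract_section(string $html, string $start, string $end): ?string
{
    $parts = explode($start, $html, 2);   // [before start, after start]
    if (count($parts) < 2) {
        return null;
    }
    $rest = explode($end, $parts[1], 2);  // [section, after end]
    return $rest[0];
}

// Strips the tags from a section and collapses whitespace to single spaces.
function section_text(string $section): string
{
    return trim(preg_replace('/\s+/', ' ', strip_tags($section)));
}
```

Because only the two markers are matched, the rest of the page can change freely; if either marker changes, the cut comes out wrong.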
jabbaonthedais
Forum Contributor
Posts: 127
Joined: Wed Aug 18, 2004 12:08 pm

Post by jabbaonthedais »

Someone on php.net suggested using preg_split rather than explode to split a string on multiple separators. The links I'm trying to get are inside tables with <table>, <td>, <tr>, etc. tags that I don't need. But I don't think I need a separator, because of all the extra markup. I need the function to just extract the links from the page, including the link text, and number each one so I can display the first 10 on my page. Is this still the right direction?
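To illustrate the php.net suggestion: explode() splits on one fixed string, whereas preg_split() splits wherever a pattern matches, so the table tags themselves can serve as the separators. A sketch; `split_table_cells` is a hypothetical name, and it only recognizes the tags listed in its pattern.

```php
<?php
// Hypothetical helper: splits HTML on any opening or closing table, tbody,
// tr, td or th tag, leaving just the contents of the cells.
function split_table_cells(string $html): array
{
    // \b stops "th" from matching <thead> and "tr" from matching <track>.
    $pieces = preg_split('/<\/?(?:table|tbody|tr|td|th)\b[^>]*>/i', $html, -1, PREG_SPLIT_NO_EMPTY);

    // Drop pieces that are only whitespace (e.g. newlines between tags)
    // and renumber the remaining pieces from 0.
    return array_values(array_filter($pieces, function ($p) {
        return trim($p) !== '';
    }));
}
```

Each remaining piece still contains its <a> tag, so the link URL and text still have to be pulled out with preg_match() or preg_match_all() afterwards.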
feyd

Post by feyd »

The regular expression (preg_*) functions are still the way to go.
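Putting that together for this case: a sketch that isolates one section with preg_match(), collects its links with preg_match_all(), and keeps the first 10 numbered from 1, so `$urls[1]` and `$descs[1]` stand in for `$url1` and `$desc1` in the template. Assumptions: `first_links` is a hypothetical name, the section starts at a unique marker string and ends at the next `</table>`, and the hrefs use double quotes.

```php
<?php
// Hypothetical helper: finds the section that starts at $startMarker and ends
// at the next </table>, extracts its links, and returns the first $limit of
// them as two arrays numbered from 1: [$urls, $descs].
// Assumes hrefs are double-quoted; regexes are not a full HTML parser.
function first_links(string $html, string $startMarker, int $limit = 10): array
{
    // preg_quote() escapes regex metacharacters (and the / delimiter).
    $sectionPattern = '/' . preg_quote($startMarker, '/') . '(.*?)<\/table>/is';
    if (!preg_match($sectionPattern, $html, $m)) {
        return [[], []];   // marker not found
    }

    // $found[1] = href values, $found[2] = link text (may contain tags).
    preg_match_all('/<a\s[^>]*?href\s*=\s*"([^"]*)"[^>]*>(.*?)<\/a>/is', $m[1], $found);

    $urls = [];
    $descs = [];
    $count = min($limit, count($found[1]));
    for ($i = 0; $i < $count; $i++) {
        $urls[$i + 1] = $found[1][$i];
        $descs[$i + 1] = trim(strip_tags($found[2][$i]));
    }
    return [$urls, $descs];
}
```

Usage might look like `[$urls, $descs] = first_links(file_get_contents('http://www.example.com/'), '<table id="galleries"');` (placeholder URL and marker), followed by `<?php print $urls[1]; ?>` wherever the template had `<?php print $url1; ?>`.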