Grabbing the title from Yahoo.com; seems easy but is it?

numbers369
Forum Newbie
Posts: 1
Joined: Fri Oct 19, 2007 9:16 pm

Post by numbers369 »

Everah | Please use code tags and [syntax="..."] tags where appropriate when posting code. Your post has been edited to reflect how we'd like it posted. Please read "Posting Code in the Forums" (http://forums.devnetwork.net/viewtopic.php?t=21171) to learn how to do it too.


I've searched the forum and browsed some stickies and haven't found the solution. I've also searched Google for much of the day to no avail.

I want to be able to grab (scrape) code at a particular place in a page. I want to hard-code the start and stop markers and use regex to find what is in between. [Yes, I realize the page can be modified, rendering my hard-coding worthless.] As a very basic example, I'm trying to grab the title of Yahoo.com's site using the following code.

<?php

$data = file_get_contents('http://www.yahoo.com');
$regex = '/<title> (.+?) <\\/title>/';
preg_match($regex,$data,$match);
var_dump($match);
print '<br><br>';
echo $match[1];

?>


$match[1] comes back empty.
I have printed $data, and the string does contain the page content.

The regex is very simple. I added the two \\ because of the / in </title>; I believe that's needed because / is the delimiter.
The (.+?) is supposed to grab whatever is in between <title> and </title>.

I've used this exact code and it works when <title> and </title> are replaced with static text on the page. As an example, see the code below, which locates the word "Yahoo!" between 2007 and Inc in the footer of the site:

<?php

$data = file_get_contents('http://www.yahoo.com');
$regex = '/2007 (.+?) Inc/';
preg_match($regex,$data,$match);
var_dump($match);
print '<br><br>';
echo $match[1];

?>
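
The hard-coded start/stop approach described in this post can be wrapped in a small helper. This is only a minimal sketch: the function name `grab_between` and the sample strings are made up for illustration, and it assumes PHP 7.1 or later for the nullable return type. It allows optional whitespace around the captured text instead of requiring exact spaces.

```php
<?php
/**
 * Hypothetical helper (not from the original post): returns the text found
 * between two literal markers, or null when the markers are not found.
 */
function grab_between(string $data, string $start, string $stop): ?string
{
    // preg_quote escapes regex metacharacters and the '/' delimiter in the markers.
    // \s* tolerates zero or more whitespace characters around the captured text;
    // the s flag lets . match newlines as well.
    $regex = '/' . preg_quote($start, '/') . '\s*(.+?)\s*' . preg_quote($stop, '/') . '/s';

    // preg_match returns 1 on a match, 0 on no match, and false on error.
    if (preg_match($regex, $data, $match) === 1) {
        return $match[1];
    }
    return null;
}

// Made-up sample strings standing in for the fetched page.
$footer = 'Copyright 2007 Yahoo! Inc. All rights reserved.';
$head   = '<head><title>Yahoo!</title></head>';

var_dump(grab_between($footer, '2007', 'Inc'));
var_dump(grab_between($head, '<title>', '</title>'));
var_dump(grab_between($head, '<h1>', '</h1>'));
```

Checking the return value of preg_match before reading $match[1] avoids an undefined-index notice when nothing matches.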


Any ideas?


John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto

Post by John Cartwright »

Answer this, and it will answer your question.

What is the difference between

<title>foobar</title>
and

<title> foobar </title>
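
To make the difference concrete, here is a minimal sketch comparing the pattern from the first post with a whitespace-tolerant variant. The sample markup is a made-up stand-in for the fetched page, and the looser pattern is one possible fix, not necessarily the one intended above.

```php
<?php
// Hypothetical sample markup: no spaces inside the <title> element.
$data = '<html><head><title>Yahoo!</title></head><body></body></html>';

// Pattern from the first post: demands a literal space after <title>
// and another before </title>.
$strict = '/<title> (.+?) <\/title>/';

// Looser variant: \s* accepts zero or more whitespace characters,
// the i flag ignores tag case, and the s flag lets . span newlines.
$loose = '/<title>\s*(.+?)\s*<\/title>/is';

$strictFound = preg_match($strict, $data, $strictMatch); // no match on this sample
$looseFound  = preg_match($loose, $data, $looseMatch);   // matches

$title = ($looseFound === 1) ? $looseMatch[1] : null;
var_dump($strictFound, $looseFound, $title);
```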
Kieran Huggins
DevNet Master
Posts: 3635
Joined: Wed Dec 06, 2006 4:14 pm
Location: Toronto, Canada

Post by Kieran Huggins »

Also, you appear to have an extra backslash in your pattern.
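
For reference, a short sketch of how PHP treats that backslash. In a single-quoted string, `\\` collapses to one backslash, so the doubled form should produce the same pattern as the single form; the alternative `#` delimiter shown here is just one way to avoid the escape altogether, and the variable names are illustrative.

```php
<?php
// In a single-quoted PHP string, \\ becomes one backslash, and a lone
// backslash before any character other than ' or \ is kept as-is.
$doubled = '/<\\/title>/';
$single  = '/<\/title>/';

// Both variables hold the same pattern text.
$same = ($doubled === $single);

// Using a different delimiter means the slash needs no escaping at all.
$alt = '#</title>#';

$hitSingle = preg_match($single, 'x</title>');
$hitAlt    = preg_match($alt, 'x</title>');
var_dump($same, $hitSingle, $hitAlt);
```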
neophyte
DevNet Resident
Posts: 1537
Joined: Tue Jan 20, 2004 4:58 pm
Location: Minnesota

Post by neophyte »

With the abundance of REST and SOAP interfaces available on Yahoo, why are we grep'n pages with regex?
aaronhall
DevNet Resident
Posts: 1040
Joined: Tue Aug 13, 2002 5:10 pm
Location: Back in Phoenix, missing the microbrews

Post by aaronhall »

$title = "Yahoo!";
You're welcome.