Grabbing the title from Yahoo.com; seems easy but is it?

Posted: Fri Oct 19, 2007 9:37 pm
by numbers369
Everah | Please use the code tags and [syntax="..."] tags where appropriate when posting code. Your post has been edited to reflect how we'd like it posted. Please read Posting Code in the Forums (http://forums.devnetwork.net/viewtopic.php?t=21171) to learn how to do it too.


I've searched the forum and browsed some of the stickies without finding a solution. I've also searched Google for most of the day, to no avail.

I want to scrape text from a particular place in a page. I want to hard-code the start and end markers and use a regex to capture whatever is in between. (Yes, I realize the page could change and make my hard-coded markers worthless.) As a very basic example, I'm trying to grab the title of Yahoo.com's home page with the following code.

Code:

<?php

$data = file_get_contents('http://www.yahoo.com');
$regex = '/<title> (.+?) <\\/title>/';
preg_match($regex,$data,$match);
var_dump($match);
print '<br><br>';
echo $match[1];

?>


$match[1] comes back empty.
I've printed $data, and the string does contain the page.

The regex is very simple. I added the \\ because of the / in </title>; I believe it's needed because / is the delimiter.
The (.+?) is supposed to grab whatever is in between <title> and </title>.
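The escaping question is easy to test in isolation. A minimal sketch, using a hard-coded sample string as a stand-in for the downloaded page: it shows that '\\/' and '\/' are the same string inside single quotes, so the regex engine sees identical patterns, and that switching the delimiter to # removes the need to escape the slash at all. The sample string and variable names are made up for illustration.

```php
<?php
// In a single-quoted PHP string, \\ is an escaped backslash, so these two
// literals are the same string: the regex engine sees <\/title> either way.
$doubled = '<\\/title>';
$single  = '<\/title>';
var_dump($doubled === $single); // bool(true)

// Hypothetical sample input standing in for the real page.
$html = '<title>Yahoo!</title>';

// With / as the pattern delimiter, the / inside </title> has to be escaped.
preg_match('/<title>(.+?)<\/title>/', $html, $slashMatch);
echo $slashMatch[1], "\n"; // Yahoo!

// With a different delimiter (# here), the / needs no escaping at all.
preg_match('#<title>(.+?)</title>#', $html, $hashMatch);
echo $hashMatch[1], "\n"; // Yahoo!
```

Either delimiter works; # just keeps patterns full of HTML closing tags easier to read.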

I've used this exact code, and it works when <title> and </title> are replaced with static text from the page. For example, the code below finds the word "Yahoo!" between "2007" and "Inc" in the site's footer:

Code:

<?php

$data = file_get_contents('http://www.yahoo.com');
$regex = '/2007 (.+?) Inc/';
preg_match($regex,$data,$match);
var_dump($match);
print '<br><br>';
echo $match[1];

?>


Any ideas?



Posted: Fri Oct 19, 2007 10:39 pm
by John Cartwright
Answer this, and it will answer your question:

What is the difference between

Code:

<title>foobar</title>
and

Code:

<title> foobar </title>
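The hint can be checked directly. A small sketch using made-up sample strings (not the real Yahoo page): a pattern equivalent to the one in the question only matches when there is a literal space inside each tag, while \s* tolerates any amount of whitespace, including none. The i and s modifiers are additions for illustration, not something from the thread.

```php
<?php
// Equivalent to the question's pattern: it requires one literal space after
// <title> and one before </title>.
$pattern = '/<title> (.+?) <\/title>/';

var_dump(preg_match($pattern, '<title>foobar</title>'));   // int(0): no spaces, no match
var_dump(preg_match($pattern, '<title> foobar </title>')); // int(1): spaces present, match

// \s* accepts zero or more whitespace characters on each side instead of
// exactly one space; i ignores case, s lets . match across newlines.
$tolerant = '/<title>\s*(.+?)\s*<\/title>/is';
preg_match($tolerant, '<title>foobar</title>', $m);
echo $m[1], "\n"; // foobar
```

Because (.+?) is lazy, the trailing \s* still gets to consume any whitespace before </title>, so it does not end up in the captured text.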

Posted: Sat Oct 20, 2007 3:39 am
by Kieran Huggins
Also, you appear to have an extra backslash in your pattern.
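For the general "hard-code a start and end marker" approach from the original question, escaping can be left to PHP instead of done by hand. A sketch under stated assumptions: textBetween() is a hypothetical helper name, and it uses preg_quote() with the delimiter as its second argument so characters like / in </title> are escaped automatically. The markers must still match the page exactly, spaces included.

```php
<?php
// Hypothetical helper: return the text between two hard-coded markers, or null.
function textBetween(string $haystack, string $start, string $end): ?string
{
    // preg_quote() escapes regex metacharacters; passing '/' as the second
    // argument also escapes the delimiter, so '</title>' needs no hand-escaping.
    // The s modifier lets the captured text span several lines.
    $pattern = '/' . preg_quote($start, '/') . '(.*?)' . preg_quote($end, '/') . '/s';
    if (preg_match($pattern, $haystack, $match) !== 1) {
        return null;
    }
    return $match[1];
}

// Made-up sample input, not the real page.
$html = "<head>\n<title>Yahoo!</title>\n</head>";
var_dump(textBetween($html, '<title>', '</title>')); // string(6) "Yahoo!"
var_dump(textBetween($html, '<h1>', '</h1>'));       // NULL
```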

Posted: Sat Oct 20, 2007 7:46 am
by neophyte
With the abundance of REST and SOAP interfaces available on Yahoo why are we grep'n pages with regex?
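If an official API provides the data, that is sturdier than scraping. When a page really has to be parsed, an HTML parser is a different technique from the regex used above and copes with whitespace and case changes inside the tags on its own. A minimal sketch, assuming PHP's DOM extension is enabled; extractTitle() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: extract the <title> text with PHP's DOM extension instead of a regex.
function extractTitle(string $html): ?string
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }
    $titles = $doc->getElementsByTagName('title');
    if ($titles->length === 0) {
        return null;
    }
    // textContent drops any markup; trim() removes the surrounding whitespace.
    return trim($titles->item(0)->textContent);
}

echo extractTitle('<html><head><title> Yahoo! </title></head><body></body></html>'), "\n"; // Yahoo!
```

Against the live page, something like extractTitle(file_get_contents('http://www.yahoo.com')) should work, assuming allow_url_fopen is enabled; the result is whatever title Yahoo serves at the time.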

Posted: Sat Oct 20, 2007 9:10 am
by aaronhall

Code:

$title = "Yahoo!";

You're welcome.