Regular Expression returning empty Array ( )

Miteshsach86
Forum Newbie
Posts: 7
Joined: Thu Oct 07, 2010 4:41 am

Regular Expression returning empty Array ( )

Post by Miteshsach86 »

Hi fellow developers,

I'm having a real problem at the moment. I'm trying to capture everything between the <body></body> tags using the following code, but it does not print anything:

Code:

$lines = file("http://www.bbc.co.uk/");

foreach ($lines as $line_num => $line) {
$thecontent .= htmlspecialchars($line) . "<br />\n";
}
preg_match('/<body.*?>(.*?)<\/body >/', $thecontent, $htmltext);
$moretext = $htmltext[1];
echo $moretext;

When I add print($thecontent); to the code, the entire HTML of whichever website I fetch is displayed, but I only want to capture the HTML between the body tags. I've tried everything but I just can't get this to work. :? :banghead:

I would appreciate anyone's help and I'd like to thank you in advance.

M
Last edited by Benjamin on Thu Oct 07, 2010 5:04 am, edited 1 time in total.
Reason: Added [syntax=php] tags.
Benjamin
Site Administrator
Posts: 6935
Joined: Sun May 19, 2002 10:24 pm

Re: Regular Expression returning empty Array ( )

Post by Benjamin »

.* doesn't match new lines.

Code:

preg_match('#<body[^>]*>([\s\S]*)<\s{0,1}/body>#i', $thecontent, $htmltext);
requinix
Spammer :|
Posts: 6617
Joined: Wed Oct 15, 2008 2:35 am
Location: WA, USA

Re: Regular Expression returning empty Array ( )

Post by requinix »

Benjamin wrote: .* doesn't match new lines.
...by default. Add the 's' modifier (PCRE_DOTALL) after the closing delimiter and it will.
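
A minimal sketch of the difference, assuming a made-up three-line string rather than the real page (the sample string and variable names are only for illustration):

```php
<?php
// Hypothetical sample; it stands in for the fetched HTML.
$html = "<body>\nHello\n</body>";

// Without the s modifier, . stops at a newline, so the pattern cannot
// get from <body> to </body> and nothing matches.
$without = preg_match('/<body>(.*?)<\/body>/', $html, $m1);

// With the s modifier (PCRE_DOTALL), . also matches newlines.
$with = preg_match('/<body>(.*?)<\/body>/s', $html, $m2);

var_dump($without); // int(0)
var_dump($with);    // int(1)
```

In the second call, $m2[1] holds everything between the tags, including the newlines.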
Miteshsach86
Forum Newbie
Posts: 7
Joined: Thu Oct 07, 2010 4:41 am

Re: Regular Expression returning empty Array ( )

Post by Miteshsach86 »

Hi Guys,

Thanks for your replies. Unfortunately that's also giving me an empty "Array ( )" :(

Is there anything else that you think I'm doing wrong? :?

M
twinedev
Forum Regular
Posts: 984
Joined: Tue Sep 28, 2010 11:41 am
Location: Columbus, Ohio

Re: Regular Expression returning empty Array ( )

Post by twinedev »

The problem is that you are calling htmlspecialchars() on the data before running the regular expression. By that point every < and > has been converted to &lt; and &gt;, so a pattern looking for a literal <body> can never match. (This is in addition to needing the s option at the end of the expression so that newlines are matched.)
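
To illustrate, here is a small sketch using a made-up one-line string in place of the fetched page (the sample string and variable names are assumptions, not from the original code):

```php
<?php
// Hypothetical sample standing in for the fetched page.
$raw = '<body>Hi</body>';
$escaped = htmlspecialchars($raw);

echo $escaped, "\n"; // &lt;body&gt;Hi&lt;/body&gt;

// The same pattern finds the tags in the raw text but not in the escaped text.
$pattern = '/<body.*?>(.*?)<\/body>/s';
$rawHits = preg_match($pattern, $raw);
$escapedHits = preg_match($pattern, $escaped);

var_dump($rawHits);     // int(1)
var_dump($escapedHits); // int(0)
```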

Two choices here:

1. Change your expression to match the escaped entities:

Code:

preg_match('/&lt;body.*?&gt;(.*?)&lt;\/body&gt;/s', $thecontent, $htmltext);

2. (My recommendation) Wait until after you have captured the body before converting it. In that case you only need to add the s to the end of the expression for newlines. (IMO, it's always best to work with the original "raw" data as much as possible and only convert right before you need it converted.)

Also a note: when I copied and pasted the code you posted here, there was a space between </body and the closing >. There shouldn't be one. If the page you are grabbing may have one by mistake, end the pattern with <\/body.*?> instead.
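
Putting the capture-first approach together, a rough sketch might look like the following. It assumes PHP 7.1 or later for the type hints; extractBody is a hypothetical helper name, and the $page string is a stand-in for whatever file_get_contents("http://www.bbc.co.uk/") would return, so the example runs without a network call:

```php
<?php
// Hypothetical helper: returns the text between <body ...> and </body>,
// or null when no body element is found.
function extractBody(string $html): ?string
{
    // s: let . match newlines; i: tolerate <BODY>;
    // \s* tolerates a stray space as in "</body >".
    if (preg_match('/<body[^>]*>(.*?)<\/body\s*>/is', $html, $matches)) {
        return $matches[1];
    }
    return null;
}

// Stand-in for the fetched page (an assumption for this example).
$page = "<html>\n<body class=\"x\">\n<p>Hello</p>\n</body >\n</html>";

$body = extractBody($page);
if ($body !== null) {
    // Escape only now, right before display, so the regex saw the raw HTML.
    echo nl2br(htmlspecialchars($body));
}
```

Fetching the page as one string also avoids the line-by-line loop, since the regex needs the whole document anyway.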

-Greg
Miteshsach86
Forum Newbie
Posts: 7
Joined: Thu Oct 07, 2010 4:41 am

Re: Regular Expression returning empty Array ( )

Post by Miteshsach86 »

Thanks for all your help guys!

Much appreciated :)