preg_match_all unclosed tag problem

rtnylw
Forum Newbie
Posts: 4
Joined: Thu Sep 10, 2009 3:00 am

preg_match_all unclosed tag problem

Post by rtnylw »

Hi,
I'm working on a website scraping script. I need to extract just the text between the tags; the problem is that some of the tags are not closed. Here's an example:

Code: Select all

<tag>some text</tag>
<tag>more text
<tag>even more text</tag>
The second opening tag does not have a closing tag. I was thinking this expression might work:

Code: Select all

#<tag>(.*?)#is
But nothing gets returned. I guess preg_match_all needs a 'closing' identifier, or I'm probably just way off track. Any guidance is much appreciated. Thanks!
requinix
Spammer :|
Posts: 6617
Joined: Wed Oct 15, 2008 2:35 am
Location: WA, USA

Re: preg_match_all unclosed tag problem

Post by requinix »

Code: Select all

#<tag>(.*?)</tag>#is
There's only so far you can go to deal with bad input.


Here's what happens if I post that sample you gave using code tags.

Code: Select all

some text

Code: Select all

more text

Code: Select all

even more text
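
For what it's worth, the splitting shown above (an unclosed tag simply ends where the next one starts) can be approximated in a regex with a lookahead. This is only a rough sketch, assuming the tags are never nested and that `<tag>` is written exactly like that with no attributes; the variable names are made up for illustration. It also shows why the original pattern seemed to return nothing: a lazy `(.*?)` with nothing after it is satisfied by an empty string, so every capture comes back empty.

```php
<?php
// Sample input from the first post: the second <tag> is never closed.
$html = "<tag>some text</tag>\n<tag>more text\n<tag>even more text</tag>";

// Original attempt: the lazy group has nothing to stop at, so it matches
// as little as possible, which is an empty string after each <tag>.
preg_match_all('#<tag>(.*?)#is', $html, $lazy);

// Sketch of a fix: stop at whichever comes first, a closing tag,
// the next opening tag, or the end of the input (\z).
preg_match_all('#<tag>(.*?)(?=</tag>|<tag>|\z)#is', $html, $m);

// Unclosed tags pick up the trailing newline, so trim each capture.
$texts = array_map('trim', $m[1]);

var_dump($lazy[1]); // three empty strings
var_dump($texts);   // "some text", "more text", "even more text"
```

The lookahead does not consume the delimiter, so the next match can start right at the following `<tag>`. It will still misbehave on nested or attribute-bearing tags, which is where a real parser or a cleanup pass becomes the better option.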
rtnylw
Forum Newbie
Posts: 4
Joined: Thu Sep 10, 2009 3:00 am

Re: preg_match_all unclosed tag problem

Post by rtnylw »

So there aren't any solutions to this problem?
turbolemon
Forum Commoner
Posts: 70
Joined: Tue Jul 14, 2009 6:45 am
Location: Preston, UK

Re: preg_match_all unclosed tag problem

Post by turbolemon »

You could try running the input through Tidy or HTML Purifier?

Tidy
http://www.w3.org/People/Raggett/tidy/ (Description)
http://uk.php.net/tidy (PHP Extension)

HTML Purifier
http://htmlpurifier.org/
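
A rough sketch of the Tidy route, in case it helps. It assumes the tidy extension is loaded and, since Tidy only knows real HTML elements, it uses `<li>` as a stand-in for the `<tag>` placeholder from the first post. The `extractItems` helper and the sample markup are made up for illustration. The idea is to let Tidy close the unclosed elements first, after which the strict pattern with a closing tag has something to match against.

```php
<?php
// Hypothetical sample: <li> stands in for the thread's <tag> placeholder.
// The second <li> is never closed.
$html = "<ul>\n<li>some text</li>\n<li>more text\n<li>even more text</li>\n</ul>";

/**
 * Repair the markup with Tidy, then pull out the text of each <li>.
 * Returns null when the tidy extension is unavailable or the repair fails.
 */
function extractItems(string $html): ?array
{
    if (!function_exists('tidy_repair_string')) {
        return null; // tidy extension not loaded
    }

    // 'show-body-only' keeps Tidy from wrapping the fragment in a full page.
    $clean = tidy_repair_string($html, ['show-body-only' => true], 'utf8');
    if ($clean === false) {
        return null;
    }

    // After the repair every <li> should have a matching </li>.
    preg_match_all('#<li>(.*?)</li>#is', $clean, $m);

    return array_map('trim', $m[1]);
}

$items = extractItems($html);
var_dump($items);
```

If the real pages use non-standard tag names, Tidy would need to be told about them through its configuration, so test this against actual input before relying on it.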