What's left after strip_tags ?

impulse()
Forum Regular
Posts: 748
Joined: Wed Aug 09, 2006 8:36 am
Location: Staffordshire, UK

What's left after strip_tags ?

Post by impulse() »

I'm writing a script that connects to a forum thread, grabs all the output and then strips the tags to leave me with just the text from the website. I'm hoping to make an RSS feed out of this once everything is complete. The only problem is that after strip_tags() has been used I have masses of empty lines. I'm not entirely sure how to remove just those lines. I've tried removing every '\n', but that also removes the legitimate new lines.

If somebody has tried something similar to this before, could you point me in the right direction with string functions? The best idea I have at the moment is to put the text into an array and loop through every element: for the first line that is just '\n', set a flag variable; then, if the next line is also '\n' and the flag is set, remove that line.

Any other suggestions very welcome.
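
The array-and-loop idea described above could be sketched roughly as follows. This is only an illustration, assuming PHP 7 or later; collapse_blank_lines() is a hypothetical helper name, not something from this thread. It treats lines containing only spaces or tabs as empty and keeps at most one empty line in a row.

```php
<?php
// Hypothetical helper: keeps at most one blank line in a row.
function collapse_blank_lines(string $text): string
{
    // Split on any newline style so \r\n input is handled too.
    $lines = preg_split('/\r\n|\r|\n/', $text);
    $out = [];
    $previousWasBlank = false;

    foreach ($lines as $line) {
        // A line counts as blank if it is empty or whitespace-only.
        $isBlank = (trim($line) === '');
        if ($isBlank && $previousWasBlank) {
            continue; // skip the second and later blank lines in a run
        }
        $out[] = $isBlank ? '' : $line;
        $previousWasBlank = $isBlank;
    }

    // Rejoin with Unix newlines (an assumption; use PHP_EOL if preferred).
    return implode("\n", $out);
}

echo collapse_blank_lines("First\n\n\n \nSecond\nThird");
```

The flag plays the role of the "variable" in the description: it remembers whether the previous line was blank, so only repeated blank lines are dropped.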

Post by impulse() »

strlen() has just revealed something I wasn't expecting...
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

Use preg_replace(). Find every run of two or more consecutive newlines and replace it with a single newline.
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Re: What's left after strip_tags ?

Post by s.dot »

impulse() wrote: I'm writing a script that connects to a forum thread, grabs all the output and then strips the tags to leave me with just the text from the website. I'm hoping to make an RSS feed out of this once everything is complete.
Do you have permission to do this from the web site owner? :P
impulse() wrote: The only problem is that after strip_tags() has been used I have masses of empty lines. I'm not entirely sure how to remove just those lines. I've tried removing every '\n', but that also removes the legitimate new lines.
strip_tags() may also remove legitimate data if the HTML is malformed. I think feyd has posted a "smart" strip_tags() function.
impulse() wrote: If somebody has tried something similar to this before, could you point me in the right direction with string functions? The best idea I have at the moment is to put the text into an array and loop through every element: for the first line that is just '\n', set a flag variable; then, if the next line is also '\n' and the flag is set, remove that line.

Any other suggestions very welcome.
The guy above me hit the nail right on the head. ;)

// Group the line ending so {2,} repeats the whole sequence, not just its last character.
$text = preg_replace('/(?:' . preg_quote(PHP_EOL, '/') . '){2,}/', PHP_EOL, $text);
Something like that. :) I'm not sure you want to use PHP_EOL, but I've never dealt with \n in a pattern before.
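
On the doubt about \n in a pattern: as far as I know, \r and \n can be written directly inside a single-quoted pattern, because PHP leaves them untouched and the regex engine interprets them itself. Here is a minimal sketch that does not depend on PHP_EOL, so it also copes with text whose line endings differ from the server's. The sample string is made up for illustration, and replacing with "\n" is an assumption about the desired output.

```php
<?php
// Sample input mixing Windows (\r\n) and Unix (\n) line endings.
$text = "Line one\r\n\r\n\r\nLine two\n\n\nLine three";

// Inside single quotes PHP leaves \r and \n alone; the regex engine
// interprets them as carriage return and line feed.
// (?:\r\n|\r|\n) matches one line ending of any style, {2,} a run of them.
$collapsed = preg_replace('/(?:\r\n|\r|\n){2,}/', "\n", $text);

echo $collapsed;
```

Each run of two or more line endings is replaced by a single "\n"; single line endings are left as they are.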

Post by impulse() »

Do you have permission to do this from the web site owner?
I didn't realize it could be a problem, since the contents are publicly available.

I've tried your suggestion and also included the "smart" strip_tags from the 'Useful Posts' forum, but it only removes the HTML tags; the empty lines are still there.
The text goes through these two preg_replace() calls before I echo the output:

$text = preg_replace('#</?.*?>#','',$text);
$text = preg_replace('/' . preg_quote(PHP_EOL) . '{2,}/m', PHP_EOL, $text);
I was hoping you could explain the parameters passed to preg_replace().
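
For reference, preg_replace() takes the pattern, the replacement and the subject string, in that order. The sketch below annotates the first call from the post above. It then swaps in a different second pattern, on the assumption (not confirmed anywhere in this thread) that the "empty" lines actually contain spaces or tabs, which a plain run of newlines would not match. The sample string is invented for illustration.

```php
<?php
$text = "<p>Hello</p>\n   \n\t\n<p>World</p>";

// 1st argument: the pattern, wrapped in delimiters (# here, so / needs no escaping).
//   </?   a "<" optionally followed by "/"
//   .*?   as few characters as possible (lazy), up to
//   >     the next ">"
// 2nd argument: the replacement (an empty string deletes each match).
// 3rd argument: the subject string to search.
$text = preg_replace('#</?.*?>#', '', $text);

// Lines holding only spaces or tabs break up a plain run of newlines,
// so allow horizontal whitespace (and an optional \r) between them.
$text = preg_replace('/\n(?:[ \t]*\r?\n)+/', "\n", $text);

echo $text;
```

Note that this version removes blank lines entirely instead of keeping one between paragraphs; whether that is wanted depends on how the feed should look.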