
pulling info from database and removing html

Posted: Fri Dec 16, 2005 11:59 am
by Luke
I have a news database with 4 columns... id, author, timestamp, and post. The id and timestamp are there for obvious reasons, author so there is somebody to blame if something is falsely published, and post for the actual news post. Right now I have it set up so that it will accept HTML.

I am wondering if anybody knows how I can extract like 8 words from the beginning of the post column and turn

Code: Select all

<a href="#asdf">site map</a>
into just "site map" to use as the heading of the news post.

EDIT: OK after reading that... even I was confused by it so here's what I want...

If this is in the news post column...

Code: Select all

Welcome to the new <a href="http://www.paradisedirect.com">Paradise Direct</a>! We are making huge changes in our web design and hosting department. To find out how these changes can benefit you, visit our services page!
This is what I want it to turn into...
Welcome to the new Paradise Direct!
Welcome to the new Paradise Direct! We are making huge changes in our web design and hosting department. To find out how these changes can benefit you, visit our services page!

Posted: Fri Dec 16, 2005 12:30 pm
by Burrito
I don't think there is an easy way to just strip out all html.

What you should do is create an array of patterns and use preg_replace() to strip out all of the HTML you want removed.

See your other topic for where to start with regex 8)
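For anyone reading later, here is a rough sketch of that array-of-patterns idea, combined with the "first 8 words" heading from the original question. The function name `headline_from_post` and the two patterns are made up for illustration and are not from the thread; extend the array to suit your own markup. PHP's built-in strip_tags() may also be enough if you simply want every tag gone.

```php
<?php
// Hypothetical helper (not from the thread): strip markup with an array of
// regex patterns, then keep only the first $maxWords words as a heading.
function headline_from_post(string $post, int $maxWords = 8): string
{
    // Illustrative patterns only; add more as needed.
    $patterns = [
        '#<script\b[^>]*>.*?</script>#is', // remove script blocks and their contents
        '#<[^>]+>#',                        // remove any remaining tag, keeping its inner text
    ];

    // A single replacement string is applied to every pattern in the array.
    $text = preg_replace($patterns, '', $post);

    // Collapse runs of whitespace so the word split below is predictable.
    $text = trim(preg_replace('/\s+/', ' ', $text));

    return implode(' ', array_slice(explode(' ', $text), 0, $maxWords));
}

$post = 'Welcome to the new <a href="http://www.paradisedirect.com">Paradise Direct</a>! We are making huge changes.';
echo headline_from_post($post, 6); // prints: Welcome to the new Paradise Direct!
```

Regex-based stripping is only as good as the patterns, so treat this as a starting point, not a sanitizer for untrusted input.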

Posted: Fri Dec 16, 2005 12:43 pm
by Luke
Yes... the other topic in General Discussion answered my question very nicely. It worked seamlessly, but after looking at the results I have decided to change it to extract the first sentence instead. Here is what I have...

Code: Select all

$headline = substr(htmlentities(preg_replace('#<.*?(\s+[\w\W]+?(\s*=\s*([\'"]?).*?\\3))*?>#s','',$array['news_post']),ENT_QUOTES), 0, 35);
and here is what it produces...
The Paradise Chamber of Commerce wi...
Tuesday, November 22nd, 2005
The Paradise Chamber of Commerce will be sponsoring a "Good Morning Paradise" networking meeting at Sierra Tech on Wednesday, December 14th at 7:30am. Everyone is welcome! Bring lots of business cards, a gift to raffle if desired, and be prepared to advertise your business. Coffee and donuts will be provided. See you there!
I just don't know how to modify it to extract the first sentence... I am looking into some regex tutorials, but in the meantime... does anybody know how this can be done?

If you want to see the page this is working on here it is...
http://sierra-tech.com/index.php

Posted: Fri Jun 30, 2006 2:05 pm
by Luke
Sorry to bump an old thread, but uhh... anybody know how I could accomplish this? Pulling the first sentence out of a paragraph of text.

EDIT: (I figured it out)

Code: Select all

preg_match("/^[^\.?!]*[\.?!]+/", $article, $matches);
That is my solution... the first sentence ends up in $matches[0]. Seems to work well! :D :D
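In case it helps someone else, here is a minimal sketch of how that pattern might be wrapped up with the tag stripping from earlier. The function name `first_sentence` is hypothetical, and it assumes that strip_tags() is acceptable in place of the longer regex and that the whole text is a sensible fallback when no sentence terminator is found.

```php
<?php
// Hypothetical helper (not from the thread): return the first sentence of a
// post after removing HTML tags.
function first_sentence(string $article): string
{
    // strip_tags() is PHP's built-in tag remover; it is swapped in here for
    // the longer preg_replace() pattern used earlier in the thread.
    $text = trim(strip_tags($article));

    // Same idea as the pattern above: everything up to and including the
    // first run of ".", "?" or "!".
    if (preg_match('/^[^.?!]*[.?!]+/', $text, $matches)) {
        return $matches[0];
    }

    // Assumption: with no terminator at all, fall back to the whole text.
    return $text;
}

echo first_sentence('Welcome to the new <a href="#">Paradise Direct</a>! We are making huge changes.');
// prints: Welcome to the new Paradise Direct!
```

One caveat: the pattern stops at the first period it sees, so abbreviations such as "Dr." or decimals such as "3.5" will cut the sentence short.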