Replace quotes with curly quotes

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."

someberry
Forum Contributor
Posts: 172
Joined: Mon Apr 11, 2005 5:16 am

Post by someberry »

I am trying to replace quotes with nice curly quotes, but I have run into a slight hiccup.

I am parsing whole articles of text which contain HTML; this is where the problem lies. I want to replace quotes, but leave quotes in tags alone. If the following text is what I was parsing:

Code:

<span class="strong">"Hello"</span> he said, "My name is someberry.".
I would only want to match:

Code:

"Hello"
"My name is someberry."
I tried using the expression:

Code:

'#(?<!=)"(.*?)"(?!>)#i'
However, it seems to ignore the last negative lookahead.

Anyone have any ideas why, or a better expression I could use as an alternative? :-)

P.S. I know the current expression isn't foolproof as it stands, but it doesn't need to be.
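
A possible explanation, sketched under the assumption that the sample line above is the whole input: the lookahead is probably not being ignored. `(?<!=)` only guards the opening quote and `(?!>)` only guards the closing one, so the quote that closes `class="strong"` (which is not preceded by `=`) qualifies as an opener, and every pairing after it is shifted by one quote. The helper name `find_quoted_spans` is made up for this illustration.

```php
<?php
// Hypothetical helper (not from the thread): return every full match
// of $pattern found in $text.
function find_quoted_spans(string $pattern, string $text): array
{
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

$sample  = '<span class="strong">"Hello"</span> he said, "My name is someberry.".';
$pattern = '#(?<!=)"(.*?)"(?!>)#i';

foreach (find_quoted_spans($pattern, $sample) as $span) {
    echo $span, PHP_EOL;
}
// Prints:
// ">"
// "</span> he said, "
```

Neither match is one of the two intended strings, which suggests the pairing of quotes, rather than the lookahead itself, is what goes wrong.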
stereofrog
Forum Contributor
Posts: 386
Joined: Mon Dec 04, 2006 6:10 am

Post by stereofrog »

For simple cases the following would be sufficient:

Code:

$re = '~ " [^"<>]+ " (?= [^<>]* ($ | <) )~x';
However, I don't think regexp is the right tool for processing HTML. Consider using a parser, e.g. HTML_SAX.
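
As a rough sketch of how this pattern could be used for replacement: it accepts a quoted run containing no `"`, `<` or `>` only when the next angle bracket after it is `<` (or there is none before the end of the string), which is a cheap test for "not inside a tag". The capture group, the non-capturing `(?: … )`, the function name `curl_quotes` and the choice of `&ldquo;`/`&rdquo;` as replacement text are additions made for this illustration, not part of the pattern as posted.

```php
<?php
// Hypothetical wrapper: the posted pattern plus a capture group around
// the quoted text so it can be reused in the replacement.
function curl_quotes(string $html): string
{
    // The x flag lets the pattern contain whitespace for readability.
    $re = '~ " ([^"<>]+) " (?= [^<>]* (?: $ | < ) ) ~x';
    return preg_replace($re, '&ldquo;$1&rdquo;', $html);
}

$sample = '<span class="strong">"Hello"</span> he said, "My name is someberry.".';
echo curl_quotes($sample), PHP_EOL;
// Prints:
// <span class="strong">&ldquo;Hello&rdquo;</span> he said, &ldquo;My name is someberry.&rdquo;.
```

The attribute value `"strong"` is left alone because the first angle bracket after it is `>`. Like the original, this only covers simple cases; a literal `<` or `>` in the text would confuse it.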
someberry
Forum Contributor
Posts: 172
Joined: Mon Apr 11, 2005 5:16 am

Post by someberry »

stereofrog wrote: For simple cases the following would be sufficient:

Code:

$re = '~ " [^"<>]+ " (?= [^<>]* ($ | <) )~x';
However, I don't think regexp is the right tool for processing HTML. Consider using a parser, e.g. HTML_SAX.
Hmm, I just tried it, and it is not exactly what I was looking for. I would like to do this:

Code:

preg_replace(
            '#(?<!=)"(.*?)"(?!>)#i',
            '“\\1”',
            $code);
Edit: I made quite a big typo; the word added in bold is what I meant to say.
Last edited by someberry on Wed Jun 06, 2007 7:57 am, edited 2 times in total.
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Make sure to use HTML entities, not the character equivalents.
someberry
Forum Contributor
Posts: 172
Joined: Mon Apr 11, 2005 5:16 am

Post by someberry »

feyd wrote: Make sure to use HTML entities, not the character equivalents.
Of course; however, it would seem that phpBB parses them into plain-text characters. :roll:
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Only if you use the numeric form. The named form is what I was referring to.

&ldquo; and &rdquo;
someberry
Forum Contributor
Posts: 172
Joined: Mon Apr 11, 2005 5:16 am

Post by someberry »

feyd wrote: Only if you use the numeric form. The named form is what I was referring to.

&ldquo; and &rdquo;
Either way, the OP's question is still unanswered :)
stereofrog
Forum Contributor
Posts: 386
Joined: Mon Dec 04, 2006 6:10 am

Post by stereofrog »

I'd suggest the OP read the answers he's got and try to adapt them on his own. If the OP is waiting for the complete code, that is not going to happen from my side. ;)