
Count all quote marks outside of <>

Posted: Sat Oct 04, 2008 5:38 am
by batfastad
Hi guys
I've been getting to grips with regular expressions over the past few weeks... something I've avoided for a couple of years. I'm finding them incredibly useful.
I've been following the basic regex tutorial on here written by Chris Corbyn, but I think this is calling for something a little more advanced.
This one has got me completely stumped.

I'm looking to count the number of double quote marks " in a variable that are outside a <> pair.
It's for a basic intranet-based CMS for our work website and it's one of the simple steps I'm taking to make sure the users enter at least structurally valid HTML.

So far I'm a long way off:

Code:

echo preg_match_all('/<[^>]*>/', $var, $matches);
Which gives me a count of the number of <> pairs, not really what I'm looking for!

I've been using The Regex Coach, which is a really helpful utility for testing but I've hit a bit of a dead end on this.

I thought of using preg_split with the above pattern, which would give me an array of all the text outside of the <> pairs; I could then loop through the array and count the " characters.
But I'm sure there's a better way using regex :D
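
For reference, the preg_split idea described above could look something like this. It is only a sketch: the sample $var is made up, and the pattern assumes every tag is properly closed with >.

```php
<?php
// Hypothetical sample input; $var stands in for the CMS field being checked.
$var = '<p class="intro">She said "hello" to <a href="#">everyone</a>.</p>';

// Split on complete <...> tags, leaving only the text between them.
$chunks = preg_split('/<[^>]*>/', $var);

// Add up the double quotes found in each remaining chunk.
$count = 0;
foreach ($chunks as $chunk) {
    $count += substr_count($chunk, '"');
}

echo $count, "\n"; // prints 2
```

The quotes inside class="intro" and href="#" are thrown away with the tags, so only the pair around hello is counted.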

Any ideas/suggestions?

Thanks, B

Re: Count all quote marks outside of <>

Posted: Sat Oct 04, 2008 6:45 am
by prometheuzz
Have a look at this:

Code:

<?php
 
$text = '<tag id="123" attr="aaaa"> some "text" and more "text inside 
tags and a new line"</tag> bla bla <tag id="1" attr="x"> more "text" and more 
"text inside tags" text</tag>';
 
$regex = '/"[^"<>]+"(?=[^<>]*<)/';
 
if(preg_match_all($regex, $text, $matches)) {
    echo "$text\n\n";
    echo "Found " . sizeof($matches[0]) . " matches:\n\n";
    print_r($matches[0]);
}
 
/* output:
<tag id="123" attr="aaaa"> some "text" and more "text inside 
tags and a new line"</tag> bla bla <tag id="1" attr="x"> more "text" and more 
"text inside tags" text</tag>
 
Found 4 matches:
 
Array
(
    [0] => "text"
    [1] => "text inside 
tags and a new line"
    [2] => "text"
    [3] => "text inside tags"
)
*/
?>

Re: Count all quote marks outside of <>

Posted: Sat Oct 04, 2008 8:13 am
by batfastad
That's perfect! Seems to work a treat.
Now I've got to analyse it and work out how it does it :D

Thanks so much ;)

Re: Count all quote marks outside of <>

Posted: Sat Oct 04, 2008 8:21 am
by prometheuzz
batfastad wrote: That's perfect! Seems to work a treat.
Now I've got to analyse it and work out how it does it :D

Thanks so much ;)
You're welcome. Don't hesitate to ask for clarification.
To get you started: the (?=...) construct is called a "positive lookahead". Read more about it here:
http://www.regular-expressions.info/lookaround.html

I also recommend bookmarking that site (http://www.regular-expressions.info): it's a first-class online resource!
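
To make the lookahead a bit more concrete, here is a small sketch using the same pattern on a made-up one-line input (the $text value is an assumption, chosen only to keep the example short):

```php
<?php
// Hypothetical input: one quoted value inside a tag, one quoted word outside.
$text = '<a id="1"> "x" </a>';

// After a quoted run, (?=[^<>]*<) peeks ahead without consuming anything:
// it only succeeds if the next angle bracket is "<", i.e. we are between tags.
$regex = '/"[^"<>]+"(?=[^<>]*<)/';

$found = preg_match_all($regex, $text, $matches);

echo $found, "\n";       // prints 1
print_r($matches[0]);    // the single match is "x" (with its quotes)
```

The "1" inside the tag is rejected because the next angle bracket after it is >, while "x" is accepted because the next one is the < of the closing tag.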

Re: Count all quote marks outside of <>

Posted: Sat Oct 04, 2008 8:35 am
by batfastad
Actually there does appear to be a slight problem.

The regex only matches the quotes " in pairs, so if I just drop a single " into the variable, it doesn't get counted.

Is there any fix for that?

EDIT: Ok I think I fixed it... deleting the leading " from the pattern to make it:

Code:

[^"<>]+"(?=[^<>]*<)
seems to do the job ;)
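
One thing to watch with the shortened pattern: it still needs at least one character in front of each quote, so a quote sitting directly after a > (or after another quote) is skipped, and so is any quote after the last tag. The sketch below compares it with an alternative that matches each quote on its own. The alternative pattern and the sample $var are assumptions for illustration, not something from the posts above, and both patterns assume the tags themselves are well formed.

```php
<?php
// Hypothetical input: three double quotes outside tags, two inside.
$var = '<p id="a">"quoted" text with a stray " mark</p>';

// Pattern from the post above: needs at least one character before each quote.
$original = preg_match_all('/[^"<>]+"(?=[^<>]*<)/', $var);

// Alternative sketch: match each quote on its own, as long as the next
// angle bracket after it is "<" or the end of the string is reached first.
$alternative = preg_match_all('/"(?=[^<>]*(?:<|\z))/', $var);

echo "original: $original, alternative: $alternative\n";
// prints "original: 2, alternative: 3"
```

The original pattern misses the quote that follows the opening tag directly, because nothing outside the tag precedes it.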

Thanks again, B

Re: Count all quote marks outside of <>

Posted: Tue Oct 07, 2008 3:47 am
by batfastad
In fact, I found an easier way to do this: run strip_tags() on the variable, then count the " characters that are left.
Hope this helps someone out.
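
A minimal sketch of that approach, assuming substr_count() is used for the counting step (the post doesn't say which function was used) and with a made-up $var:

```php
<?php
// Hypothetical sample input; $var stands in for the CMS field being checked.
$var = '<tag id="123" attr="aaaa"> some "text" and a stray " mark</tag>';

// strip_tags() removes the tags (and the attribute quotes inside them);
// substr_count() then counts the double quotes left in the plain text.
$count = substr_count(strip_tags($var), '"');

echo $count, "\n"; // prints 3
```

An odd result like this one suggests an unbalanced quote somewhere in the text. Note that strip_tags() can behave unexpectedly on badly broken markup, such as a < that is never closed.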

Thanks, B