Count all quote marks outside of <>

batfastad
Forum Contributor
Posts: 433
Joined: Tue Mar 30, 2004 4:24 am
Location: London, UK

Count all quote marks outside of <>

Post by batfastad »

Hi guys
I've been getting to grips with regular expressions over the past few weeks... something I've avoided for a couple of years. I'm finding them incredibly useful.
I've been following the basic regex tutorial on here written by Chris Corbyn, but I think this is calling for something a little more advanced.
This one has got me completely stumped.

I'm looking to count the number of double quote marks " in a variable that are outside a <> pair.
It's for a basic intranet-based CMS for our work website and it's one of the simple steps I'm taking to make sure the users enter at least structurally valid HTML.

So far I'm a long way off:

Code:

echo preg_match_all('/<[^>]*>/', $var, $matches);
Which gives me a count of the number of <> pairs, not really what I'm looking for!

I've been using The Regex Coach, which is a really helpful utility for testing but I've hit a bit of a dead end on this.

I thought of using preg_split() with the pattern above, which would give me an array of all the text outside the <> pairs; I could then loop through the array and count the " characters.
But I'm sure there's a better way using regex :D
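
For reference, the split-then-count idea could look roughly like the sketch below. The function name countQuotesBySplitting is made up for illustration, and the pattern assumes that < and > only ever appear as tag delimiters (an attribute value containing > would confuse it).

```php
<?php
// Hypothetical helper: split the input on tags, then count the quotes
// in whatever text is left between them.
function countQuotesBySplitting(string $html): int
{
    // A "tag" here is simply '<', any run of non-'>' characters, then '>'.
    $chunks = preg_split('/<[^>]*>/', $html);
    if ($chunks === false) {
        return 0; // preg_split() failed; nothing to count
    }

    $total = 0;
    foreach ($chunks as $chunk) {
        $total += substr_count($chunk, '"');
    }
    return $total;
}

// The two quotes in href="x" are skipped; the three in the text are counted.
echo countQuotesBySplitting('<a href="x">say "hi"</a> and "'), "\n"; // 3
```

An odd result would then suggest an unbalanced quote somewhere in the text content.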

Any ideas/suggestions?

Thanks, B
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: Count all quote marks outside of <>

Post by prometheuzz »

Have a look at this:

Code:

<?php
 
$text = '<tag id="123" attr="aaaa"> some "text" and more "text inside 
tags and a new line"</tag> bla bla <tag id="1" attr="x"> more "text" and more 
"text inside tags" text</tag>';
 
$regex = '/"[^"<>]+"(?=[^<>]*<)/';
 
if(preg_match_all($regex, $text, $matches)) {
    echo "$text\n\n";
    echo "Found " . sizeof($matches[0]) . " matches:\n\n";
    print_r($matches[0]);
}
 
/* output:
<tag id="123" attr="aaaa"> some "text" and more "text inside 
tags and a new line"</tag> bla bla <tag id="1" attr="x"> more "text" and more 
"text inside tags" text</tag>
 
Found 4 matches:
 
Array
(
    [0] => "text"
    [1] => "text inside 
tags and a new line"
    [2] => "text"
    [3] => "text inside tags"
)
*/
?>

Re: Count all quote marks outside of <>

Post by batfastad »

That's perfect! Seems to work a treat.
Now I've got to analyse it and work out how it does it :D

Thanks so much ;)

Re: Count all quote marks outside of <>

Post by prometheuzz »

batfastad wrote: That's perfect! Seems to work a treat.
Now I've got to analyse it and work out how it does it :D

Thanks so much ;)

You're welcome. Don't hesitate to ask for clarification.
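To get you started: the (?=...) construct is called a "positive lookahead". It checks that the text after the current position matches the enclosed pattern, without making that text part of the match. Read more about it here:
http://www.regular-expressions.info/lookaround.html

I also recommend bookmarking that site (http://www.regular-expressions.info): it's a first-class online resource!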
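
As a small illustration of the lookahead in the pattern from this thread, the sketch below runs it against two short sample strings. The sample strings are made up for the example; the pattern is the one posted earlier.

```php
<?php
// Pattern from the earlier post: a quoted run, then a positive lookahead.
$regex = '/"[^"<>]+"(?=[^<>]*<)/';

// In text content the next angle bracket is '<', so the lookahead succeeds.
// The text checked by the lookahead is not included in the match.
preg_match($regex, 'say "hi" <b>', $m);
echo $m[0], "\n";                               // "hi"

// Inside a tag the next angle bracket is '>', so the lookahead fails.
echo preg_match($regex, '<a href="x">'), "\n";  // 0
```

In other words, the lookahead acts as a test for "the next angle bracket is an opening one", which is how the pattern tells text content apart from attribute values.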

Re: Count all quote marks outside of <>

Post by batfastad »

Actually there does appear to be a slight problem.

The regex only matches the quotes " in pairs.
So if I just drop a single " into the variable, it doesn't get counted.

Is there any fix for that?

EDIT: Ok I think I fixed it... deleting the leading " from the pattern to make it:

Code:

[^"<>]+"(?=[^<>]*<)
seems to do the job ;)

Thanks again, B
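
A word of caution on the edited pattern: it still needs at least one non-quote character directly before each ", so a quote that immediately follows a tag or another quote is not counted. One possible alternative, sketched below, matches each " on its own and lets the lookahead accept either a following < or the end of the string. The function name countQuotesWithLookahead is made up for illustration, and the sketch assumes < and > only appear as tag delimiters.

```php
<?php
// Hypothetical helper: counts every '"' whose next angle bracket is '<',
// or that has no angle bracket after it at all (\z = end of string).
function countQuotesWithLookahead(string $html): int
{
    $count = preg_match_all('/"(?=[^<>]*(?:<|\z))/', $html);
    return $count === false ? 0 : $count;
}

// The edited pattern from this post, for comparison.
$edited = '/[^"<>]+"(?=[^<>]*<)/';

echo preg_match_all($edited, '<p>"</p>'), "\n";     // 0
echo countQuotesWithLookahead('<p>"</p>'), "\n";    // 1
echo preg_match_all($edited, '<p>a""</p>'), "\n";   // 1
echo countQuotesWithLookahead('<p>a""</p>'), "\n";  // 2
```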

Re: Count all quote marks outside of <>

Post by batfastad »

And in fact I found an easier way to do this: run strip_tags() on the variable, then count the " characters that are left.
Hope this helps someone out.

Thanks, B
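
A minimal sketch of the strip_tags() approach, assuming the count is done with substr_count() (the post does not say which counting function was used) and using a made-up function name:

```php
<?php
// Hypothetical helper: strip_tags() and substr_count() are PHP built-ins.
function countQuotesWithStripTags(string $html): int
{
    // Remove the tags (and the quoted attribute values inside them),
    // then count the double quotes that remain in the text.
    return substr_count(strip_tags($html), '"');
}

echo countQuotesWithStripTags('<a href="x">say "hi"</a> and "'), "\n"; // 3
```

Keep in mind that strip_tags() does not validate the HTML, so a stray or unclosed < can cause more text to be removed than expected, which would also change the count.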