Advanced regex help

Posted: Tue Jan 31, 2006 6:32 am
by someberry
First of all, great set of explanations d11wtq, they have helped me a great deal. I am not sure if this can be done in regex. I can't figure out how to do it, but I'm betting it can :P

I want to allow spaces in a string (and it has to be done in regex; we aren't supposed to use PHP to explode() it and such), but only when they are surrounded by quotation marks. This is for a mathematics question.

So for instance, I have the string:

Code: Select all

foobar"foo bar, foo bar"foobar
I need it to return something like this as an array:

Code: Select all

Array
(
   [0] => foobar
   [1] => foo bar, foo bar
   [2] => foobar
)
Does anyone know of a way in which this could be achieved?
Thank you!

Posted: Tue Jan 31, 2006 6:52 am
by raghavan20

Code: Select all

<pre>
<?php
$inputString = "foobar\"foo bar, foo bar\"foobar";
// The third argument receives the matches; without it $matches is never set.
preg_match_all("/((.*?)[\"]{1}|(.){1,}$)/", $inputString, $matches);
print_r($matches);
?>
</pre>
I tried this, but I do not know how to take the trailing double quote off. Can someone also help me understand why there are four elements returned? :roll:

Code: Select all

Array
(
    [0] => Array
        (
            [0] => foobar"
            [1] => foo bar, foo bar"
            [2] => foobar
        )

    [1] => Array
        (
            [0] => foobar"
            [1] => foo bar, foo bar"
            [2] => foobar
        )

    [2] => Array
        (
            [0] => foobar
            [1] => foo bar, foo bar
            [2] => 
        )

    [3] => Array
        (
            [0] => 
            [1] => 
            [2] => r
        )

)
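
On the "four elements" question, a likely explanation: preg_match_all() in its default PREG_PATTERN_ORDER mode returns one array for the full matches plus one array per capturing group, and the pattern above has three groups. The lone "r" in [3] appears because a repeated group such as (.){1,} only keeps its last iteration. The sketch below checks that count and, as an assumption about what was wanted, uses a group-free pattern that leaves the quotes out of the matches. The variable names $entryCount, $plain and $pieces are made up for illustration.

```php
<?php
$inputString = 'foobar"foo bar, foo bar"foobar';

// Same pattern as in the post: three capturing groups,
// so $matches holds 1 (full match) + 3 (groups) = 4 arrays.
preg_match_all('/((.*?)["]{1}|(.){1,}$)/', $inputString, $matches);
$entryCount = count($matches);

// No capturing groups at all: each match is a run of non-quote characters,
// so the quotes never end up in the result.
preg_match_all('/[^"]+/', $inputString, $plain);
$pieces = $plain[0];

print_r($pieces);
```

Only $plain[0] is needed here, because without capturing groups there is nothing else in the result.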

Posted: Tue Jan 31, 2006 6:54 am
by Chris Corbyn
It is indeed possible :)

The example you gave is actually pretty simple using a split:

Code: Select all

$parts = preg_split('/"/', $string);
Doing cleverer things with quoted string detection in regex is a bit more complicated but I'm not sure you need to see that (unless you want to).

Posted: Tue Jan 31, 2006 7:23 am
by jayshields
Wouldn't

Code: Select all

$array = explode('"', $string);
be easier?

Posted: Tue Jan 31, 2006 8:05 am
by Chris Corbyn
jayshields wrote:Wouldn't

Code: Select all

$array = explode('"', $string);
be easier?
I think someberry needs a more generic answer using regex although there are split() and explode() type functions in just about all languages.

someberry >> apologies for my answer LOL... I completely missed the point :oops:

This is untested but should work... although it does look a bit scary.

Code: Select all

<?php

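//Recursive
function tokenize($str, $ret=array())
{
    $re = '/(?:"(.*?)")/s';
    // preg_match() returns 1 or 0, not a position; the offset comes from PREG_OFFSET_CAPTURE
    if (preg_match($re, $str, $matches, PREG_OFFSET_CAPTURE))
    {
        // each entry of $matches is array(text, offset)
        $offset = $matches[0][1];
        $substr = substr($str, 0, $offset);
        $parts = preg_split('/\s+/', $substr);
        foreach ($parts as $p) $ret[] = $p;
        $ret[] = trim($matches[1][0]);
        // skip past the closing quote so the same match is not found again
        $str = substr($str, $offset + strlen($matches[0][0]));
        return tokenize($str, $ret);
    }
    else
    {
        $parts = preg_split('/\s+/', $str);
        foreach ($parts as $p) $ret[] = $p;
        return $ret;
    }
}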

?>

Posted: Tue Jan 31, 2006 8:18 am
by Chris Corbyn
Revised....

Code: Select all

<?php

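// Recursive: splits on whitespace, but keeps each double-quoted run as one token
function tokenize($str, $ret=array())
{
    $re = '/(?:"(.*?)")/s';
    if (preg_match($re, $str, $matches, PREG_OFFSET_CAPTURE))
    {
        // With PREG_OFFSET_CAPTURE each entry of $matches is array(text, offset)
        $offset = $matches[0][1];
        $substr = substr($str, 0, $offset);
        // PREG_SPLIT_NO_EMPTY drops empty pieces but keeps a literal "0",
        // which an empty() check would throw away
        $parts = preg_split('/\s+/', $substr, -1, PREG_SPLIT_NO_EMPTY);
        foreach ($parts as $p) $ret[] = $p;
        $ret[] = trim($matches[1][0]);
        // Continue after the closing quote
        $str = substr($str, ($offset+strlen($matches[0][0])));
        return tokenize($str, $ret);
    }
    else
    {
        $parts = preg_split('/\s+/', $str, -1, PREG_SPLIT_NO_EMPTY);
        foreach ($parts as $p) $ret[] = $p;
        return $ret;
    }
}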

$str = 'this has a space in it "but this does not" here';
print_r(tokenize($str));

?>
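
For comparison, the same tokenizing can probably be done with a single preg_match_all() call instead of recursion. This is only a sketch under the assumption that quotes are never nested or escaped. The function name tokenizeOnePass() is made up here so it does not clash with tokenize().

```php
<?php

// Hypothetical single-pass variant; assumes no nested or escaped quotes.
function tokenizeOnePass($str)
{
    // Alternative 1: a quoted run, with the inside captured in group 1.
    // Alternative 2: a run of characters that are neither whitespace nor quotes.
    preg_match_all('/"([^"]*)"|[^\s"]+/', $str, $matches, PREG_SET_ORDER);

    $tokens = array();
    foreach ($matches as $m) {
        // Only the quoted alternative can start with a double quote.
        $tokens[] = ($m[0][0] === '"') ? trim($m[1]) : $m[0];
    }
    return $tokens;
}

print_r(tokenizeOnePass('this has a space in it "but this does not" here'));
print_r(tokenizeOnePass('foobar"foo bar, foo bar"foobar'));
```

The second call should give the three pieces from the original question: foobar, then foo bar, foo bar, then foobar.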

Posted: Tue Jan 31, 2006 8:38 am
by feyd

Posted: Tue Jan 31, 2006 9:07 am
by someberry
Hmm, a slightly different response than I expected. I guess I should have made myself more clear.

What I really need is an expression that finds certain strings in which spaces are allowed in some places and not in others. For example:

Code: Select all

A test equation to find ((12/15)*("pie")/("diameter"-8)+("circumfrance"*2.5)) a answer to a equation I just made up on the spot.
And the expression I am currently using:

Code: Select all

#\([\w\W]*?((?:\([\w\W]*?\)).*?){0,}\)#
Which finds:

Code: Select all

Array
(
    [0] => Array
        (
            [0] => ((12/15)*("pie")/("diameter"-8)+("circumfrance"*2.5))
        )

    [1] => Array
        (
            [0] => ("circumfrance"*2.5)
        )

)
Which is perfect for [0][0]. However, I am now stuck on the part where spaces are allowed in between the quotes, but nowhere else in the equation, so:

Code: Select all

Would NOT be accepted as it has spaces outside of the quotes:
( ( 12 / 15 ) * ( "pie" ) / ( "diameter" - 8 ) + ( "circumfrance" * 2.5 ) )

Would be accepted as there are no outside spaces, save for in between the quotes:
((12/15)*("pie")/("dia me t er"-8)+("ci rc umf  ra   nce"*2.5))
Let me just remind you that it all needs to be in the regex, with no outside PHP help (unless it is absolutely necessary!).

Hope I'm not losing you.
Thanks.
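
A possible starting point for the "spaces only inside quotes" rule, as a sketch rather than a finished answer: treat the equation as a sequence of units, where each unit is either one character that is neither whitespace nor a quote, or a complete quoted run that may contain anything except another quote. The helper name hasNoSpacesOutsideQuotes() is made up for illustration, and the pattern checks an equation that has already been extracted. It does not locate the equation inside the surrounding sentence.

```php
<?php

// Hypothetical helper; the pattern is an illustration of the rule described above.
function hasNoSpacesOutsideQuotes($equation)
{
    // \A ... \z : the whole string must consist of allowed units.
    // [^\s"]   : one character that is neither whitespace nor a double quote.
    // "[^"]*"  : a quoted run, where spaces are allowed.
    return preg_match('/\A(?:[^\s"]|"[^"]*")+\z/', $equation) === 1;
}

$rejected = '( ( 12 / 15 ) * ( "pie" ) / ( "diameter" - 8 ) + ( "circumfrance" * 2.5 ) )';
$accepted = '((12/15)*("pie")/("dia me t er"-8)+("ci rc umf  ra   nce"*2.5))';

var_dump(hasNoSpacesOutsideQuotes($rejected)); // bool(false)
var_dump(hasNoSpacesOutsideQuotes($accepted)); // bool(true)
```

Because the two alternatives can never start on the same character, a space outside the quotes cannot be absorbed by either of them, so the whole match fails.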