
preg_match - match data with an "anything except"

Posted: Sat May 27, 2006 2:59 pm
by revof11
So after my previous post today was answered (viewtopic.php?t=49115), I decided to continue moving on my quest of formatting/isolating string-based options in my script. Now that I have all the "tag" strings isolated, I want to extract the specified options from within them. I have made some preg attempts, but all have failed me.

Let us take the sample string of:

Code: Select all

blablabla[n option1=http://www.google.com option2=blabla]this is my string[/n]blablabla[n option2="bla2"]this is another string[/n]blablabla
Let us assume that the following strings have been isolated:

Code: Select all

1.  [n option1=http://www.google.com]this is my string[/n]
2.  [n option2="bla2"]this is another string[/n]
I want to extract the option1 and option2 data.
So what I want to say is this:
Search for any character string followed by an equal sign followed by a data string (which can optionally be surrounded by double quotes), and shove this data into an associative array, with the string before the equal sign as the array key and the data string as its value.

I thought the following code would isolate each option for me (needless to say, it didn't):

Code: Select all

preg_match_all('~([\w]+=["]??[\w:/\.\']["]??+)~', $str, $match);
The closest I have come is:

Code: Select all

preg_match_all('~([\w]+=[\w:/\.\']+)~', $str, $match);
echo $match[0][0];
However, that only gives me "option1=http://www.google.com" and "option2="bla2"".

I am COMPLETELY confused and digging through regex and PHP docs hasn't helped me too much (not sure if I'm having a brain-dead day or I'm just not grasping the preg options). Any help/solution (with a brief explanation) would be greatly appreciated.

If I can at least get the option=whatever strings it should be easy enough to split...
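
For the splitting step mentioned above, here is a minimal sketch, assuming each pair has already been isolated as a single key=value string. The helper name splitOption is hypothetical. Passing a limit of 2 to explode keeps any later "=" inside the value (a query string in a URL, for example).

```php
<?php
// Hypothetical helper: split one isolated "key=value" string into its parts.
function splitOption($pair)
{
    // Limit of 2: only the first "=" separates the key from the value.
    $parts = explode('=', $pair, 2);
    if (count($parts) < 2) {
        return null; // no "=" present, so this is not an option
    }
    // Strip optional surrounding double quotes from the value.
    return array($parts[0], trim($parts[1], '"'));
}

$data = array();
foreach (array('option1=http://www.google.com', 'option2="bla2"') as $pair) {
    $split = splitOption($pair);
    if ($split !== null) {
        $data[$split[0]] = $split[1];
    }
}
print_r($data);
```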

To illustrate a little better, this "almost" does what I need it to:

Code: Select all

$data = array();
$lastOffset = 0;
$length = strlen($str);
while ( ($pos = strpos($str, '=', $lastOffset)) !== false )
    {
      // key: scan backwards from the "=" to the preceding space
      for ($keyPos = $pos; $keyPos > 0; $keyPos--)
      {
        if ( $str[$keyPos] === ' ' )
          break;
      }
      // trim() drops the space the scan stopped on
      $key = trim(substr($str, $keyPos, $pos - $keyPos));
    
      // value: scan forwards to the next space, "]" or the end of the string
      for ($valuePos = $pos; $valuePos < $length; $valuePos++)
      {
        if ( $str[$valuePos] === ' ' || $str[$valuePos] === ']' )
          break;
      }
      $value = substr($str, $pos + 1, $valuePos - $pos - 1);
    
      // set
      $lastOffset = $pos + 1;
      $data[$key] = $value;
    }

Posted: Sat May 27, 2006 4:35 pm
by sweatje
One thing you can't do is extract both the tags and the options in one pass. You already know how to grab the tags, so now you just need to loop over the tags and extract the options. Some code like this might help:

Code: Select all

function testExtractOptionsFromBbcodeishArray() {
	$data = array('[n option1=http://www.google.com]this is my string[/n]'
		,'[n option2="bla2"]this is another string[/n]');
	$keys = array('option1', 'option2');
	$values = array('http://www.google.com','"bla2"');
	foreach($data as $num => $tag) {
		preg_match_all('~\[\w+(?:\s+(\w+)=((?:(?!\])\S)+))+~', $tag, $match);
		$this->assertEqual($keys[$num], $match[1][0]);
		$this->assertEqual($values[$num], $match[2][0]);
	}
}
HTH
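
One detail of that pattern is worth checking before relying on it: in PCRE, a capturing group inside a repeated group keeps only the text from its last repetition, so a tag with several options reports just the final pair. A small sketch, using the two-option tag from the original sample string:

```php
<?php
// Same pattern as above, applied to a tag that carries two options.
$pattern = '~\[\w+(?:\s+(\w+)=((?:(?!\])\S)+))+~';
$tag = '[n option1=http://www.google.com option2=blabla]this is my string[/n]';

preg_match_all($pattern, $tag, $match);

// The outer (?:...)+ ran twice, but groups 1 and 2 hold only the last pass.
echo $match[1][0], ' => ', $match[2][0], "\n"; // prints "option2 => blabla"
```

So to collect every pair from one tag, the options have to be matched one at a time (or stripped off in a loop) rather than read from a single match.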

Posted: Sat May 27, 2006 5:52 pm
by revof11
Good catch... "BBCode-ish" is right.
That nice, simple syntax inspired this little project.

I shall post with my finalized code.
What you provided gets me past the point where I was stuck.

Thanks.

Posted: Sat May 27, 2006 6:21 pm
by revof11
OK... I have everything working with one snag (which is acceptable at this stage for me).
The snag is that I can't have spaces between double quotes.

However... this seems to work just fine:

Code: Select all

public static final function extractOptionsFromBbcodeishArray($str)
{
  $data = array();

  // do it
  $temp = $str;  
  while ( strstr($temp, '=') !== false )
  {
    // lookup & insert (stop if no option is left to match)
    if ( !preg_match_all('~\[\w+(?:\s+(\w+)=((?:(?!\])\S)+))+~', $temp, $match) )
      break;
    $key = $match[1][0];
    $value = str_replace('_', ' ', trim($match[2][0], " \""));
    $data[$key] = $value;

    // strip the located key/value pair
    $temp = str_replace($key . '=' . $match[2][0], '', $temp);
  }

  // exit
  return $data;
}


I'll work through the little snag later, but it seems to come down to adding a "? in there and altering the ...\])\S)+ portion.

Thanks so much for your help! I'll probably mull over that snag for a day or two, but I'm sure I'll eventually get it. If not, just forcing the use of underscores is fine for me (since this is really for something that only I'll be using anyway). The funny thing is that I can write this and get it working in Java, but Java doesn't fit the needs of what I want to do (neither does C++ or Perl, for that matter).

Once again: much thanks.

Posted: Sat May 27, 2006 7:16 pm
by sweatje
I fiddled with it a little bit and got most of the way there. If you look at the last capture, it is missing the final " from the first option in the tag.

Code: Select all

function testExtractOptionsFromBbcodeishArray() {
	$data = array('[n option1=http://www.google.com]this is my string[/n]'
		,'[n option2="bla2"]this is another string[/n]'
		,'[n option2="now with spaces"]this is another string[/n]'
		,'[n option2="bla2" opt3=foobar]this is another string[/n]');
	$keys = array('option1', 'option2', 'option2', 'option2');
	$values = array('http://www.google.com','"bla2"', '"now with spaces"','"bla2');
	foreach($data as $num => $tag) {
		$re_opts_in_tag = '~\[\w+\b([^\]]*)\]~';
		preg_match($re_opts_in_tag, $tag, $match_opts);
		preg_match_all('~(?:(\w+)=((?:.(?!\]|\s*\w+=))+)\s*)+~', trim($match_opts[1]), $match);
		$this->assertEqual($keys[$num], $match[1][0]);
		$this->assertEqual($values[$num], $match[2][0]);
	}
	$this->dump($match);
	$this->assertEqual('opt3', $match[1][1]);
	$this->assertEqual('foobar', $match[2][1]);
}
That last match looks like:

Code: Select all

Array
(
    [0] => Array
        (
            [0] => option2="bla2
            [1] => opt3=foobar
        )

    [1] => Array
        (
            [0] => option2
            [1] => opt3
        )

    [2] => Array
        (
            [0] => "bla2
            [1] => foobar
        )

)
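
One way to keep the closing quote might be to stop describing a value by what follows it, and instead match a quoted value and a bare value as two alternatives. This is a sketch only, not tried beyond the sample tags in this thread. The function name extractTagOptions is made up, and it assumes a quoted value never contains an escaped double quote or a "]".

```php
<?php
// Hypothetical helper: return the options of one [tag ...] as key => value.
function extractTagOptions($tag)
{
    $options = array();
    // First pass: grab everything between the tag name and the closing "]".
    if (!preg_match('~\[\w+\b([^\]]*)\]~', $tag, $inner)) {
        return $options;
    }
    // Second pass: key, "=", then either "quoted text" or a run of non-space characters.
    preg_match_all('~(\w+)=("[^"]*"|\S+)~', $inner[1], $pairs, PREG_SET_ORDER);
    foreach ($pairs as $pair) {
        $options[$pair[1]] = trim($pair[2], '"'); // drop the surrounding quotes
    }
    return $options;
}

print_r(extractTagOptions('[n option2="now with spaces" opt3=foobar]this is another string[/n]'));
```

PREG_SET_ORDER groups the results per option, so each $pair holds one key and its value instead of the parallel arrays shown in the dump above.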

Posted: Sat May 27, 2006 7:33 pm
by revof11
8O
Jeez... rock on, man.
Talk about above and beyond the call of duty...