Tricky Extraction of Attributes, Valid and Otherwise
Posted: Thu Jan 11, 2007 11:47 am
I need to extract the attributes from a tag string such as the sample below into the format shown in Listing 1, or, if that is easier, into the two arrays shown in Listing 2.
I have a regex that handles the $valid part of Listing 2. My attempt at the invalid part, however, seems to capture the valid attributes again as well; the (?!) in it is supposed to be a negative lookahead. My complete code so far follows the two regexes.
If someone could explain how the conditional subpattern syntax shown at the end works, in particular how the condition is evaluated, that may help.
Sample string:
{ tak foo="1" gaz foo="2" # bar="zim" gir='dib'
Listing 1:
$result = array(
'{' => '',
'tak' => '',
'foo' => array(1, 2), // there are two foo attributes ^^
'gaz' => '',
'#' => '',
'bar' => 'zim',
'gir' => 'dib'
);
Listing 2:
$valid = array(
'foo' => array(1, 2),
'bar' => 'zim',
'gir' => 'dib'
);
$invalid = array('{', 'tak', 'gaz', '#');
Regex for the $valid part:
~ (?P<name>[a-z][a-z0-9]*?)=(?: "(?P<doubleValue>[^"]*?)" | '(?P<singleValue>[^']*?)' ) ~ix
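As a quick check of what the $valid regex returns, here is a minimal sketch that runs the same pattern through `preg_match_all()` on the sample string. The pattern is copied into a single-quoted PHP string (hence the `\'` escapes), and the variable names are only illustrative. With the default `PREG_PATTERN_ORDER`, each named group is an array with one entry per match, and a group that did not take part in a given match comes back as an empty string.

```php
<?php
// Sample input from the post; \' escapes the single quotes around dib.
$tag = '{ tak foo="1" gaz foo="2" # bar="zim" gir=\'dib\'';

// The $valid regex from above. The x modifier makes PCRE ignore the spaces.
$pattern = '~ (?P<name>[a-z][a-z0-9]*?)=(?: "(?P<doubleValue>[^"]*?)" | \'(?P<singleValue>[^\']*?)\' ) ~ix';

$match = array();
$count = preg_match_all($pattern, $tag, $match); // number of full matches

// One entry per match, in the order the attributes appear in $tag.
print_r($match['name']);        // contains: foo, foo, bar, gir
print_r($match['doubleValue']); // contains: 1, 2, zim, '' (gir used single quotes)
print_r($match['singleValue']); // contains: '', '', '', dib
```

Because an unused group is reported as `''`, testing it with `empty()` also treats a legitimate value of `"0"` as missing; comparing with `=== ''` avoids that.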
Attempt at the invalid part:
~([\S]+)(?!=")~
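Why the attempt also returns the valid attributes: `[\S]+` is greedy, so on `foo="1"` it first consumes the whole token, and only then does `(?!=")` inspect what follows. That is a space or the end of the string, never `="`, so the assertion always succeeds and every token matches. A lookahead only filters tokens if it runs before the token is consumed. A minimal sketch of that idea, assuming tokens are whitespace-separated and quoted values contain no spaces, puts `(?<!\S)` (token starts at the beginning or after whitespace) and `(?![a-z][a-z0-9]*=)` (token does not begin with `name=`) in front of `\S+`:

```php
<?php
$tag = '{ tak foo="1" gaz foo="2" # bar="zim" gir=\'dib\'';

// The attempt from the post: the lookahead only checks the text *after* each token.
preg_match_all('~([\S]+)(?!=")~', $tag, $attempt);

// Lookbehind/lookahead moved in front of the token:
//   (?<!\S)              start of string or preceded by whitespace
//   (?![a-z][a-z0-9]*=)  does not start with an attribute name followed by '='
//   \S+                  then consume the whole whitespace-delimited token
preg_match_all('~(?<!\S)(?![a-z][a-z0-9]*=)\S+~i', $tag, $fixed);

print_r($attempt[1]); // all 8 tokens, valid attributes included
print_r($fixed[0]);   // {, tak, gaz, #
```

A quoted value containing spaces, such as `title="a b"`, would still leave `b"` behind as a stray token; handling that needs the quoted forms to be matched explicitly rather than skipped by a lookahead.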
/**
* Parse a tag for its attribute names and values
* Can handle multiple attributes with same name, stacks up the values
*
* @param string $toMatch
* @return array keys as names values as values
*/
private function _attribute($toMatch)
{
$match = array();
$pattern = <<< PCRE
~ (?P<name>[a-z][a-z0-9]*?)=(?: "(?P<doubleValue>[^"]*?)" | '(?P<singleValue>[^']*?)' ) ~ix
PCRE;
preg_match_all($pattern, $toMatch, $match);
$attributes = array();
// This loop builds: array('name' => 'value') structure
for ($i = 0, $j = count($match['name']); $i < $j; ++$i) {
$name = $match['name'][$i];
if ($match['doubleValue'][$i] === '') { // which quote type (empty() would also skip a value of "0")
$value = trim($match['singleValue'][$i]);
} else {
$value = trim($match['doubleValue'][$i]);
}
if (isset($attributes[$name])) { // duplicate attribute
if (is_array($attributes[$name])) { // array the values up
$attributes[$name][] = htmlspecialchars($value, ENT_QUOTES);
} else {
$attributes[$name] = array($attributes[$name], htmlspecialchars($value, ENT_QUOTES));
}
} else {
$attributes[$name] = htmlspecialchars($value, ENT_QUOTES);
}
}
// Invalid attributes, (?!) is negative look ahead
$pattern = '~([\S]+)(?!=")~';
$match = array();
preg_match_all($pattern, $toMatch, $match);
echo '<pre>';
print_r($match); // in process of debugging
echo '</pre>';
/**
* @todo add invalid matches to $attributes
*/
return $attributes;
}
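For the @todo, one option is to let a single pattern classify every token, so valid attributes are consumed before a catch-all branch can see them. This is a sketch of that approach, not the original method: `parseAttributes()` and the group names `dq`, `sq` and `other` are hypothetical, it assumes PHP 7 or later (for `??`), it leaves out the `trim()`/`htmlspecialchars()` steps of the original, and it returns the Listing 2 shape. At each position PCRE tries the alternatives left to right, so `name="..."` / `name='...'` wins over the `\S+` fallback, and quoted values may contain spaces.

```php
<?php
/**
 * Hypothetical helper: split a tag string into valid attributes and
 * leftover tokens with one preg_match_all() call.
 *
 * @param string $tag
 * @return array array('valid' => array(name => value|values), 'invalid' => array(token, ...))
 */
function parseAttributes($tag)
{
    // Branch 1: name="value" or name='value'.  Branch 2: any other token.
    $pattern = '~(?P<name>[a-z][a-z0-9]*)=(?:"(?P<dq>[^"]*)"|\'(?P<sq>[^\']*)\')'
             . '|(?P<other>\S+)~i';

    $sets = array();
    preg_match_all($pattern, $tag, $sets, PREG_SET_ORDER);

    $valid = array();
    $invalid = array();
    foreach ($sets as $m) {
        // Unused groups are '' or, when trailing, absent from $m.
        if (($m['other'] ?? '') !== '') {
            $invalid[] = $m['other'];
            continue;
        }
        $name = $m['name'];
        // Compare with '' (not empty()) so a value of "0" is kept.
        $value = ($m['dq'] ?? '') !== '' ? $m['dq'] : ($m['sq'] ?? '');

        if (!isset($valid[$name])) {
            $valid[$name] = $value;                       // first occurrence
        } elseif (is_array($valid[$name])) {
            $valid[$name][] = $value;                     // third and later
        } else {
            $valid[$name] = array($valid[$name], $value); // second: stack up
        }
    }
    return array('valid' => $valid, 'invalid' => $invalid);
}

$result = parseAttributes('{ tak foo="1" gaz foo="2" # bar="zim" gir=\'dib\'');
print_r($result);
```

The single-array shape of Listing 1 could come from the same loop by writing each leftover token as `$result[$token] = ''` instead of appending it to a separate list.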
Conditional subpattern syntax:
(?(condition)yes-regex|no-regex)
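On how `(?(condition)yes-regex|no-regex)` is evaluated: PCRE tests the condition at the current position, without consuming any characters, and then tries only the selected branch; the `|no-regex` part is optional and, when omitted, the no-branch matches the empty string. The condition is commonly either a group reference such as `(?(1)...)`, which is true when capture group 1 has captured something in the current match attempt, or a lookaround assertion such as `(?(?=")...)`. The sketch below exercises both forms on attribute-style values; `isValue()` and `quotedValue()` are illustrative names, and it assumes PHP 7 or later (for `??`).

```php
<?php
// isValue() and quotedValue() are illustrative names, not from the post.

// Form 1: group-reference condition.
// (["'])? optionally captures an opening quote as group 1;
// (?(1)\1) then demands the same quote again only if group 1 was set.
function isValue($s)
{
    return preg_match('~^(["\'])?[^"\'\s]*(?(1)\1)$~', $s) === 1;
}

// Form 2: assertion condition.
// (?(?=")...) peeks at the next character: a double quote selects the
// first branch, anything else selects the single-quoted branch.
function quotedValue($s)
{
    if (preg_match('~^(?(?=")"([^"]*)"|\'([^\']*)\')$~', $s, $m) !== 1) {
        return null; // neither "..." nor '...'
    }
    // Group 1 holds a double-quoted value, group 2 a single-quoted one.
    return $m[1] !== '' ? $m[1] : ($m[2] ?? '');
}

var_dump(isValue('"zim"'), isValue('zim'), isValue('"zim\''));
var_dump(quotedValue('"zim"'), quotedValue("'dib'"), quotedValue('zim'));
```

For the attribute regex itself, a backreference such as `(["'])(.*?)\1` is usually the simpler way to accept either quote; a conditional earns its place when the two branches differ in more than the delimiter.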