Regex pattern returning html tag attributes as array

Posted: Mon Oct 27, 2003 9:59 pm
by marcoBR
Is it possible to write a pattern that matches an HTML tag with an indeterminate number of attributes and returns those attributes as an array, with the attribute name as key and the attribute value as value?

e.g.: <user_data name="Ben" email="ben@ben" ... /> will return $user_data = array('name'=>'Ben', 'email'=>'ben@ben');

Code: Select all

$tag = '<user_data name="Ben" email="ben@ben" />';
preg_match('/<user_data\s+((.*)=(.*))*?\s*\/>/', $tag, $attribs);
echo '<pre>';
print_r($attribs);
echo '</pre>';
I have tried the code above, but it's not working as intended.

If it's impossible to get the array above with a single pattern, the array below is also acceptable, since I can use it to build the array I want.

Code: Select all

Array
(
    [0] => 'name'
    [1] => 'Ben'
    [2] => 'email'
    [3] => 'ben@ben'
)
Any other approach is welcome too...
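
For the fallback mentioned above (a flat list of name, value, name, value...), folding it into the name => value array takes only a short loop. A minimal sketch, assuming PHP 7 or later; the helper name pairs_to_assoc is made up for illustration and is not from the thread:

```php
<?php
// Hypothetical helper (the name is illustrative only):
// turns ['name', 'Ben', 'email', 'ben@ben'] into ['name' => 'Ben', 'email' => 'ben@ben'].
function pairs_to_assoc(array $flat): array
{
    $assoc = [];
    // Walk the list two items at a time: even index = name, odd index = value.
    // A dangling last name without a value is ignored.
    for ($i = 0, $n = count($flat); $i + 1 < $n; $i += 2) {
        $assoc[$flat[$i]] = $flat[$i + 1];
    }
    return $assoc;
}

$user_data = pairs_to_assoc(['name', 'Ben', 'email', 'ben@ben']);
print_r($user_data);
```

This only covers the array-building step; it says nothing about how the flat list is extracted from the tag in the first place.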

Posted: Mon Oct 27, 2003 11:10 pm
by m3rajk
i built a function for something rather similar using the preg_split() function.

if you already have an attempt at doing what you want with that, post it and i'll give you hints till you've succeeded, so you'll learn it. otherwise i suggest reading the link and then asking any questions you need for clarification, because you can build a function to do it using preg_split.

Posted: Tue Oct 28, 2003 11:58 am
by marcoBR
Thank you for your hint m3rajk, now I got it working...

Code: Select all

$attribs = 'name="Ben" email="ben@ben"';
$attribs_parts = preg_split('/[=|"|\'+(.*?)"|\'+]/is', $attribs, -1, PREG_SPLIT_NO_EMPTY);
echo '<pre>'; 
print_r($attribs_parts); 
echo '</pre>';
/*Results
Array ( 
  [0] => name 
  [1] => Ben 
  [2] => email 
  [3] => ben@ben
)
*/
Two new problems now:

1) If the user omits the quotes (" or ') around an attribute value, it doesn't work properly. e.g.: name=Ben email="ben@ben"

2) If the user puts any of the delimiters (=, " or ') inside an attribute value, it doesn't work properly either. e.g.: name="Ben's Jr." email="ben@ben" compare="a=b"

Any trick to solve these problems?

Posted: Tue Oct 28, 2003 11:07 pm
by m3rajk
yes.

Code: Select all

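function parts($html_tag){
  // pass 1: split the tag into pieces on spaces
  $pass1=preg_split('/ /',$html_tag, -1, PREG_SPLIT_NO_EMPTY);
  $pass2=array();
  foreach ($pass1 as $pass){
    // pass 2: split each piece into array(name, value) on =
    $pass2[]=preg_split('/=/',$pass, -1, PREG_SPLIT_NO_EMPTY);
  }
  // take each pair by reference so the stripped value is written back into $pass2
  foreach($pass2 as &$pass){
    if (isset($pass[1])){
      $pass[1]=preg_replace('/\'|"/','',$pass[1]);
    }
  }
  unset($pass);
  return $pass2;
}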

Posted: Tue Oct 28, 2003 11:08 pm
by m3rajk
if you're unsure of what that does, first check the functions on php.net, then ask here. everyone that knows regexp will probably help. instead of giving you the thing i had, varied to suit your needs, i modified what you did, since that's obviously going to be something you'll pick up faster

Posted: Wed Oct 29, 2003 9:12 am
by marcoBR
My attempt based on sweatje's code:

Code: Select all

$str = '<user_data name="Ben" email="ben@ben" another="test of \' in string" test=\'different quote\' another_test=no quotes />';
preg_match_all('/( (\w+) = ([\'"]*?) ([^\\3]*) \\3 )/Ux', $str, $match);

$arr = array(); 
for($i=0,$j=count($match[2]); $i<$j; $i++) { 
    $arr[$match[2][$i]] = $match[4][$i]; 
} 

echo '<pre>';
print_r($arr);
echo '</pre>';
results:

Code: Select all

Array
(
    [name] => Ben
    [email] => ben@ben
    [another] => test of ' in string
    [test] => different quote
    [another_test] => 
)
Now the problem is attributes whose values aren't enclosed in quotes: the value goes missing... see the another_test attribute as an example. Any ideas how to solve this?

I'm thinking about formatting the tag properly before using it... In that case I'll need a pattern to do it. e.g.: <user_data test="first_test" another_test=last_test/> after formatting: <user_data test="first_test" another_test="last_test"/> ... it seems impossible to do that using only a pattern, no?
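
The pre-formatting idea above can probably be done with one pattern plus a callback. A rough sketch, not a definitive answer: the helper name quote_attributes is made up, and it assumes an unquoted value is a single run of characters with no whitespace, quotes, "/" or ">" in it. Quoted values are matched first and passed through untouched, so an = inside them is left alone.

```php
<?php
// Hypothetical helper: wraps unquoted attribute values in double quotes.
function quote_attributes(string $tag): string
{
    // name = "double quoted" | 'single quoted' | bare run of characters
    $pattern = '/(\w+)\s*=\s*("[^"]*"|\'[^\']*\'|[^\s"\'>\/]+)/';

    $result = preg_replace_callback($pattern, function (array $m): string {
        $value = $m[2];
        // Already quoted: leave the attribute exactly as written.
        if ($value[0] === '"' || $value[0] === "'") {
            return $m[0];
        }
        // Bare value: rewrite as name="value".
        return $m[1] . '="' . $value . '"';
    }, $tag);

    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return $result ?? $tag;
}

echo quote_attributes('<user_data test="first_test" another_test=last_test/>');
```

A bare value that contains spaces (like no quotes) would still be cut at the first space, because nothing marks where it ends.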

Posted: Wed Oct 29, 2003 2:07 pm
by m3rajk
did you try the variation i gave you yet?

do you mind if it returns a multi-dimensional array?

Posted: Wed Oct 29, 2003 2:21 pm
by Cruzado_Mainfrm
it will be a little hard to get the "no quotes" part of another_test=no quotes, because it has no apparent ending: it has a start (the =) but no end, which leaves the attribute open to swallowing whatever comes after it. but this can be achieved if the value after the = is a single word with no spaces
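
The single-word case described above could look something like this. It is only a sketch: the helper name tag_attributes is made up, and it relies on the PCRE branch-reset group (?|...) so that the value always lands in group 2 whichever kind of quoting was used.

```php
<?php
// Hypothetical helper: returns attribute name => value for one tag string.
function tag_attributes(string $tag): array
{
    // The value is "double quoted", 'single quoted', or one bare word.
    // (?|...) is a branch-reset group: every alternative captures into group 2.
    $pattern = '/(\w+)\s*=\s*(?|"([^"]*)"|\'([^\']*)\'|([^\s"\'>\/]+))/';
    preg_match_all($pattern, $tag, $matches, PREG_SET_ORDER);

    $attribs = [];
    foreach ($matches as $m) {
        $attribs[$m[1]] = $m[2];   // $m[1] = name, $m[2] = value without quotes
    }
    return $attribs;
}

$str = '<user_data name="Ben" email="ben@ben" another="test of \' in string" test=\'different quote\' another_test=no quotes />';
print_r(tag_attributes($str));
```

With this pattern another_test comes back as just no; the second word is dropped because an unquoted value stops at the first space.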

Posted: Wed Oct 29, 2003 5:45 pm
by marcoBR
m3rajk wrote: did you try the variation i gave you yet?
Yes, I did... but it didn't return the intended array.
m3rajk wrote: do you mind if it returns a multi-dimensional array?
Yes, but not in the intended format.
Cruzado_Mainfrm wrote: ...but this can be achieved if the value after the = is a single word, no spaces
No problem, please show me the pattern to achieve it.

Posted: Fri Oct 31, 2003 7:44 pm
by m3rajk
marcoBR wrote:
m3rajk wrote: did you try the variation i gave you yet?
Yes, I did... but it didn't return the intended array.
m3rajk wrote: do you mind if it returns a multi-dimensional array?
Yes, but not in the intended format.
does it get it to what you want? i didn't really think it completely through, on purpose... and i think i know the issue... so if you take my idea and then find the correct regexps to make it work ....


like i said, you're here to learn, so i don't reallllllly wanna give the exact solution even if i know it. i think i know some patterns that will help you, so i'll give them to you and let you decide which to use and how to use them


'/<([^>]+)>/' <-- gets everything in an html tag

'/(\w+=".+?")/' <--- gets each attribute (however blah = "blah blah" won't work. for that you need a slight alteration: '/(\w+\s*=\s*".+?")/' yet that still requires the ". to not care about the " : '/(\w+\s*=.+?(?=\s+\w+\s*=|$))/' which runs up to the next name= or the end of the string)

if you play with () you can get the initial one to find what's in the <> and give you an array that's everything you want....
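
One way the two hints above might be combined, as a sketch only: the helper name parse_tag is made up, and the second pattern uses a lookahead so that a value runs up to the next name= or the end of the tag. That lets an unquoted value contain spaces, but it would mis-split a quoted value that itself contains something like " word=".

```php
<?php
// Hypothetical helper following the two hints: first grab what is inside <>,
// then cut that into name=value pieces.
function parse_tag(string $html): array
{
    // Step 1: everything between < and >.
    if (!preg_match('/<([^>]+)>/', $html, $tag)) {
        return [];
    }
    // Drop the trailing "/" (and whitespace) of a self-closing tag.
    $inside = rtrim($tag[1], "/ \t\r\n");

    // Step 2: each value runs up to the next " name=" or the end of the string.
    preg_match_all('/(\w+)\s*=\s*(.*?)(?=\s+\w+\s*=|$)/s', $inside, $pairs, PREG_SET_ORDER);

    $attribs = [];
    foreach ($pairs as $pair) {
        // Strip one pair of matching surrounding quotes, if present.
        $attribs[$pair[1]] = preg_replace('/^(["\'])(.*)\1$/s', '$2', $pair[2]);
    }
    return $attribs;
}

print_r(parse_tag('<user_data name="Ben" test=\'different quote\' another_test=no quotes />'));
```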