Regex pattern returning html tag attributes as array

marcoBR
Forum Newbie
Posts: 4
Joined: Mon Oct 27, 2003 9:59 pm

Post by marcoBR »

Is it possible to write a pattern that matches an HTML tag with an indeterminate number of attributes and returns those attributes as an array, with the attribute name as key and the attribute value as value?

e.g.: <user_data name="Ben" email="ben@ben" ... /> would return $user_data = array('name'=>'Ben', 'email'=>'ben@ben');

Code: Select all

$tag = '<user_data name="Ben" email="ben@ben" />';
preg_match('/<user_data\s+((.*)=(.*))*?\s*\/>/', $tag, $attribs);
echo '<pre>';
print_r($attribs);
echo '</pre>';
I have tried the code above, but it's not working as intended.

If it is impossible to get the array above using only a single pattern, the array below is also acceptable, since I can use it to build the intended array.

Code: Select all

Array
(
    [0] => 'name'
    [1] => 'Ben'
    [2] => 'email'
    [3] => 'ben@ben'
)
Any other approach is welcome too...
Last edited by marcoBR on Wed Oct 29, 2003 5:47 pm, edited 1 time in total.
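
One likely reason a single preg_match() disappoints here: in PCRE a repeated capturing group keeps only the text from its last repetition, so it cannot hand back every attribute at once. A minimal sketch of the usual workaround, preg_match_all() with one match per attribute, assuming every value is double-quoted. The function name attribs_quoted is made up for illustration.

```php
<?php
// Sketch only: handles double-quoted values and nothing else.
function attribs_quoted(string $tag): array
{
    // With PREG_SET_ORDER each element of $matches is one name="value" pair.
    preg_match_all('/(\w+)\s*=\s*"([^"]*)"/', $tag, $matches, PREG_SET_ORDER);

    $attribs = [];
    foreach ($matches as $m) {
        $attribs[$m[1]] = $m[2];   // $m[1] is the name, $m[2] the value
    }
    return $attribs;
}

$user_data = attribs_quoted('<user_data name="Ben" email="ben@ben" />');
print_r($user_data);
```

Single-quoted and unquoted values are silently skipped by this version.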
m3rajk
DevNet Resident
Posts: 1191
Joined: Mon Jun 02, 2003 3:37 pm

Post by m3rajk »

I built a function for something rather similar using the preg_split() function.

If you already have an attempt at doing what you want with that, post it and I'll give you hints until you've succeeded, so you'll learn it. Otherwise I suggest reading the link, then asking any questions you need for clarification, because you can build a function to do it using preg_split().
marcoBR
Forum Newbie
Posts: 4
Joined: Mon Oct 27, 2003 9:59 pm

Post by marcoBR »

Thank you for your hint m3rajk, now I got it working...

Code: Select all

$attribs = 'name="Ben" email="ben@ben"';
$attribs_parts = preg_split('/[=|"|\'+(.*?)"|\'+]/is', $attribs, -1, PREG_SPLIT_NO_EMPTY);
echo '<pre>'; 
print_r($attribs_parts); 
echo '</pre>';
/*Results
Array ( 
  [0] => name 
  [1] => Ben 
  [2] => email 
  [3] => ben@ben
)
*/
Two new problems now:

1) If the user omits the quotes (" or ') around the attribute value, it doesn't work properly. e.g.: name=Ben email="ben@ben"

2) If the user uses any delimiter (=, " or ') inside the attribute value, it doesn't work properly either. e.g.: name="Ben's Jr." email="ben@ben" compare="a=b"

Any trick to solve these problems?
m3rajk
DevNet Resident
Posts: 1191
Joined: Mon Jun 02, 2003 3:37 pm

Post by m3rajk »

yes.

Code: Select all

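function parts($html_tag){
  // Pass 1: split the tag on spaces (this also splits values that contain spaces).
  $pass1=preg_split('/ /',$html_tag, -1, PREG_SPLIT_NO_EMPTY);
  $pass2=array();
  foreach ($pass1 as $pass){
    // Pass 2: split each piece into array(name, value).
    $pass2[]=preg_split('/=/',$pass, -1, PREG_SPLIT_NO_EMPTY);
  }
  // Iterate by reference so the stripped quotes are stored back in $pass2.
  foreach($pass2 as &$pass){
    if (isset($pass[1])) {
      $pass[1]=preg_replace('/\'|"/','',$pass[1]);
    }
  }
  unset($pass);
  return $pass2;
}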
m3rajk
DevNet Resident
Posts: 1191
Joined: Mon Jun 02, 2003 3:37 pm

Post by m3rajk »

If you're unsure of what that does, first check the functions on php.net, then ask here. Everyone who knows regexp will probably help. Instead of giving you the thing I had, varied to suit your needs, I modified what you did, since that's obviously going to be something you'll pick up faster.
marcoBR
Forum Newbie
Posts: 4
Joined: Mon Oct 27, 2003 9:59 pm

Post by marcoBR »

My attempt based on sweatje's code:

Code: Select all

$str = '<user_data name="Ben" email="ben@ben" another="test of \' in string" test=\'different quote\' another_test=no quotes />';
preg_match_all('/( (\w+) = ([\'"]*?) ([^\\3]*) \\3 )/Ux', $str, $match);

$arr = array(); 
for($i=0,$j=count($match[2]); $i<$j; $i++) { 
    $arr[$match[2][$i]] = $match[4][$i]; 
} 

echo '<pre>';
print_r($arr);
echo '</pre>';
results:

Code: Select all

Array
(
    [name] => Ben
    [email] => ben@ben
    [another] => test of ' in string
    [test] => different quote
    [another_test] => 
)
Now the problem is attributes whose value isn't enclosed in quotes: the value is missing. See the another_test attribute as an example. Any ideas to solve this?

I'm thinking about formatting the tag properly before using it. In that case I'll need a pattern to do it. e.g.: <user_data test="first_test" another_test=last_test/> after formatting: <user_data test="first_test" another_test="last_test"/> ... It seems impossible to do with only a pattern, no?
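
The formatting step may be doable with one pattern plus a callback. A hedged sketch using preg_replace_callback(): the pattern matches already-quoted values as whole units (so an = inside them is left alone) and wraps only bare single-word values in double quotes. The function name quote_bare_values is hypothetical, and a bare value is assumed to end at whitespace, a quote, > or /.

```php
<?php
// Sketch: add double quotes around unquoted single-word attribute values.
function quote_bare_values(string $tag): string
{
    return preg_replace_callback(
        // Group 1: name and "=", group 2: quoted value or bare word.
        '/(\w+\s*=\s*)("[^"]*"|\'[^\']*\'|[^\s"\'>\/]+)/',
        function (array $m): string {
            $value = $m[2];
            // Already quoted: return the match unchanged.
            if ($value[0] === '"' || $value[0] === "'") {
                return $m[0];
            }
            return $m[1] . '"' . $value . '"';
        },
        $tag
    );
}

echo quote_bare_values('<user_data test="first_test" another_test=last_test/>');
```

Because a bare value stops at /, an unquoted value that itself contains a slash would be cut short by this sketch.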
m3rajk
DevNet Resident
Posts: 1191
Joined: Mon Jun 02, 2003 3:37 pm

Post by m3rajk »

Did you try the variation I gave you yet?

Do you mind if it returns a multi-dimensional array?
Cruzado_Mainfrm
Forum Contributor
Posts: 346
Joined: Sun Jun 15, 2003 11:22 pm
Location: Miami, FL

Post by Cruzado_Mainfrm »

It will be a little hard to get the "no quotes" part of another_test=no quotes, because it has no apparent ending: it has a start, but no ending, which leaves the attribute open to any following attribute and so eats any other strings. But this can be achieved if the value after the = is a single word, with no spaces.
marcoBR
Forum Newbie
Posts: 4
Joined: Mon Oct 27, 2003 9:59 pm

Post by marcoBR »

m3rajk wrote:did you try the variation I gave you yet?
Yes, I did... but it didn't return the intended array.
m3rajk wrote:do you mind if it returns a multi-dimensional array?
Yes, but not in the intended format.
Cruzado_Mainfrm wrote:...but this can be achieved if the value after the = is a single word, no spaces?
No problem, please show me the pattern to achieve it.
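
One way the single-word case could be written, offered as a sketch and not as anyone's final answer in this thread: give the value three alternatives (double-quoted, single-quoted, bare word) and take whichever one matched. The function name attribs_any_quote is invented here, and the code assumes PHP 7.2 or later for PREG_UNMATCHED_AS_NULL.

```php
<?php
// Sketch: accept "value", 'value' or a bare single-word value.
function attribs_any_quote(string $tag): array
{
    $pattern = '/
        (\w+) \s* = \s*          # 1: attribute name
        (?:
            "([^"]*)"            # 2: double-quoted value
          | \'([^\']*)\'         # 3: single-quoted value
          | ([^\s"\'>\/]+)       # 4: bare value, one word only
        )
    /x';

    preg_match_all($pattern, $tag, $matches, PREG_SET_ORDER | PREG_UNMATCHED_AS_NULL);

    $attribs = [];
    foreach ($matches as $m) {
        // Exactly one of groups 2 to 4 took part in the match.
        $attribs[$m[1]] = $m[2] ?? $m[3] ?? $m[4] ?? '';
    }
    return $attribs;
}

$str = '<user_data name="Ben\'s Jr." compare="a=b" test=\'different quote\' another_test=no quotes />';
print_r(attribs_any_quote($str));
```

With a bare value only the first word is kept, so another_test above comes back as "no" and the trailing "quotes" is ignored.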
m3rajk
DevNet Resident
Posts: 1191
Joined: Mon Jun 02, 2003 3:37 pm

Post by m3rajk »

marcoBR wrote:
m3rajk wrote:did you try the variation I gave you yet?
Yes, I did... but it didn't return the intended array.
m3rajk wrote:do you mind if it returns a multi-dimensional array?
Yes, but not in the intended format.
Does it get you to what you want? I didn't really think it completely through, on purpose... and I think I know the issue... so if you take my idea and then find the correct regexps to make it work...

Like I said, you're here to learn, so I don't really want to give the exact solution even if I know it. I think I know some that will help you, so I'll give them to you and let you decide which to use and how to use them.


'/<([^>]+)>/' <-- gets everything in an HTML tag

'/(\w+="[^"]*")/' <--- gets each name="value" pair. However, blah = "blah blah" won't work; for that you need a slight alteration: '/(\w+\s*=\s*"[^"]*")/'. Yet that still requires the ". To not care about the ": '/(\w+\s*=.+?)(?=\s+\w+\s*=|\s*\/?$)/'

If you play with the () you can get the initial one to find what's in the <> and give you an array that's everything you want...
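
A rough sketch of how these hints could be combined with preg_split(): first capture what sits between < and >, then split on whitespace that is followed by a name and an =, and finally cut each piece at its first =. The function name attribs_by_split is made up, and the sketch assumes no quoted value contains whitespace followed by something that looks like name=.

```php
<?php
// Sketch: two-step extraction with preg_match() and preg_split().
function attribs_by_split(string $html): array
{
    // Step 1: grab whatever sits between < and >.
    if (!preg_match('/<([^>]+)>/', $html, $m)) {
        return [];
    }
    // Drop the trailing "/" of a self-closing tag.
    $inside = rtrim(trim($m[1]), '/');

    // Step 2: split on whitespace that is followed by "name=".
    $parts = preg_split('/\s+(?=\w+\s*=)/', $inside, -1, PREG_SPLIT_NO_EMPTY);
    array_shift($parts);   // the first chunk is the tag name

    $attribs = [];
    foreach ($parts as $part) {
        // A limit of 2 keeps any "=" inside the value intact.
        [$name, $value] = explode('=', $part, 2);
        // Strip surrounding whitespace and quote characters.
        $attribs[trim($name)] = trim($value, " \t\n\r\"'");
    }
    return $attribs;
}

print_r(attribs_by_split('<user_data name=Ben email="ben@ben" compare="a=b" />'));
```

Here an unquoted value may contain spaces, since it runs up to the next name=, but trim() would also eat a quote character that legitimately ends a value.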