Regex pattern returning html tag attributes as array
Posted: Mon Oct 27, 2003 9:59 pm
by marcoBR
Is it possible to write a pattern that matches an HTML tag with an indeterminate number of attributes and returns these attributes as an array, with the attribute name as key and the attribute value as value?
e.g.: <user_data name="Ben" email="ben@ben" ... /> will return $user_data = array('name'=>'Ben', 'email'=>'ben@ben');
Code:
$tag = '<user_data name="Ben" email="ben@ben" />';
preg_match('/<user_data\s+((.*)=(.*))*?\s*\/>/', $tag, $attribs);
echo '<pre>';
print_r($attribs);
echo '</pre>';
I have tried the code above, but it's not working as intended.
If it is impossible to get the array above with a single pattern, the array below is also acceptable, since I can use it to build the intended array.
Code:
Array
(
[0] => 'name'
[1] => 'Ben'
[2] => 'email'
[3] => 'ben@ben'
)
Any other approach is welcome too...
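A note on why a single preg_match struggles here: when a capturing group is repeated, PCRE keeps only the text from its last repetition, so the earlier attributes are overwritten. The sketch below illustrates this and contrasts it with preg_match_all; both patterns are illustrative simplifications (not the pattern from the post) and assume every value is enclosed in double quotes.

```php
<?php
$tag = '<user_data name="Ben" email="ben@ben" />';

// One preg_match with a repeated group: groups 1 and 2 are overwritten on
// every repetition, so only the last attribute is left in $m.
preg_match('/<user_data(?:\s+(\w+)="([^"]*)")*\s*\/>/', $tag, $m);
echo $m[1], ' => ', $m[2], "\n";

// preg_match_all reports every name="value" pair as its own match.
// Assumption: values are always double quoted.
preg_match_all('/(\w+)="([^"]*)"/', $tag, $pairs, PREG_SET_ORDER);

$user_data = array();
foreach ($pairs as $pair) {
    $user_data[$pair[1]] = $pair[2]; // attribute name => attribute value
}
print_r($user_data);
```

With PREG_SET_ORDER each element of $pairs holds one complete match, which makes the name/value loop straightforward.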
Posted: Mon Oct 27, 2003 11:10 pm
by m3rajk
i built a function for something rather similar using the preg_split() function.
if you already have an attempt at what you want using that, post it and i'll give you hints until you've succeeded, so you'll learn it. otherwise i suggest reading the link, then asking any questions you need for clarification, because you can build a function to do it using preg_split.
Posted: Tue Oct 28, 2003 11:58 am
by marcoBR
Thank you for your hint m3rajk, now I got it working...
Code:
$attribs = 'name="Ben" email="ben@ben"';
$attribs_parts = preg_split('/[=|"|\'+(.*?)"|\'+]/is', $attribs, -1, PREG_SPLIT_NO_EMPTY);
echo '<pre>';
print_r($attribs_parts);
echo '</pre>';
/*Results
Array (
[0] => name
[1] => Ben
[2] => email
[3] => ben@ben
)
*/
Two new problems now:
1) If the user omits the quotes (" or ') around an attribute value, it doesn't work properly, e.g.: name=Ben email="ben@ben"
2) If the user puts any of the delimiters (=, " or ') inside an attribute value, it doesn't work properly either, e.g.: name="Ben's Jr." email="ben@ben" compare="a=b"
Any trick to solve these problems?
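One possible way around the second problem is to stop splitting on delimiters and instead match each pair as a whole, using a back-reference so that a value only ends at the same kind of quote that opened it. This is a minimal sketch, not the approach from the posts above; the $result name is illustrative, and unquoted values (the first problem) are simply skipped here.

```php
<?php
$attribs = 'name="Ben\'s Jr." email="ben@ben" compare="a=b"';

// (\w+)    attribute name
// (["\'])  opening quote, remembered as group 2
// (.*?)    value, as short as possible
// \2       the same quote character that opened the value
preg_match_all('/(\w+)\s*=\s*(["\'])(.*?)\2/s', $attribs, $matches, PREG_SET_ORDER);

$result = array();
foreach ($matches as $m) {
    $result[$m[1]] = $m[3]; // name => value, quotes removed
}
print_r($result);
```

Because each pair is consumed in one match, an = or a different quote character inside a value is never treated as a delimiter.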
Posted: Tue Oct 28, 2003 11:07 pm
by m3rajk
yes.
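Code:
function parts($html_tag){
    // first pass: split the attribute string on spaces
    $pass1=preg_split('/ /',$html_tag, -1, PREG_SPLIT_NO_EMPTY);
    $pass2=array();
    foreach ($pass1 as $pass){
        // second pass: split each piece on =
        $pass2[]=preg_split('/=/',$pass, -1, PREG_SPLIT_NO_EMPTY);
    }
    foreach($pass2 as $key => $pass){
        // strip the quotes from the value; write through $key, since $pass is only a copy
        if (isset($pass[1])){
            $pass2[$key][1]=preg_replace('/\'|"/','',$pass[1]);
        }
    }
    return $pass2;
}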
Posted: Tue Oct 28, 2003 11:08 pm
by m3rajk
if you're unsure of what that does, first check the functions on php.net, then ask here. everyone that knows regexp will probably help. instead of giving you the thing i had, varied to suit your needs, i modified what you did, since that's obviously going to be something you'll pick up faster
Posted: Wed Oct 29, 2003 9:12 am
by marcoBR
My attempt based on sweatje's code:
Code:
$str = '<user_data name="Ben" email="ben@ben" another="test of \' in string" test=\'different quote\' another_test=no quotes />';
preg_match_all('/( (\w+) = ([\'"]*?) ([^\\3]*) \\3 )/Ux', $str, $match);
$arr = array();
for($i=0,$j=count($match[2]); $i<$j; $i++) {
$arr[$match[2][$i]] = $match[4][$i];
}
echo '<pre>';
print_r($arr);
echo '</pre>';
results:
Code:
Array
(
[name] => Ben
[email] => ben@ben
[another] => test of ' in string
[test] => different quote
[another_test] =>
)
Now the problem is attributes whose values aren't enclosed in quotes: the value is missing... see the another_test attribute as an example. Any ideas to solve this?
I'm thinking about formatting the tag properly before using it... In that case I'll need a pattern to do it, e.g.: <user_data test="first_test" another_test=last_test/> after formatting: <user_data test="first_test" another_test="last_test"/> ... it seems impossible to do it using only a pattern, no?
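The formatting step described above can probably be done with one pattern plus a small callback. The sketch below is one possible way, not a tested solution from this thread: the function name quote_attribute_values is hypothetical, and it assumes an unquoted value is a single run of characters with no whitespace, quotes, > or /.

```php
<?php
// Hypothetical helper: wrap unquoted attribute values in double quotes.
function quote_attribute_values($tag) {
    // The value is "double quoted", 'single quoted', or a bare run of
    // characters (assumption: no whitespace, quotes, > or / in a bare value).
    $pattern = '/(\w+)\s*=\s*("[^"]*"|\'[^\']*\'|[^\s"\'>\/]+)/';
    return preg_replace_callback($pattern, function ($m) {
        $value = $m[2];
        if ($value[0] === '"' || $value[0] === "'") {
            return $m[0]; // already quoted: leave the pair untouched
        }
        return $m[1] . '="' . $value . '"'; // bare value: add the quotes
    }, $tag);
}

echo quote_attribute_values('<user_data test="first_test" another_test=last_test/>'), "\n";
```

Matching already-quoted values as well (and returning them unchanged) is what keeps an = inside a quoted value, such as compare="a=b", from being mistaken for a new attribute.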
Posted: Wed Oct 29, 2003 2:07 pm
by m3rajk
did you try the variation i gave you yet?
do you mind if it returns a multi-dimensional array?
Posted: Wed Oct 29, 2003 2:21 pm
by Cruzado_Mainfrm
it will be a little hard to get the another_test=no quotes part, because it has no apparent ending: it has a start ("), but no ending... leaving the attribute open to any other attribute, thus eating any other strings... but this can be achieved if the value after the = is a single word, no spaces
Posted: Wed Oct 29, 2003 5:45 pm
by marcoBR
m3rajk wrote: did you try the variation i gave you yet?
Yes, I did... but it didn't return the intended array.
m3rajk wrote: do you mind if it returns a multi-dimensional array?
Yes, but not in the intended format.
Cruzado_Mainfrm wrote: ...but this can be achieved if the value after the = is a single word, no spaces?
No problem, please show me the pattern to achieve it.
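For the single-word case, one option is to add a third alternative for the value next to the two quoted forms. The following is a sketch under that assumption, not a pattern posted in this thread; tag_attributes is a hypothetical name, and the (?|...) branch-reset group is used so the value always lands in group 2 whichever alternative matched.

```php
<?php
// Hypothetical helper: return the attributes of a tag as name => value.
function tag_attributes($tag) {
    // Value alternatives: "double quoted", 'single quoted', or one bare word
    // (assumption: a bare word has no whitespace, quotes, > or /).
    $pattern = '/(\w+)\s*=\s*(?|"([^"]*)"|\'([^\']*)\'|([^\s"\'>\/]+))/';
    preg_match_all($pattern, $tag, $matches, PREG_SET_ORDER);

    $attribs = array();
    foreach ($matches as $m) {
        $attribs[$m[1]] = $m[2];
    }
    return $attribs;
}

$str = '<user_data name="Ben" email="ben@ben" another="test of \' in string" test=\'different quote\' another_test=no quotes />';
print_r(tag_attributes($str));
```

As Cruzado_Mainfrm points out, an unquoted value has no visible end, so this sketch stops at the first space: another_test comes back as just "no".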
Posted: Fri Oct 31, 2003 7:44 pm
by m3rajk
marcoBR wrote: m3rajk wrote: did you try the variation i gave you yet?
Yes, I did... but it didn't return the intended array.
m3rajk wrote: do you mind if it returns a multi-dimensional array?
Yes, but not in the intended format.
does it get it to what you want? i didn't really think it completely through, on purpose... and i think i know the issue... so if you take my idea and then find the correct regexps to make it work ....
like i said, you're here to learn, so i don't reallllllly wanna give the exact solution even if i know it. i think i know some patterns that will help you, so i'll give them to you and let you decide which to use and how to use them
'/<([^>]+)>/' <-- gets everything in an html tag
'/(\w+=".+")/' <--- gets each thing (however, blah = "blah blah" won't work). for that you need a slight alteration: '/(\w+\s+=\s+".+")/' yet that still requires the ". to not care about the ": '/(\w+\s+=.+^(\w+\s+=))/'
if you play with the () you can get the initial one to find what's in the <> and give you an array that's everything you want....
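A caveat when experimenting with these hints: .+ is greedy, so on a tag with several attributes it tends to run through to the last quote and return everything as one match. The sketch below shows the two-step idea (first the content of the <>, then the pairs) and compares the greedy hint with a negated character class; the [^>]+ and [^"]* forms are assumptions added for illustration, not patterns confirmed in this thread.

```php
<?php
$tag = '<user_data name="Ben" email="ben@ben" />';

// Step 1: grab everything between < and > (note the + quantifier).
preg_match('/<([^>]+)>/', $tag, $m);
$inner = $m[1];

// Step 2a: with greedy .+ the value runs on to the LAST double quote,
// so both attributes end up in a single match.
preg_match_all('/(\w+=".+")/', $inner, $greedy);

// Step 2b: [^"]* cannot cross a double quote, so each pair matches on its own.
preg_match_all('/(\w+="[^"]*")/', $inner, $tight);

print_r($greedy[1]);
print_r($tight[1]);
```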