Match the attributes of an html tag?

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."

zenhop
Forum Newbie
Posts: 18
Joined: Mon Jan 19, 2009 2:18 pm

Match the attributes of an html tag?

Post by zenhop »

Hello,
I'm now trying to put the attributes of a tag in an array.
I can already extract the arguments out of a tag as a single string.

Here is an example string:
title="Hello" arg2="blah" number=3

So I'm trying this:

Code: Select all

$tagArgs = "title=\"Hello\" arg2=\"blah\" number=3";
("#([a-z]+)(\s*)=\"([a-z]+)\"#mis",$tagArgs,$args);
But it's not working...
Does anybody know how to do that?
thx! :)
Burrito
Spockulator
Posts: 4715
Joined: Wed Feb 04, 2004 8:15 pm
Location: Eden, Utah

Re: Match the attributes of an html tag?

Post by Burrito »

if you use preg_match() it will put the matches into an array for you.

if you need that array restructured, just loop over it and build a new one.
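
For illustration, a minimal sketch of that "loop over it and build a new one" step. It assumes every value is double-quoted, and the names $pattern and $attrs are made up for this example. It uses preg_match_all() with PREG_SET_ORDER so that each match arrives as its own row.

```php
<?php
// Sample input, assuming every value is double-quoted.
$tagArgs = 'title="Hello" arg2="blah" number="3"';

// Hypothetical pattern for this sketch: name, optional spaces, "=", quoted value.
$pattern = '#([a-z0-9]+)\s*="([^"]*)"#i';

// PREG_SET_ORDER gives one row per match: [full match, name, value].
preg_match_all($pattern, $tagArgs, $matches, PREG_SET_ORDER);

// Loop over the rows and build a name => value array.
$attrs = [];
foreach ($matches as $m) {
    $attrs[$m[1]] = $m[2];
}

print_r($attrs);
```

The result is keyed by attribute name, so $attrs['title'] holds the title value directly.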
zenhop
Forum Newbie
Posts: 18
Joined: Mon Jan 19, 2009 2:18 pm

Re: Match the attributes of an html tag?

Post by zenhop »

I'm already using preg_match. I did the copy-paste wrong lol

Code: Select all

$tagArgs = "title=\"Hello\" arg2=\"blah\" number=3";
preg_match("#([a-z]+)(\s*)=\"([a-z]+)\"#mis",$tagArgs,$args);
But it does not give me the vars and values... It's acting weird, and I can't figure out what's wrong with my regex.
Burrito
Spockulator
Posts: 4715
Joined: Wed Feb 04, 2004 8:15 pm
Location: Eden, Utah

Re: Match the attributes of an html tag?

Post by Burrito »

you'll probably want to use preg_match_all() to keep searching your string after the first match is found. preg_match() stops there.

in your regex itself, you're only matching alpha chars (a-z) but your args and values have numbers in them. You need to check for those as well.
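
A small sketch of both points, assuming all three values are quoted. The variable names are only for illustration.

```php
<?php
$tagArgs = 'title="Hello" arg2="blah" number="3"';

// Letters only, as in the original attempt.
$lettersOnly = '#([a-z]+)\s*="([a-z]+)"#i';
// Letters and digits in both the name and the value.
$alnum = '#([a-z0-9]+)\s*="([a-z0-9]+)"#i';

// preg_match() stops after the first match and returns 1 or 0.
$first = preg_match($alnum, $tagArgs, $one);

// preg_match_all() keeps scanning and returns the number of matches.
$countAlnum   = preg_match_all($alnum, $tagArgs, $all);
$countLetters = preg_match_all($lettersOnly, $tagArgs, $few);

echo "$first $countAlnum $countLetters\n";
print_r($all[1]); // attribute names found by the alphanumeric pattern
```

With the letters-only pattern, arg2 is skipped because of the digit in its name, and number is skipped because of the digit in its value.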
zenhop
Forum Newbie
Posts: 18
Joined: Mon Jan 19, 2009 2:18 pm

Re: Match the attributes of an html tag?

Post by zenhop »

Hum, yeah, working better :)
thx
zenhop
Forum Newbie
Posts: 18
Joined: Mon Jan 19, 2009 2:18 pm

Re: Match the attributes of an html tag?

Post by zenhop »

Ok, I'm really stuck.

Here is my regex:

Code: Select all

#([a-z0-9]+)\=\"?(.*?)\"?#mis
Full php code:

Code: Select all

$tagArgs = "title=\"Hello\" arg2=\"blah\" number=3";
preg_match_all("#([a-z0-9]+)\=\"?(.*?)\"?#mis",$tagArgs,$args);
print_r($args);
Output:

Code: Select all

Array
(
    [0] => Array
        (
            [0] => title="
            [1] => arg2="
            [2] => number=
        )
 
    [1] => Array
        (
            [0] => title
            [1] => arg2
            [2] => number
        )
 
    [2] => Array
        (
            [0] => 
            [1] => 
            [2] => 
        )
)
I can get the vars, but not the values.
I tried many many regex, but still nothing.
\=\"?(.*?)\"? is supposed to extract what's after "=" with or without quotes, right?
Burrito
Spockulator
Posts: 4715
Joined: Wed Feb 04, 2004 8:15 pm
Location: Eden, Utah

Re: Match the attributes of an html tag?

Post by Burrito »

Code: Select all

 
<?php
$tagArgs = "title=\"Hello\" arg2=\"blah\" number=\"3\"";
preg_match_all("#([a-z0-9]+)\s*=\"([a-z0-9]+)\"#mis",$tagArgs,$args);
echo "<pre>";
print_r($args);
echo "</pre>";
?>
 
zenhop
Forum Newbie
Posts: 18
Joined: Mon Jan 19, 2009 2:18 pm

Re: Match the attributes of an html tag?

Post by zenhop »

Thanks, it's working here, but what if I want more than just a-z0-9 in a value?
I will sometimes have commas and special chars... like javascript code, or accents.
I still don't understand why it's not working with (.*?)
Burrito
Spockulator
Posts: 4715
Joined: Wed Feb 04, 2004 8:15 pm
Location: Eden, Utah

Re: Match the attributes of an html tag?

Post by Burrito »

what doesn't work with it?

Code: Select all

 
<?php
$tagArgs = "title=\"Hello\" arg2=\"blah\" number=\"3\"";
preg_match_all("#([a-z0-9]+)\s*=\"(.*?)\"#mis",$tagArgs,$args);
echo "<pre>";
print_r($args);
echo "</pre>";
?>
 
works for me.
zenhop
Forum Newbie
Posts: 18
Joined: Mon Jan 19, 2009 2:18 pm

Re: Match the attributes of an html tag?

Post by zenhop »

It works if quotes are mandatory, but if I make the quotes optional, it does not work anymore.
Maybe the problem is in making the quotes optional?

My regex was #([a-z0-9]+)\=\"?(.*?)\"?#mis but it's not working.
It works for alphanumeric values only: #([a-z0-9]+)\s*=\"?([a-z0-9]+)\"?#mis

So what is wrong?
Just changing ([a-z0-9]+) to (.*?) stops the whole regex from working.
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: Match the attributes of an html tag?

Post by prometheuzz »

zenhop wrote: It works if quotes are mandatory, but if I make the quotes optional, it does not work anymore.
Maybe the problem is in making the quotes optional?

My regex was #([a-z0-9]+)\=\"?(.*?)\"?#mis but it's not working.
It works for alphanumeric values only: #([a-z0-9]+)\s*=\"?([a-z0-9]+)\"?#mis

So what is wrong?
Just changing ([a-z0-9]+) to (.*?) stops the whole regex from working.
Everything after your "=" sign is either optional (the quotes) or reluctant (the DOT-STAR), so the regex engine is free to match nothing at all there, and the value group comes back empty. But I don't know if that's the case with you. Your current problem description is "it does not work anymore", which is a bit vague, IMO.

Perhaps you could provide a couple of input strings that you need to match and clearly indicate which ones don't get matched by your current regex.

Good luck!
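
To make the empty-group effect concrete, here is a rough sketch comparing the optional-quote pattern with a mandatory-quote one. The sample string and variable names are assumptions for this example only.

```php
<?php
$tagArgs = 'title="Hello" arg2=blah number=3';

// Optional quotes around a lazy group: nothing forces (.*?) to grow,
// so it stops at zero characters and the trailing "? matches nothing.
preg_match_all('#([a-z0-9]+)="?(.*?)"?#is', $tagArgs, $lazy);

// Mandatory closing quote: (.*?) must expand until it reaches one.
preg_match_all('#([a-z0-9]+)="(.*?)"#is', $tagArgs, $quoted);

print_r($lazy[2]);   // captured values with optional quotes
print_r($quoted[2]); // captured values with mandatory quotes
```

The first pattern finds all three names but every captured value is empty. The second captures Hello, but it skips the two unquoted attributes entirely.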
zenhop
Forum Newbie
Posts: 18
Joined: Mon Jan 19, 2009 2:18 pm

Re: Match the attributes of an html tag?

Post by zenhop »

Well, with that last regex (dot-star + optional quotes) I can get the vars, but not the values. All values are empty.
The dot-star does not match anything.
My goal is just to get the vars and values from a string representing the attributes of an html tag. The vars are only alphanumeric, so no problem getting them, but the values are not just alphanumeric, they can contain anything: javascript, accents, special chars...
But some of the values are not surrounded by quotes, so the 2 quotes are optional. That's the case when the value is just one word or a number, for example.

So, after the "=", I either have a complex value between quotes, or a single alphanumeric string like a word or a number.
Burrito
Spockulator
Posts: 4715
Joined: Wed Feb 04, 2004 8:15 pm
Location: Eden, Utah

Re: Match the attributes of an html tag?

Post by Burrito »

Code: Select all

 
$tagArgs = "title=\"Hello\" arg2=blah number=\"3\"";
preg_match_all('#([a-z0-9]+)="?(.*?)["|\s]+#mis',$tagArgs,$args);
echo "<pre>";
print_r($args);
echo "</pre>";
 
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: Match the attributes of an html tag?

Post by prometheuzz »

Burrito wrote:

Code: Select all

 
$tagArgs = "title=\"Hello\" arg2=blah number=\"3\"";
preg_match_all('#([a-z0-9]+)="?(.*?)["|\s]+#mis',$tagArgs,$args);
echo "<pre>";
print_r($args);
echo "</pre>";
 
Note that inside a character class, most of the normal metacharacters lose their special meaning. The pipe, which is the alternation (OR) operator elsewhere, does not mean OR there; it just matches the pipe character itself.
So, ["|\s] matches a '"', a '|' or a white space character.
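
A quick way to check this, as a sketch (the variable names are only illustrative):

```php
<?php
// Inside [...] the pipe is an ordinary character, so the class accepts it.
$inClass = preg_match('#^["|\s]$#', '|');

// Outside a class the pipe is alternation: a quote OR a whitespace character.
$alternation = preg_match('#^(?:"|\s)$#', '|');

// Without the stray pipe, the class accepts only a quote or whitespace.
$fixedClass = preg_match('#^["\s]$#', '|');
$fixedQuote = preg_match('#^["\s]$#', '"');

echo "$inClass $alternation $fixedClass $fixedQuote\n";
```

Only the first and last checks succeed: the literal pipe passes the class that contains it, and the quote passes the cleaned-up class.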

I guess the OP is looking for something like this:

Code: Select all

"#([a-z0-9]+)=['\"]?([^'\"\s]+)['\"]?#i" // no need for the -m and -s modifiers!
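
If quoted values may also contain spaces, commas or the other kind of quote, one possible variation is to spell out the three cases with alternation instead of optional quotes. This is only a sketch: the sample string (including the onclick attribute) is made up, the names $pattern and $attrs are hypothetical, and it does not try to handle escaped quotes inside a value.

```php
<?php
// Hypothetical sample: quoted values with spaces and punctuation, plus two bare values.
$tagArgs = 'title="Hello, world!" onclick="go(1, 2)" arg2=blah number=3';

// name = "anything but a double quote" | 'anything but a single quote' | bare token
$pattern = '#([a-z0-9]+)\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s"\']+))#i';

preg_match_all($pattern, $tagArgs, $matches, PREG_SET_ORDER);

$attrs = [];
foreach ($matches as $m) {
    // Only one of groups 2-4 takes part in a match; the others are '' or absent.
    $attrs[$m[1]] = ($m[2] ?? '') . ($m[3] ?? '') . ($m[4] ?? '');
}

print_r($attrs);
```

Because each alternative has its own mandatory delimiters, the value group can no longer settle for an empty match the way an optional quote followed by a lazy dot-star does.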