[solved] What should I use?

Posted: Sun Feb 04, 2007 9:47 pm
by phpwalker
Hi, I'm new to regex and I have a very simple question to ask here.

How do I change this
<age> 18 </age>
into this

age: 18
I can do the reverse, but I couldn't figure out how to do this one.

Thanks in advance if anyone can teach me.

Posted: Sun Feb 04, 2007 9:56 pm
by feyd
Would the pattern need to deal with nesting?

Posted: Sun Feb 04, 2007 10:51 pm
by phpwalker
feyd wrote: Would the pattern need to deal with nesting?
No, I don't have to deal with nesting at the moment. I'll leave that for later, since I'm still new to regex. :?

Posted: Sun Feb 04, 2007 11:09 pm
by feyd
So you don't have containers wrapping around all these elements you're wishing to extract?

Posted: Sun Feb 04, 2007 11:37 pm
by phpwalker
Ah. I see the problem now.

<XML header>

<doc>

<name>phpwalker</name>
<address>abc 124 washington </address>

</doc>

How do I extract all the contents of the doc and turn them into this?

Code: Select all

name: phpwalker
address: abc 124 washington
On another board just now, I tried XML DOM and SimpleXML. Both work well; I just want to learn the regex way of doing it. Thanks, feyd, if you can help me.

Posted: Mon Feb 05, 2007 8:59 am
by feyd
If your containers are built like that, I would probably use preg_match_all() to capture tags that are confined to single lines, then preg_replace() those results into what you're wishing for output.

I'd like to see you attempt some things before I start doling out answers.

Posted: Tue Feb 06, 2007 11:24 pm
by phpwalker
I've tried to do this.

The XML:

Code: Select all

<?xml version="1.0"?>
<pet>
    <name>Polly Parrot</name>
    <age>3</age>
    <species>parrot</species>
    <parents>
        <mother>Pia Parrot</mother>
        <father>Peter Parrot</father>
    </parents>
</pet>
The php:

Code: Select all

<?php

$file = 'data.xml';

// read file into array
$data = file($file) or die('Could not read file!');

// loop through array and print each line
foreach ($data as $line) {

    if (preg_match_all("|<[^>]+>(.*)</[^>]+>|U", $line, $out, PREG_PATTERN_ORDER)) {

        echo "it matches.<br/>";

        echo $out[1][0] . ": " . $out[1][1] . "\n";

    } else {
        echo "it doesn't match.<br/>";
    }

}

?>
The result:
it doesn't match.
it doesn't match.
it matches.
Polly Parrot: it matches.
3: it matches.
parrot: it doesn't match.
it matches.
Pia Parrot: it matches.
Peter Parrot: it doesn't match.
it doesn't match.
The example above is based on one I got from the PHP.net manual.

Ignoring the lines that don't match, the result is:
Polly Parrot:
3:
parrot:
Pia Parrot:
Peter Parrot:
I'm still figuring out this part:

Code: Select all

(preg_match_all("|<[^>]+>(.*)</[^>]+>|U", $line, $out, PREG_PATTERN_ORDER))
I don't know how it works... it's so confusing. feyd, can you explain what each of the metacharacters is used for? I've never seen the U before. I searched the php.net manual and found:
This modifier inverts the "greediness" of the quantifiers so that they are not greedy by default, but become greedy if followed by "?". It is not compatible with Perl. It can also be set by a (?U) modifier setting within the pattern or by a question mark behind a quantifier (e.g. .*?).
I can't understand it...

Posted: Wed Feb 07, 2007 12:24 am
by feyd
|<[^>]+>(.*)</[^>]+>|U

| - the starting delimiter. Using a metacharacter as the delimiter is rarely recommended, but in this case it is fine.
< - just the character, nothing special
[^..] - negative character class construct. It will match any character that is not contained in the square brackets.
+ - one or more. Since the "U" modifier is specified, it will find the shortest possible match (ungreedy). Normally, it will find the longest possible match (greedy).
> - just the character, nothing special
(..) - a group construct. Requests the regular expression engine to remember the contents. Each is numbered in order of appearance. \0 or $0 is the entire match, followed by \1 or $1 for the first group and so forth.
. - any single character except a newline.
* - zero or more. Like "+", it is ungreedy here because of the "U" modifier.
</ - the characters, nothing special
[^..] - negative character class construct. It will match any character that is not contained in the square brackets.
+ - one or more. Since the "U" modifier is specified, it will find the shortest possible match (ungreedy). Normally, it will find the longest possible match (greedy).
> - just the character, nothing special
| - the ending delimiter.
U - the ungreedy specifier. It inverts the greediness of "*" and "+": they match the shortest possible span that fits the pattern without needing a trailing "?", and adding "?" now makes them greedy.
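
A small demonstration may make the "U" entry concrete. This is only a sketch and not part of the original reply: the two-element test string is made up for illustration, because greedy and ungreedy matching only give different results when a line holds more than one closing tag.

```php
<?php
// Hypothetical input: two elements on one line, so greediness changes the result.
$line = '<name>Polly Parrot</name><age>3</age>';

// Default (greedy): .* grabs as much as it can, so group 1 runs to the LAST closing tag.
preg_match('|<[^>]+>(.*)</[^>]+>|', $line, $greedy);

// U modifier: every quantifier becomes ungreedy, so .* stops at the FIRST closing tag.
preg_match('|<[^>]+>(.*)</[^>]+>|U', $line, $ungreedy);

// Same effect without U: make only this one quantifier lazy with a trailing "?".
preg_match('|<[^>]+>(.*?)</[^>]+>|', $line, $lazy);

echo $greedy[1], "\n";   // Polly Parrot</name><age>3
echo $ungreedy[1], "\n"; // Polly Parrot
echo $lazy[1], "\n";     // Polly Parrot
```

With one element per line, as in the data.xml example, all three patterns capture the same text, which is why the modifier is easy to overlook.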

Posted: Wed Feb 07, 2007 11:51 am
by phpwalker
Thanks feyd for spending your time. But I haven't got the solution yet.

What I get now is that
<name> phpwalker </name>
becomes
phpwalker:
but I want it to be
name: phpwalker
I'm really new to regular expressions. There's still a lot I don't know.

Posted: Wed Feb 07, 2007 1:20 pm
by feyd
Your pattern isn't capturing the tag name component. Look carefully through the breakdown notes I posted previously for some hints.
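
One possible reading of that hint, as a minimal sketch: the pattern so far has a single pair of parentheses, around the element's content, so the tag name is matched but never remembered. The second group, the \1 backreference, and the trim() call below are illustrative additions, not something posted in the thread, and the sketch assumes a simple element without attributes on one line.

```php
<?php
$line = '<name> phpwalker </name>';
$result = null;

// Group 1 now captures the tag name, group 2 the content.
// \1 requires the closing tag to repeat whatever group 1 matched.
if (preg_match('|<([^>]+)>(.*)</\1>|U', $line, $m)) {
    // $m[0] is the whole match, $m[1] the tag name, $m[2] the content.
    $result = $m[1] . ': ' . trim($m[2]);
    echo $result, "\n"; // name: phpwalker
}
```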

Posted: Thu Feb 08, 2007 3:27 pm
by phpwalker
Ehm, one more thing I don't understand is how the strings are stored in the $out array.

Code: Select all

echo $out[1][0] . ": " . $out[1][1] . "\n";
How does preg_match_all() store the matched strings in the $out array? Isn't it a multi-dimensional array? Could you briefly explain it to me? Thanks.

Posted: Thu Feb 08, 2007 4:40 pm
by feyd
Yes, it's multi-dimensional. Use print_r() to view the structure.
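
To illustrate the structure, here is a small sketch using the same pattern as the attempt above. The input line with two elements is hypothetical, chosen so that a second match exists; in data.xml every line holds at most one element, so $out[1][1] is never set there.

```php
<?php
$pattern = '|<[^>]+>(.*)</[^>]+>|U';
// Hypothetical line with two elements, so there are two matches to index.
$line = '<mother>Pia Parrot</mother><father>Peter Parrot</father>';

// PREG_PATTERN_ORDER: the first index is the group number, the second is the match number.
preg_match_all($pattern, $line, $out, PREG_PATTERN_ORDER);
// $out[0] holds the full matches, $out[1] holds what group 1 captured in each match.
// So $out[1][0] and $out[1][1] are the contents of the first and second match,
// not a tag name followed by its content.
echo $out[1][0], ' / ', $out[1][1], "\n"; // Pia Parrot / Peter Parrot

// PREG_SET_ORDER flips the indexes: one sub-array per match, holding [full match, group 1].
preg_match_all($pattern, $line, $sets, PREG_SET_ORDER);
echo $sets[1][0], "\n"; // <father>Peter Parrot</father>
echo $sets[1][1], "\n"; // Peter Parrot

// print_r() shows the nesting directly.
print_r($out);
```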

Posted: Fri Feb 09, 2007 1:53 am
by phpwalker
Oh, right, there's print_r(). I always forget about it and seldom use it.
Thanks again for reminding me.

And the result is:
Array
(
[0] => Array
(
[0] => Polly Parrot
)

[1] => Array
(
[0] => Polly Parrot
)

)

Array
(
[0] => Array
(
[0] => 3
)

[1] => Array
(
[0] => 3
)

)

Array
(
[0] => Array
(
[0] => parrot
)

[1] => Array
(
[0] => parrot
)

)

Array
(
[0] => Array
(
[0] => Pia Parrot
)

[1] => Array
(
[0] => Pia Parrot
)

)

Array
(
[0] => Array
(
[0] => Peter Parrot
)

[1] => Array
(
[0] => Peter Parrot
)

)

Posted: Fri Feb 09, 2007 2:19 am
by phpwalker
Oops, that's the result as rendered in HTML. When I view the source, this is the real result:
<pre>Array
(
[0] => Array
(
[0] => <name>Polly Parrot</name>
)

[1] => Array
(
[0] => Polly Parrot
)

)
</pre><pre>Array
(
[0] => Array
(
[0] => <age>3</age>
)

[1] => Array
(
[0] => 3
)

)
</pre><pre>Array
(
[0] => Array
(
[0] => <species>parrot</species>
)

[1] => Array
(
[0] => parrot
)

)
</pre><pre>Array
(
[0] => Array
(
[0] => <mother>Pia Parrot</mother>

)

[1] => Array
(
[0] => Pia Parrot
)

)
</pre><pre>Array
(
[0] => Array
(
[0] => <father>Peter Parrot</father>
)

[1] => Array
(
[0] => Peter Parrot
)

)
</pre>
Ehm, I know what I should do now! Thanks feyd! I've learned a lot from you.
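
The thread ends without the final code being posted. For later readers, here is one way the pieces discussed above could fit together. It is a sketch under stated assumptions, not the poster's actual solution: the XML is inlined instead of read from data.xml, elements are assumed to have no attributes, and the content group uses [^<]* instead of the "U" modifier so that containers such as <pet> and <parents> are skipped. The names $xml, $pattern, and $lines are made up for illustration.

```php
<?php
// The sample document from earlier in the thread, inlined so the sketch runs standalone.
$xml = <<<'XML'
<?xml version="1.0"?>
<pet>
    <name>Polly Parrot</name>
    <age>3</age>
    <species>parrot</species>
    <parents>
        <mother>Pia Parrot</mother>
        <father>Peter Parrot</father>
    </parents>
</pet>
XML;

// Group 1: the tag name (no whitespace, "/" or ">", so attributes are not handled).
// Group 2: text containing no "<", which limits matches to leaf elements.
// \1 makes the closing tag repeat the name captured by group 1.
$pattern = '|<([^>/\s]+)>([^<]*)</\1>|';

// PREG_SET_ORDER gives one [full match, tag name, content] array per element.
preg_match_all($pattern, $xml, $matches, PREG_SET_ORDER);

$lines = [];
foreach ($matches as $m) {
    $lines[] = $m[1] . ': ' . trim($m[2]);
}

echo implode("\n", $lines), "\n";
// name: Polly Parrot
// age: 3
// species: parrot
// mother: Pia Parrot
// father: Peter Parrot
```

This only copes with flat, attribute-free elements like the ones in the example. For anything more involved, the XML DOM or SimpleXML approaches mentioned earlier in the thread are likely the safer choice.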