[solved]What should I use?

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."

phpwalker
Forum Commoner
Posts: 81
Joined: Sun Apr 23, 2006 12:18 pm

Post by phpwalker »

Hi, I'm new to regex and I have a very simple question.

How do I change this
<age> 18 </age>
into this

age: 18
I can do the reverse, but I couldn't figure out how to do this one.

Thanks in advance to anyone who can teach me.
Last edited by phpwalker on Fri Feb 09, 2007 2:20 am, edited 1 time in total.
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Would the pattern need to deal with nesting?

Post by phpwalker »

Would the pattern need to deal with nesting?
No, I don't have to deal with nesting at the moment. I'll leave that for later, as I'm still new to regex. :?

Post by feyd »

So you don't have containers wrapping around all these elements you're wishing to extract?

Post by phpwalker »

Ah. I see the problem now.

<XML header>

<doc>

<name>phpwalker</name>
<address>abc 124 washington </address>

</doc>

How do I extract all the contents of the doc and output them like this?

name: phpwalker
address: abc 124 washington
Following advice from the other board, I've just tried the XML DOM and SimpleXML. Both work well, but I'd like to learn the regex way of doing it. Thanks, feyd, if you can help me.

Post by feyd »

If your containers are built like that, I would probably use preg_match_all() to capture tags that are confined to single lines, then preg_replace() those results into what you're wishing for output.

I'd like to see you attempt some things before I start doling out answers.

Post by phpwalker »

I've tried to do this.

The XML:

<?xml version="1.0"?>
<pet>
    <name>Polly Parrot</name>
    <age>3</age>
    <species>parrot</species>
    <parents>
        <mother>Pia Parrot</mother>
        <father>Peter Parrot</father>
    </parents>
</pet>
The php:

<?php

$file = 'data.xml';

// read file into array
$data = file($file) or die('Could not read file!');

// loop through array and print each line
foreach ($data as $line) {

    if (preg_match_all("|<[^>]+>(.*)</[^>]+>|U", $line, $out, PREG_PATTERN_ORDER)) {

        echo "it matches.<br/>";

        echo $out[1][0] . ": " . $out[1][1] . "\n";

    } else {
        echo "it doesn't match.<br/>";
    }

}

?>
The result:
it doesn't match.
it doesn't match.
it matches.
Polly Parrot: it matches.
3: it matches.
parrot: it doesn't match.
it matches.
Pia Parrot: it matches.
Peter Parrot: it doesn't match.
it doesn't match.
I took the above example from the PHP.net manual.

Ignoring the lines that don't match, the result is:
Polly Parrot:
3:
parrot:
Pia Parrot:
Peter Parrot:
I'm still trying to figure out this part:

(preg_match_all("|<[^>]+>(.*)</[^>]+>|U", $line, $out, PREG_PATTERN_ORDER))
I don't know how it works... it's so confusing... feyd, can you explain what each of the metacharacters is used for? I've never seen the U before. I searched the php.net manual and found:
This modifier inverts the "greediness" of the quantifiers so that they are not greedy by default, but become greedy if followed by "?". It is not compatible with Perl. It can also be set by a (?U) modifier setting within the pattern or by a question mark behind a quantifier (e.g. .*?).
I can't understand it...

Post by feyd »

|<[^>]+>(.*)</[^>]+>|U

| - the starting delimiter. Using a metacharacter as the delimiter is rarely recommended, but in this case it is fine.
< - just the character, nothing special
[^..] - negative character class construct. It will match any character that is not contained in the square brackets.
+ - one or more. Since the "U" modifier is specified, it will find the shortest possible match -- ungreedy --. Normally, it will find the longest possible match -- greedy --.
> - just the character, nothing special
(..) - a group construct. Requests the regular expression engine to remember the contents. Each is numbered in order of appearance. \0 or $0 is the entire match, followed by \1 or $1 for the first group and so forth. Here the group wraps .*, which matches any character except a newline, zero or more times.
</ - the characters, nothing special
[^..] - negative character class construct. It will match any character that is not contained in the square brackets.
+ - one or more. Since the "U" modifier is specified, it will find the shortest possible match -- ungreedy --. Normally, it will find the longest possible match -- greedy --.
> - just the character, nothing special
| - the ending delimiter.
U - the ungreedy modifier. It inverts the greediness of "*" and "+", so they match the shortest possible span that fits the pattern without needing a trailing "?". With "U" set, a trailing "?" makes them greedy instead.
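
To make the effect of "U" concrete, here is a minimal sketch that runs the same pattern with and without the modifier. The sample string is hypothetical (it is not from the thread) and puts two elements on one line so the difference shows up.

```php
<?php
// Hypothetical sample line with two elements, to make the difference visible.
$line = '<a>1</a><b>2</b>';

// Default (greedy): .* grabs as much as it can while still letting the rest match.
preg_match('|<[^>]+>(.*)</[^>]+>|', $line, $greedy);

// With U: quantifiers become lazy, so .* stops at the first closing tag.
preg_match('|<[^>]+>(.*)</[^>]+>|U', $line, $ungreedy);

// Without U, a trailing "?" makes only that one quantifier lazy.
preg_match('|<[^>]+>(.*?)</[^>]+>|', $line, $lazy);

echo $greedy[1], "\n";   // 1</a><b>2
echo $ungreedy[1], "\n"; // 1
echo $lazy[1], "\n";     // 1
```

With only one element per line, as in the XML posted above, the greedy and ungreedy versions would capture the same text, so the modifier matters mostly when several tags share a line.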

Post by phpwalker »

Thanks, feyd, for taking the time. But I haven't got the solution yet.

Right now
<name> phpwalker </name>
becomes
phpwalker:
but I want it to be
name: phpwalker
I'm really new to regular expressions. There's still a lot I don't know.

Post by feyd »

Your pattern isn't capturing the tag name component. Look carefully through the breakdown notes I posted previously for some hints.
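
One way to read that hint, as a sketch rather than the only answer: wrap the tag name in its own group so it becomes available alongside the content. The `\1` backreference in the closing tag is an addition that does not appear in the thread; it simply requires the closing tag to repeat the captured name.

```php
<?php
// Sample line taken from the XML posted earlier in the thread.
$line = '    <species>parrot</species>';

// Group 1 captures the tag name, group 2 the text between the tags.
// \1 is a backreference: the closing tag must repeat the name from group 1.
$pattern = '|<([^>/]+)>(.*)</\1>|U';

if (preg_match($pattern, $line, $m)) {
    // $m[0] is the whole match, $m[1] the tag name, $m[2] the content.
    echo $m[1] . ': ' . trim($m[2]) . "\n"; // species: parrot
}
```

The trim() call is there because the first example in the thread has spaces around the value, as in <age> 18 </age>.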

Post by phpwalker »

Ehm, one more thing I don't understand is how the strings get stored in the $out array.

echo $out[1][0] . ": " . $out[1][1] . "\n";
How does preg_match_all() store the matched strings in the $out array? Isn't it a multi-dimensional array? Could you briefly explain it to me? Thanks.

Post by feyd »

Yes, it's multi-dimensional. Use print_r() to view the structure.
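
As a rough illustration of that structure, the sketch below indexes into the array directly. The input line is hypothetical: it puts two elements on one line so that each inner array has two entries. PREG_SET_ORDER is shown only for comparison; it is a standard preg_match_all() flag, though it is not used in the thread.

```php
<?php
// Hypothetical line with two elements so each inner array has two entries.
$line = '<mother>Pia Parrot</mother><father>Peter Parrot</father>';
$pattern = '|<[^>]+>(.*)</[^>]+>|U';

// PREG_PATTERN_ORDER (the default): $out[0] lists every full match,
// $out[1] lists what group 1 captured in each of those matches.
preg_match_all($pattern, $line, $out, PREG_PATTERN_ORDER);
echo $out[0][1], "\n"; // <father>Peter Parrot</father>
echo $out[1][0], "\n"; // Pia Parrot
echo $out[1][1], "\n"; // Peter Parrot

// PREG_SET_ORDER groups by match instead: $sets[1] is the second match,
// holding its full text at index 0 and group 1 at index 1.
preg_match_all($pattern, $line, $sets, PREG_SET_ORDER);
echo $sets[1][1], "\n"; // Peter Parrot
```

When a line holds only one element, as in the XML above, $out[1] has a single entry at index 0, so $out[1][1] is not set. That is why nothing was printed after the colon in the earlier output.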

Post by phpwalker »

Oh, right, there's print_r(). I always forget about it and seldom use it.
Thanks again for reminding me.

And the result is:
Array
(
[0] => Array
(
[0] => Polly Parrot
)

[1] => Array
(
[0] => Polly Parrot
)

)

Array
(
[0] => Array
(
[0] => 3
)

[1] => Array
(
[0] => 3
)

)

Array
(
[0] => Array
(
[0] => parrot
)

[1] => Array
(
[0] => parrot
)

)

Array
(
[0] => Array
(
[0] => Pia Parrot
)

[1] => Array
(
[0] => Pia Parrot
)

)

Array
(
[0] => Array
(
[0] => Peter Parrot
)

[1] => Array
(
[0] => Peter Parrot
)

)

Post by phpwalker »

Oops, that was the result as rendered in HTML. Viewing the page source shows the real result:
<pre>Array
(
[0] => Array
(
[0] => <name>Polly Parrot</name>
)

[1] => Array
(
[0] => Polly Parrot
)

)
</pre><pre>Array
(
[0] => Array
(
[0] => <age>3</age>
)

[1] => Array
(
[0] => 3
)

)
</pre><pre>Array
(
[0] => Array
(
[0] => <species>parrot</species>
)

[1] => Array
(
[0] => parrot
)

)
</pre><pre>Array
(
[0] => Array
(
[0] => <mother>Pia Parrot</mother>

)

[1] => Array
(
[0] => Pia Parrot
)

)
</pre><pre>Array
(
[0] => Array
(
[0] => <father>Peter Parrot</father>
)

[1] => Array
(
[0] => Peter Parrot
)

)
</pre>
Ehm, I know what I should do now! Thanks feyd! I've learned a lot from you.
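
The thread does not show the final code, so the following is only a sketch of one way the goal could be reached, assuming leaf elements sit on a single line with no nested tags inside them. The XML is inlined so the example runs without data.xml, and the names `$xml`, `$matches`, and `$lines` are illustrative. A regex like this is not a real XML parser; the DOM and SimpleXML routes mentioned earlier remain the sturdier choice.

```php
<?php
// Inline copy of the XML from the thread, so the sketch runs without data.xml.
$xml = <<<XML
<?xml version="1.0"?>
<pet>
    <name>Polly Parrot</name>
    <age>3</age>
    <species>parrot</species>
    <parents>
        <mother>Pia Parrot</mother>
        <father>Peter Parrot</father>
    </parents>
</pet>
XML;

// Group 1: tag name. Group 2: text with no "<" and no line break,
// so container elements such as <pet> and <parents> are skipped.
$pattern = '|<([^>/]+)>([^<\r\n]*)</\1>|';

// One inner array per match: [0] full match, [1] tag name, [2] content.
preg_match_all($pattern, $xml, $matches, PREG_SET_ORDER);

$lines = [];
foreach ($matches as $m) {
    $lines[] = $m[1] . ': ' . trim($m[2]);
}
echo implode("\n", $lines), "\n";
```

Run against the pet document, this prints one "tag: value" line each for name, age, species, mother, and father. Because the content group excludes "<", no "U" modifier is needed here.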