back reference as a named capture

Post by SidewinderX

Code: Select all

<?php
 
$string = "123456:string"; // colon-separated, so the pattern below can match
preg_match("/^(.*?):(.*?)$/", $string, $matches);
 
print_r($matches);
 
?>
$matches will obviously contain:

Code: Select all

Array
(
    [0] => 123456:string
    [1] => 123456
    [2] => string
)


My desired result is simply:

Code: Select all

Array
(
    [string] => 123456
)


After I get the matches I can easily do this with some array functions, but I would rather not go through the extra steps if I can do it all with preg_match. So I have two questions.

1. If I use a named capture, can I avoid the duplicate results?

Code: Select all

/^(?P<name>.*?):(.*?)$/
RESULTS IN:
Array
(
    [0] => 123456:string
    [name] => 123456
    [1] => 123456
    [2] => string
)
I WOULD LIKE:
Array
(
    [0] => 123456:string
    [name] => 123456
    [2] => string
)
2. Is it possible to use a back reference (the text matched by another group) as the name of a named capture, and if so, how? The following is wrong but demonstrates what I am after:

Code: Select all

preg_match("/^(?P<$1>.*?):(.*?)$/", $string, $matches);
Re: back reference as a named capture

Post by GeertDD

1. Here are three regexes that match only the desired part, i.e. "123456", so $matches[0] is all you need. See the comments in the code.

Code: Select all

// Your original regex which matches too much, and is the slowest of all four.
preg_match('/^(.*?):(.*?)$/', $string, $matches); // 0,37s (for 100,000 loops)
 
// The key is to get rid of all needless capturing parentheses
preg_match('/^.*?(?=:.*?$)/', $string, $matches); // 0,29s
 
// Optimization #1: using a negated character class with a possessive quantifier instead of a lazy dot
preg_match('/^[^:]*+(?=:.*?$)/', $string, $matches); // 0,24s
 
// Optimization #2: stop bothering about whatever comes after the colon
preg_match('/^[^:]*+(?=:)/', $string, $matches); // 0,20s
2. Backreferences can't be used as names for captures, but you don't need them anymore now, do you?
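
For completeness, here is a minimal sketch of the two things asked in the first post, done right after the match rather than inside the regex. It assumes the colon-separated input "123456:string" and PHP 5.6 or later (for ARRAY_FILTER_USE_KEY). The group names `id` and `label` and the variables `$named` and `$result` are made up for illustration and do not come from the thread.

```php
<?php
// Sample input, assuming the colon-separated form shown in the expected output.
$string = '123456:string';

// Question 1: preg_match() returns every named group twice, once under its
// name and once under its number. Keeping only the string keys drops the
// numeric duplicates (and the full match at index 0).
preg_match('/^(?P<id>[^:]*+):(?P<label>.*)$/', $string, $matches);
$named = array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);
// $named now holds only the 'id' and 'label' entries.

// Question 2: a group name cannot be taken from captured text, so the
// captured text is used as the array key after the match instead.
$result = [];
if (preg_match('/^([^:]*+):(.*)$/', $string, $m)) {
    $result = [$m[2] => $m[1]];
}

print_r($result);
```

This prints an array with the single entry [string] => 123456. One caveat with the second part: if the text after the colon looks like an integer, PHP will turn it into an integer key.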