need help finding repetitive enclosed patterns

nameuser
Forum Newbie
Posts: 1
Joined: Sat Nov 13, 2010 1:28 pm

Post by nameuser »

Hi all,

The search string I have looks like this:

"XXXX !M 123 J : 3am 4am !T 124 N : 3am 4am !F 125 D : 3am 4am 7am 6am XXXX !M 223 M : 2am 3am 7am 6am !T 224 J : 4am !S 225 O : 3am 4am XXXX !M 323 A : 6am !S 324 J : 7am !W 325 F : 3am 7am 6am "

So there is an unknown number of big blocks, each starting with XXXX, and an unknown number of smaller blocks inside each big one, each starting with "!".

Each small block starts with a letter, one out of a certain known set, followed by some digits, then another letter from a known set, a semicolon, and some time data of unknown length.
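As a rough sketch of that small-block layout in PHP: the letter sets below (`MTWFS` and `A-Z`) are placeholders, since the real sets are not listed here, and the separator is assumed to be the colon that appears in the sample string.

```php
<?php
// Placeholder letter sets (assumptions); substitute the real known sets.
$first_letters  = 'MTWFS';  // assumed set for the letter right after "!"
$second_letters = 'A-Z';    // assumed set for the letter after the digits

// "!" + letter, digits, letter, colon, then one or more time tokens
// together with their trailing whitespace.
$small_block = '/![' . $first_letters . ']\s+\d+\s+[' . $second_letters . ']\s*:\s*'
             . '((?:\d+[ap]m\s*)+)/';

preg_match($small_block, '!F 125 D : 3am 4am 7am 6am XXXX', $m);
echo '[' . $m[1] . ']', "\n"; // [3am 4am 7am 6am ]
```

The capture keeps the trailing space, which matches the "3am 4am " style of element described in this post; wrap it in rtrim() if that is not wanted.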

I'd like to write a regexp for preg_match_all that extracts the time data into a two-dimensional indexed array.

The first dimension follows the big blocks, the second follows the small blocks, and the array elements hold the time data, e.g. a[0][0] = "3am 4am ", a[0][2] = "3am 4am 7am 6am" ... a[2][2] = "3am 7am 6am ".
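Written out as a PHP literal for the sample string, the target shape would look roughly like this. The variable name `$expected` is only illustrative, and trailing spaces are left off here as an assumption.

```php
<?php
// Illustrative target array: outer index = big block, inner index = small block.
$expected = [
    ['3am 4am', '3am 4am', '3am 4am 7am 6am'],  // first XXXX block
    ['2am 3am 7am 6am', '4am', '3am 4am'],      // second XXXX block
    ['6am', '7am', '3am 7am 6am'],              // third XXXX block
];

echo $expected[0][2], "\n"; // 3am 4am 7am 6am
echo $expected[2][2], "\n"; // 3am 7am 6am
```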

Thanks!
ridgerunner
Forum Contributor
Posts: 214
Joined: Sun Jul 05, 2009 10:39 pm
Location: SLC, UT

Re: need help finding repetitive enclosed patterns

Post by ridgerunner »

This script does what you are asking:

<?php // test.php 2010-11-15
// fully commented regular expressions:
$re_outer = '/# Outer block regex
    \bXXXX\b        # match beginning of block marker
    (               # capture this block data into group $1
      .*?           # lazily match everything up to block end
    )               # end capture group $1
    (?:             # begin group of "end-of-block" alternatives
      (?=\bXXXX\b)  # end is either start of next block
    |               # or...
      $             # end of string
    )               # end group of "end-of-block" alternatives
    /ix';
$re_inner = '/# Inner block regex
    [A-Z]\s+       # a letter, one out of certain known set
    \d+\s+         # followed by some digits
    [A-Z]\s+       # then another letter from a known set
    [;:]\s+        # semicolon (or is it a colon?) and...
    (              # capture one or more time data into group $1
      \d+[ap]m     # first time data in this block (can be am or pm)
      (?:          # non-capture group for additional time data
        \s+        # each time data separated by some whitespace
        \d+[ap]m   # next time data in this block
      )*           # can have zero or more additional time data
    )              # end capture group $1
    /ix';

// Test data:
$data = "XXXX !M 123 J : 3am 4am !T 124 N : 3am 4am !F 125 D : 3am 4am 7am 6am XXXX !M 223 M : 2am 3am 7am 6am !T 224 J : 4am !S 225 O : 3am 4am XXXX !M 323 A : 6am !S 324 J : 7am !W 325 F : 3am 7am 6am";

// Now build the 2-dimensional array containing the time data
$blk_cnt = preg_match_all($re_outer, $data, $outer_matches);
$time_data = array();  // initialize result data array
for ($i = 0; $i < $blk_cnt; $i++) {
    $td_cnt = preg_match_all($re_inner, $outer_matches[1][$i], $inner_matches);
    if ($td_cnt > 0)
        $time_data[$i] = $inner_matches[1];
    else
        $time_data[$i] = array();
}
print_r($time_data);
?>

Given your test data, here is what the resulting array looks like:

Array
(
    [0] => Array
        (
            [0] => 3am 4am
            [1] => 3am 4am
            [2] => 3am 4am 7am 6am
        )

    [1] => Array
        (
            [0] => 2am 3am 7am 6am
            [1] => 4am
            [2] => 3am 4am
        )

    [2] => Array
        (
            [0] => 6am
            [1] => 7am
            [2] => 3am 7am 6am
        )

)

Note that your test data has colons where your text description says semicolons. The [;:] character class in the inner regex accepts either. Hope this helps.
:)
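For comparison, here is a minimal sketch of a different way to get the same two-dimensional array: split on the XXXX marker with preg_split, then run one preg_match_all per big block. The helper name `extract_time_data` is hypothetical (it is not part of the script above), the inner pattern additionally requires the leading "!", and the sketch assumes nothing other than whitespace comes before the first XXXX.

```php
<?php
// Hypothetical helper: returns time data indexed by [big block][small block].
function extract_time_data(string $data): array
{
    // Split into big blocks; PREG_SPLIT_NO_EMPTY drops the empty piece
    // that precedes the first XXXX marker.
    $blocks = preg_split('/\bXXXX\b/', ltrim($data), -1, PREG_SPLIT_NO_EMPTY);

    $result = [];
    foreach ($blocks as $block) {
        // "!" + letter, digits, letter, colon or semicolon, then one or more times.
        preg_match_all(
            '/![A-Z]\s+\d+\s+[A-Z]\s+[;:]\s+(\d+[ap]m(?:\s+\d+[ap]m)*)/i',
            $block,
            $m
        );
        $result[] = $m[1];  // empty array when the block has no small blocks
    }
    return $result;
}

// Shortened sample in the same layout as the test data.
$sample = "XXXX !M 123 J : 3am 4am !T 124 N : 3am 4am XXXX !M 223 M : 2am";
$time_data = extract_time_data($sample);
echo count($time_data), "\n";  // 2
echo $time_data[0][1], "\n";   // 3am 4am
echo $time_data[1][0], "\n";   // 2am
```

Which version to prefer is mostly a matter of taste: the split keeps the outer step simple, while the two-regex script keeps everything in commented patterns.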