
Regex for a list - grab all lines until...

Posted: Thu Sep 03, 2009 1:16 pm
by curb
I have a list in the format below:

Code:

-------Sep 02-------
http://sitesep2.com/
http://sitesep21.com/
http://sitesep31223.com/
http://randomurl.com
-------Sep 03-------
http://sitesep3.com/
http://sitesep21123.com/
http://sitesep23123.com/
http://randomurl32.com
How would I grab all the URLs that follow the last occurring "-------DATE-------" line, as in the example below (without the Sep 03 line itself)?

Code:

http://sitesep3.com/
http://sitesep21123.com/
http://sitesep23123.com/
http://randomurl32.com

Re: Regex for a list - grab all lines until...

Posted: Thu Sep 03, 2009 9:25 pm
by ridgerunner
This ought to do the trick...

Code:

// short version
$re = '/(?:(?<!-------[A-Z][a-z][a-z] \d\d-------).)*+$/s';
 
// long commented version
$re = '/ # free-spacing mode regex to match all data after last ---date--- line
(?:                                       # non-capture group for star quantifier
  (?<!                                    # at a position that does not follow
    -------[A-Z][a-z][a-z][ ]\d\d-------  # <-this,
  ).                                      # match any one character
)*+                                       # and do this any number of times
$                                         # up until the end of the string
/sx';
 
if (preg_match($re, $text, $matches)) {
    $result = $matches[0];
} else {
    $result = "";
}
Edit 2009-09-06: This regex has a minor problem with "\r\n" line endings. See the following post for an explanation and a fix.
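
As a quick check, the short pattern can be run against the sample list from the question. This is only a minimal sketch: it assumes the list is already in a string with "\n" line endings, and $text and $urls are illustrative variable names, not anything the original poster specified.

```php
<?php
// Sample list from the question, assuming "\n" line endings.
$text = "-------Sep 02-------\n"
      . "http://sitesep2.com/\n"
      . "http://sitesep21.com/\n"
      . "http://sitesep31223.com/\n"
      . "http://randomurl.com\n"
      . "-------Sep 03-------\n"
      . "http://sitesep3.com/\n"
      . "http://sitesep21123.com/\n"
      . "http://sitesep23123.com/\n"
      . "http://randomurl32.com";

// Short version of the pattern from the post above.
$re = '/(?:(?<!-------[A-Z][a-z][a-z] \d\d-------).)*+$/s';

// Everything after the last date line, or "" if nothing matched.
$result = preg_match($re, $text, $matches) ? $matches[0] : "";

// Split the matched block into one URL per array element.
$urls = explode("\n", $result);
print_r($urls);
```

With this input, $urls should hold the four Sep 03 URLs and nothing from the Sep 02 block.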

Re: Regex for a list - grab all lines until...

Posted: Sun Sep 06, 2009 10:04 am
by ridgerunner
Actually, the regex in my previous post has a problem when the file uses "\r\n" line endings. The lookbehind only rules out the "\r" that immediately follows the date line, so the match begins with a stray "\n". The following regex fixes this by requiring the match to start right after a date line and its "\r\n", and by explicitly matching that first character.

Code:

// short version
$re = '/(?<=-------[A-Z][a-z][a-z] \d\d-------\r\n).(?:(?<!-------[A-Z][a-z][a-z] \d\d-------\r\n).)*+$/s';
 
// long commented version
$re = '/
# free-spacing mode regex to match all data after last ---date--- line
(?<=                                      # at a position following
-------[A-Z][a-z][a-z][ ]\d\d-------\r\n  # <-this
).                                        # grab the first character
(?:                                       # then start group to grab the rest
  (?<!                                    # at a position that does not follow
-------[A-Z][a-z][a-z][ ]\d\d-------\r\n  # <-this,
  ).                                      # match any one character
)*+                                       # and do this any number of times
$                                         # up until the end of the string
/sx';
 
if (preg_match($re, $text, $matches)) {
    $result = $matches[0];
} else {
    $result = "";
}
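
To see the difference between the two patterns, both can be run against the same "\r\n"-terminated input. This is a rough sketch using an abbreviated version of the sample list; the variable names ($old, $new, $oldResult, $newResult, $urls) are made up for the demonstration.

```php
<?php
// Abbreviated sample list, assuming "\r\n" line endings.
$text = "-------Sep 02-------\r\n"
      . "http://sitesep2.com/\r\n"
      . "-------Sep 03-------\r\n"
      . "http://sitesep3.com/\r\n"
      . "http://randomurl32.com";

// Short versions of the two patterns from the posts above.
$old = '/(?:(?<!-------[A-Z][a-z][a-z] \d\d-------).)*+$/s';
$new = '/(?<=-------[A-Z][a-z][a-z] \d\d-------\r\n).(?:(?<!-------[A-Z][a-z][a-z] \d\d-------\r\n).)*+$/s';

$oldResult = preg_match($old, $text, $m) ? $m[0] : "";
$newResult = preg_match($new, $text, $m) ? $m[0] : "";

// The first pattern's match starts with the leftover "\n" of the "\r\n" pair.
var_dump($oldResult[0] === "\n");

// The second pattern's match starts directly at the first URL.
$urls = explode("\r\n", $newResult);
print_r($urls);
```

Note that the second pattern hard-codes "\r\n", so it finds no match (and $result ends up as "") on input that uses bare "\n" line endings.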