Regex for a list - grab all lines until...
Posted: Thu Sep 03, 2009 1:16 pm
by curb
I have a list in the following format:
Code:
-------Sep 02-------
http://sitesep2.com/
http://sitesep21.com/
http://sitesep31223.com/
http://randomurl.com
-------Sep 03-------
http://sitesep3.com/
http://sitesep21123.com/
http://sitesep23123.com/
http://randomurl32.com
How would I grab all the URLs after the last "-------DATE-------" line, as in the example below (without the Sep 03 line itself)?
Code:
http://sitesep3.com/
http://sitesep21123.com/
http://sitesep23123.com/
http://randomurl32.com
Re: Regex for a list - grab all lines until...
Posted: Thu Sep 03, 2009 9:25 pm
by ridgerunner
This ought to do the trick...
Code:
// short version
$re = '/(?:(?<!-------[A-Z][a-z][a-z] \d\d-------).)*+$/s';
// long commented version
$re = '/ # free-spacing mode regex to match all data after last ---date--- line
    (?:                                       # non-capture group for star quantifier
      (?<!                                    # at a position that does not follow
        -------[A-Z][a-z][a-z][ ]\d\d-------  # <-this,
      ).                                      # match any one character
    )*+                                       # and do this any number of times
    $                                         # up until the end of the string
    /sx';
// $text holds the whole list as a single string
if (preg_match($re, $text, $matches)) {
    $result = $matches[0];
} else {
    $result = "";
}
Edit 2009-09-06: This regex has a minor problem; see the following post for an explanation and a fix.
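To try the short version on the sample list from the question, here is a minimal self-contained script. It assumes the list is held in a string `$text` with plain "\n" line endings; the sample data is copied from the question.

```php
<?php
// Sample list from the question, with Unix "\n" line endings (assumption).
$text = "-------Sep 02-------\n"
      . "http://sitesep2.com/\n"
      . "http://sitesep21.com/\n"
      . "http://sitesep31223.com/\n"
      . "http://randomurl.com\n"
      . "-------Sep 03-------\n"
      . "http://sitesep3.com/\n"
      . "http://sitesep21123.com/\n"
      . "http://sitesep23123.com/\n"
      . "http://randomurl32.com\n";

// Short version of the regex from the reply above.
$re = '/(?:(?<!-------[A-Z][a-z][a-z] \d\d-------).)*+$/s';

if (preg_match($re, $text, $matches)) {
    $result = $matches[0];
} else {
    $result = "";
}

// Prints the four Sep 03 URLs, one per line.
echo $result;
```

Every start position up to and including the end of the last date line fails, because the possessive `*+` stops at a position that follows a date line and cannot give characters back, so `$` is never reached. The leftmost start that succeeds is the first character after the last date line's newline.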
Re: Regex for a list - grab all lines until...
Posted: Sun Sep 06, 2009 10:04 am
by ridgerunner
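Actually, the regex in my previous post has a problem when the file has "\r\n" line terminations: the match begins with a "\n". The position between the "\r" and the "\n" that end the last date line is preceded by "\r", not by the date text, so the negative lookbehind lets the match start there. The following regex fixes this problem by requiring the match to start immediately after the date line and its "\r\n", and then explicitly matching that first character. Because it hard-codes "\r\n", it only matches files that use "\r\n" line endings.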
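A minimal sketch that reproduces the problem with the original short regex, assuming the sample list from the question is stored with "\r\n" line endings. The `$lines` array and the way it is joined are illustrative assumptions, not part of the original post.

```php
<?php
// Sample list from the question, joined with Windows-style "\r\n" endings (assumption).
$lines = [
    '-------Sep 02-------',
    'http://sitesep2.com/',
    'http://sitesep21.com/',
    'http://sitesep31223.com/',
    'http://randomurl.com',
    '-------Sep 03-------',
    'http://sitesep3.com/',
    'http://sitesep21123.com/',
    'http://sitesep23123.com/',
    'http://randomurl32.com',
];
$text = implode("\r\n", $lines) . "\r\n";

// Short version of the original regex from the first reply.
$re = '/(?:(?<!-------[A-Z][a-z][a-z] \d\d-------).)*+$/s';

$result = preg_match($re, $text, $matches) ? $matches[0] : "";

// The match starts one character too early: the position between the "\r"
// and the "\n" that end the Sep 03 line is preceded by "\r", not by the date
// text, so the negative lookbehind allows the match to begin there.
var_dump(substr($result, 0, 1) === "\n"); // bool(true)
```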
Code:
// short version
$re = '/(?<=-------[A-Z][a-z][a-z] \d\d-------\r\n).(?:(?<!-------[A-Z][a-z][a-z] \d\d-------\r\n).)*+$/s';
// long commented version
$re = '/
    # free-spacing mode regex to match all data after last ---date--- line
    (?<=                                          # at a position following
      -------[A-Z][a-z][a-z][ ]\d\d-------\r\n    # <-this
    ).                                            # grab the first character
    (?:                                           # then start group to grab the rest
      (?<!                                        # at a position that does not follow
        -------[A-Z][a-z][a-z][ ]\d\d-------\r\n  # <-this,
      ).                                          # match any one character
    )*+                                           # and do this any number of times
    $                                             # up until the end of the string
    /sx';
// $text holds the whole list as a single string with "\r\n" line endings
if (preg_match($re, $text, $matches)) {
    $result = $matches[0];
} else {
    $result = "";
}
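A quick check of the fixed short regex, as a sketch: it should return the Sep 03 URLs for "\r\n" input, and, because "\r\n" is written into both lookbehinds, it finds no match at all when the file uses plain "\n". For comparison, the script also includes a different technique that is not from this thread: splitting the text on the date lines with `preg_split`, using `\R` (any line ending) and the `m` modifier, and keeping the last piece. The helper names `sampleList` and `urlsAfterLastDate` are hypothetical.

```php
<?php
// Hypothetical helper: build the sample list from the question with a given line ending.
function sampleList(string $eol): string
{
    $lines = [
        '-------Sep 02-------',
        'http://sitesep2.com/',
        'http://sitesep21.com/',
        'http://sitesep31223.com/',
        'http://randomurl.com',
        '-------Sep 03-------',
        'http://sitesep3.com/',
        'http://sitesep21123.com/',
        'http://sitesep23123.com/',
        'http://randomurl32.com',
    ];
    return implode($eol, $lines) . $eol;
}

// Short version of the fixed regex from the post above.
$fixed = '/(?<=-------[A-Z][a-z][a-z] \d\d-------\r\n).(?:(?<!-------[A-Z][a-z][a-z] \d\d-------\r\n).)*+$/s';

$crlf = sampleList("\r\n");
$lf   = sampleList("\n");

// "\r\n" input: the match starts with "http://sitesep3.com/".
preg_match($fixed, $crlf, $m);
$crlfResult = $m[0] ?? "";

// "\n" input: the lookbehinds require "\r\n", so preg_match returns 0.
$lfMatched = preg_match($fixed, $lf);

// Alternative technique (hypothetical helper): split on every date line,
// accepting any line ending via \R, and return the text after the last one.
// If the text contains no date line, the whole text is returned.
function urlsAfterLastDate(string $text): string
{
    $parts = preg_split('/^-------[A-Z][a-z][a-z] \d\d-------\R/m', $text);
    return end($parts);
}

// Prints the four Sep 03 URLs for either line-ending style.
echo urlsAfterLastDate($lf);
```

The split-based version trades the single regex for a second step, but it does not depend on lookbehind lengths, so the same pattern handles "\n" and "\r\n" files.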