Regex for a list - grab all lines until...

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."

curb
Forum Newbie
Posts: 1
Joined: Thu Sep 03, 2009 1:13 pm

Post by curb »

I have a list in the format below

Code:

-------Sep 02-------
http://sitesep2.com/
http://sitesep21.com/
http://sitesep31223.com/
http://randomurl.com
-------Sep 03-------
http://sitesep3.com/
http://sitesep21123.com/
http://sitesep23123.com/
http://randomurl32.com

How would I grab all the URLs after the last occurring "-------DATE-------" line, as in the example below (without the Sep 03 line itself)?

Code:

http://sitesep3.com/
http://sitesep21123.com/
http://sitesep23123.com/
http://randomurl32.com
ridgerunner
Forum Contributor
Posts: 214
Joined: Sun Jul 05, 2009 10:39 pm
Location: SLC, UT

Re: Regex for a list - grab all lines until...

Post by ridgerunner »

This ought to do the trick...

Code:

// short version
$re = '/(?:(?<!-------[A-Z][a-z][a-z] \d\d-------).)*+$/s';
 
// long commented version
$re = '/ # free-spacing mode regex to match all data after last ---date--- line
(?:                                       # non-capture group for star quantifier
  (?<!                                    # at a position that does not follow
    -------[A-Z][a-z][a-z][ ]\d\d-------  # <-this,
  ).                                      # match any one character
)*+                                       # and do this any number of times
$                                         # up until the end of the string
/sx';
 
if (preg_match($re, $text, $matches)) {
    $result = $matches[0];
} else {
    $result = "";
}

Edit 2009-09-06: This regex has a minor problem. See the following post for an explanation and a fix.
Last edited by ridgerunner on Sun Sep 06, 2009 10:06 am, edited 1 time in total.
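
To see the problem the edit note refers to, here is a small sketch that runs the short regex above against a shortened copy of the list from the question, once with "\n" and once with "\r\n" line endings. The helpers build_list() and match_tail() are hypothetical names made up for this illustration; they are not part of the original answer.

```php
<?php
// Short regex from the post above, unchanged.
$re = '/(?:(?<!-------[A-Z][a-z][a-z] \d\d-------).)*+$/s';

// Hypothetical helper: builds a shortened copy of the question's list
// using the given line terminator.
function build_list(string $eol): string {
    $lines = [
        '-------Sep 02-------',
        'http://sitesep2.com/',
        '-------Sep 03-------',
        'http://sitesep3.com/',
        'http://randomurl32.com',
    ];
    return implode($eol, $lines);
}

// Hypothetical helper: returns the full match, or "" when nothing matches.
function match_tail(string $re, string $text): string {
    return preg_match($re, $text, $m) ? $m[0] : "";
}

foreach (["\n", "\r\n"] as $eol) {
    // json_encode makes the otherwise invisible "\r" and "\n" visible.
    echo json_encode(match_tail($re, build_list($eol)), JSON_UNESCAPED_SLASHES), PHP_EOL;
}
```

With "\n" endings the match starts at the first URL after the last date line. With "\r\n" endings the lookbehind only blocks the "\r", so the match starts one character too early, at the "\n".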
ridgerunner
Forum Contributor
Posts: 214
Joined: Sun Jul 05, 2009 10:39 pm
Location: SLC, UT

Re: Regex for a list - grab all lines until...

Post by ridgerunner »

Actually, the regex in my previous post has a problem when the file uses "\r\n" line terminators: the match then begins with a stray "\n". The following regex fixes this by explicitly matching the first character after the date line. Note that it expects the date line to be followed by "\r\n", so it will not match a file that uses plain "\n" line endings.

Code:

// short version
$re = '/(?<=-------[A-Z][a-z][a-z] \d\d-------\r\n).(?:(?<!-------[A-Z][a-z][a-z] \d\d-------\r\n).)*+$/s';
 
// long commented version
$re = '/
# free-spacing mode regex to match all data after last ---date--- line
(?<=                                      # at a position following
-------[A-Z][a-z][a-z][ ]\d\d-------\r\n  # <-this
).                                        # grab the first character
(?:                                       # then start group to grab the rest
  (?<!                                    # at a position that does not follow
-------[A-Z][a-z][a-z][ ]\d\d-------\r\n  # <-this,
  ).                                      # match any one character
)*+                                       # and do this any number of times
$                                         # up until the end of the string
/sx';
 
if (preg_match($re, $text, $matches)) {
    $result = $matches[0];
} else {
    $result = "";
}
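
If the line endings are not known in advance, one possible alternative is to avoid the lookbehind altogether, since a lookbehind cannot contain the variable-length \R. The sketch below is not the approach from the posts above: it splits the text on every date line and keeps the last chunk. The function name urls_after_last_header() is hypothetical, and the header pattern assumes exactly the "-------Mon DD-------" shape shown in the question.

```php
<?php
// Hypothetical helper: returns the lines that follow the last date header.
// If the text contains no header at all, every non-empty line is returned.
function urls_after_last_header(string $text): array {
    // Split on each header line, including its trailing newline (\R matches
    // "\r\n", "\n" or "\r"). The last chunk is the text after the final header.
    $chunks = preg_split('/^-{7}[A-Z][a-z]{2} \d{2}-{7}\R?/m', $text);
    $last = end($chunks);

    // Break the chunk into lines on any newline style, dropping empty lines.
    return preg_split('/\R/', $last, -1, PREG_SPLIT_NO_EMPTY);
}

// Usage with a shortened copy of the list from the question ("\r\n" endings).
$text = "-------Sep 02-------\r\nhttp://sitesep2.com/\r\n"
      . "-------Sep 03-------\r\nhttp://sitesep3.com/\r\nhttp://randomurl32.com\r\n";
print_r(urls_after_last_header($text));
```

Unlike the regexes above, this returns an array of lines rather than one string, which may or may not suit the caller; implode("\n", ...) would turn it back into a single string.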