
Retrieve information over multiple lines.

Posted: Thu Jul 21, 2011 9:28 am
by social_experiment
I retrieve the following information from a webpage:

Code:

<UL class=list-rr><!--cold/hot/windy-->
  <LI><B>Today's coldest towns</B>:<BR>Ficksburg: -12°C<BR>Fouriesburg: 
  -10°C<BR>Ladybrand: -9°C<BR>
  <LI><B>Today's warmest towns</B>:<BR>Levubu: 22°C<BR>Giyani: 22°C<BR>Upington: 
  23°C<BR>

  <LI class=last><B>Today's Windiest towns</B>:<BR>Bergville: 41km/hr<BR>Cape 
  St. Francis: 31km/hr<BR>Sodwana Bay: 28km/hr<BR></LI></UL>
Using the regular expression below, I extract the following information:

Code:

<?php

$pattern = "/";
$pattern .= "(\w+|\w+\s\w+|\w+\s\w+\s\w+|\w+\s\w+\W\s\w+): ";
$pattern .= "(\W\d+°C|\d+°C|\d+km\/hr)";
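$pattern .= "/";

// $html holds the markup shown above
preg_match_all($pattern, $html, $matches);
print_r($matches);
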
/*
[0] => Array
        (
            [0] => Ficksburg: -12°C
            [1] => Ladybrand: -9°C
            [2] => Levubu: 22°C
            [3] => Giyani: 22°C
            [4] => Bergville: 41km/hr
            [5] => Francis: 31km/hr
            [6] => Sodwana Bay: 28km/hr
        )
*/
?>
The problem is that I can't seem to extract the entries that are split across two lines (Fouriesburg: <newline>-10°C, and likewise Upington and Cape St. Francis). Any assistance would be appreciated.

Re: Retrieve information over multiple lines.

Posted: Thu Jul 21, 2011 9:45 am
by Corvin
Use the dot metacharacter and the modifier "s".

http://www.php.net/manual/en/reference. ... ifiers.php
s (PCRE_DOTALL)
If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded.
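A minimal PHP sketch of what the "s" modifier changes, using a string copied from the Fouriesburg entry (name on one line, value on the next). It also shows that `\s` matches a newline whether or not "s" is set.

```php
<?php
// Name and value separated by a line break, as in the scraped markup.
$text = "Fouriesburg: \n  -10°C";

// Without "s", "." does not match "\n", so ".*" cannot cross the line break.
var_dump(preg_match('/Fouriesburg:.*°C/', $text));    // int(0)

// With "s" (PCRE_DOTALL), "." also matches "\n".
var_dump(preg_match('/Fouriesburg:.*°C/s', $text));   // int(1)

// "\s" matches "\n" with or without "s".
var_dump(preg_match('/Fouriesburg:\s+-10°C/', $text)); // int(1)
```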

Re: Retrieve information over multiple lines.

Posted: Thu Jul 21, 2011 10:00 am
by AbraCadaver
You could try matching one or more whitespace characters (which include newlines) after the colon:

Code:

$pattern .= "(\w+|\w+\s\w+|\w+\s\w+\s\w+|\w+\s\w+\W\s\w+):\s+";
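A minimal PHP sketch of the whitespace approach, run against the markup from the first post (rebuilt here as the string `$html`, keeping its trailing spaces and line breaks). With `:\s+` the Fouriesburg and Upington values are found, but the line break inside "Cape St. Francis" still cuts that name down to "Francis". The second pattern is a swapped-in alternative, not something suggested in this thread: it collapses every whitespace run to a single space with `preg_replace()` first, then allows letters, dots and spaces in a name. It assumes each entry starts right after a tag's `>` (such as `<BR>`).

```php
<?php
// Markup from the first post; note the trailing space before each "\n".
$html = "<UL class=list-rr><!--cold/hot/windy-->\n"
    . "  <LI><B>Today's coldest towns</B>:<BR>Ficksburg: -12°C<BR>Fouriesburg: \n"
    . "  -10°C<BR>Ladybrand: -9°C<BR>\n"
    . "  <LI><B>Today's warmest towns</B>:<BR>Levubu: 22°C<BR>Giyani: 22°C<BR>Upington: \n"
    . "  23°C<BR>\n\n"
    . "  <LI class=last><B>Today's Windiest towns</B>:<BR>Bergville: 41km/hr<BR>Cape \n"
    . "  St. Francis: 31km/hr<BR>Sodwana Bay: 28km/hr<BR></LI></UL>";

// 1) Pattern from the reply: ":\s+" lets the value start on the next line.
$pattern = '/(\w+|\w+\s\w+|\w+\s\w+\s\w+|\w+\s\w+\W\s\w+):\s+(\W\d+°C|\d+°C|\d+km\/hr)/';
preg_match_all($pattern, $html, $m);
print_r(array_combine($m[1], $m[2]));
// Finds all 9 values, but "Cape St. Francis" is reported as "Francis".

// 2) Alternative (not from the thread): flatten whitespace first, then
//    accept letters, dots and spaces in a name that follows a tag's ">".
$flat = preg_replace('/\s+/', ' ', $html);
$pattern2 = '/>([A-Za-z][A-Za-z. ]*): (-?\d+(?:°C|km\/hr))/';
preg_match_all($pattern2, $flat, $m2);
print_r(array_combine($m2[1], $m2[2]));
```

Flattening the whitespace means the name and value patterns no longer need to care where the page happens to wrap its lines.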