Retrieve information over multiple lines.

social_experiment

Post by social_experiment »

I retrieve the following information from a webpage:

Code:

<UL class=list-rr><!--cold/hot/windy-->
  <LI><B>Today's coldest towns</B>:<BR>Ficksburg: -12°C<BR>Fouriesburg: 
  -10°C<BR>Ladybrand: -9°C<BR>
  <LI><B>Today's warmest towns</B>:<BR>Levubu: 22°C<BR>Giyani: 22°C<BR>Upington: 
  23°C<BR>

  <LI class=last><B>Today's Windiest towns</B>:<BR>Bergville: 41km/hr<BR>Cape 
  St. Francis: 31km/hr<BR>Sodwana Bay: 28km/hr<BR></LI></UL>
Using the regular expression below, I extract the following information:

Code:

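<?php

$pattern = "/";
// Town name: one, two, or three words, or a name like "Cape St. Francis".
$pattern .= "(\w+|\w+\s\w+|\w+\s\w+\s\w+|\w+\s\w+\W\s\w+): ";
// Reading: a temperature with a leading non-word character (e.g. "-"),
// a plain temperature, or a wind speed.
$pattern .= "(\W\d+°C|\d+°C|\d+km\/hr)";
$pattern .= "/";

// Assumed: $html holds the markup retrieved from the webpage.
preg_match_all($pattern, $html, $matches);
print_r($matches);

/*
[0] => Array
        (
            [0] => Ficksburg: -12°C
            [1] => Ladybrand: -9°C
            [2] => Levubu: 22°C
            [3] => Giyani: 22°C
            [4] => Bergville: 41km/hr
            [5] => Francis: 31km/hr
            [6] => Sodwana Bay: 28km/hr
        )
*/
?>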
The problem is that I can't extract the entries that are split across two lines (Fouriesburg: <newline>-10°C, etc.). Any assistance would be appreciated.
Corvin

Re: Retrieve information over multiple lines.

Post by Corvin »

Use the dot metacharacter together with the "s" modifier.

http://www.php.net/manual/en/reference. ... ifiers.php
s (PCRE_DOTALL)
If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded.
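A minimal sketch of what the `s` modifier changes, using a hypothetical `$text` that holds the split "Fouriesburg" entry from the sample. The sub-pattern `-?\d+°C` (an optional minus sign, then digits) is an illustrative choice, not the original pattern. Note that `s` only affects the dot: `\s` matches a newline with or without it, and the original pattern contains no dot, so adding `s` to it on its own would not change its matches.

```php
<?php
// Hypothetical input: the "Fouriesburg" entry, split across two lines.
$text = "Fouriesburg: \n-10°C";

// Without the s modifier, "." stops at the newline, so no match is found.
$withoutS = preg_match('/(\w+):.*?(-?\d+°C)/', $text, $m1);

// With the s modifier (PCRE_DOTALL), ".*?" may cross the line break.
$withS = preg_match('/(\w+):.*?(-?\d+°C)/s', $text, $m2);

// "\s" matches "\n" whether or not the s modifier is set.
$viaWhitespace = preg_match('/(\w+):\s+(-?\d+°C)/', $text, $m3);

// prints: without s: 0, with s: 1, whitespace only: 1
echo "without s: $withoutS, with s: $withS, whitespace only: $viaWhitespace\n";
// prints: Fouriesburg => -10°C
echo $m2[1], ' => ', $m2[2], "\n";
```

Either the dot with `s` or an explicit `\s+` reaches the value on the next line; `\s+` is the narrower choice, since `.*?` with `s` can skip over any characters, including markup.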
AbraCadaver

Re: Retrieve information over multiple lines.

Post by AbraCadaver »

You could try matching one or more whitespace characters (which include newlines) after the colon, instead of a single space:

Code:

$pattern .= "(\w+|\w+\s\w+|\w+\s\w+\s\w+|\w+\s\w+\W\s\w+):[\s]+";
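A minimal end-to-end sketch of this suggestion, assuming the markup is already in a string. `$html` is a hypothetical, abbreviated copy of the sample HTML with the same line breaks (but without the indentation), and the pattern is the original one with `": "` replaced by `":\s+"`, which is equivalent to `:[\s]+`.

```php
<?php
// Hypothetical input: the sample HTML, keeping its line breaks
// (including the space before each "\n").
$html = "<LI><B>Today's coldest towns</B>:<BR>Ficksburg: -12°C<BR>Fouriesburg: \n"
      . "-10°C<BR>Ladybrand: -9°C<BR>\n"
      . "<LI><B>Today's warmest towns</B>:<BR>Levubu: 22°C<BR>Giyani: 22°C<BR>Upington: \n"
      . "23°C<BR>\n"
      . "<LI class=last><B>Today's Windiest towns</B>:<BR>Bergville: 41km/hr<BR>Cape \n"
      . "St. Francis: 31km/hr<BR>Sodwana Bay: 28km/hr<BR></LI></UL>";

$pattern = '/';
// Town name: the same four alternatives as the original pattern.
$pattern .= '(\w+|\w+\s\w+|\w+\s\w+\s\w+|\w+\s\w+\W\s\w+)';
// Colon, then one or more whitespace characters; \s also matches "\n",
// so a value pushed onto the next line is still reached.
$pattern .= ':\s+';
// Reading: temperature with a leading "-" (or other non-word character),
// plain temperature, or wind speed.
$pattern .= '(\W\d+°C|\d+°C|\d+km\/hr)';
$pattern .= '/';

preg_match_all($pattern, $html, $matches);

// $matches[1] holds the town names, $matches[2] the readings.
foreach ($matches[1] as $i => $town) {
    echo $town, ' => ', $matches[2][$i], "\n";
}
```

With `\s+`, the "Fouriesburg" and "Upington" entries are now matched. In this sample, where a space precedes the line break, "Cape St. Francis" still comes back as "Francis": the break falls inside the name, and each name alternative allows only a single `\s` between words.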