Scraping more than one line?

Posted: Fri Oct 03, 2003 12:11 pm
by chriswheat
Hello,

I am trying to scrape more than one line from
http://travel.state.gov/travel_warnings.html

This is the chunk of HTML I want to scrape.

Code: Select all

<hr width=500 align=CENTER>
          </li>
          <li>
            <div align="center"><font size="2" face="Arial, Helvetica, sans-serif"><i><a href="meu_announce.html">Middle 
              East and North Africa Public Announcement (issued 9/30/03, expires 
              3/23/04)</a></i></font> </div>
          </li>
          <li> 
            <div align="center"> 
              <div align="center"> 
                <div align="center"><font face="Arial, Helvetica, sans-serif"><i><font size="2" face="Arial, Helvetica, sans-serif"><a href="wwc1.html">Worldwide 
                  Caution Public Announcement (issued 9/26/03, expires 2/25/04)</a></font> 
                  </i></font></div>
              </div>
            </div>
          </li>
          <li> 
            <div align="center"><a href="sars_notice.html"><font size="2" face="Arial, Helvetica, sans-serif"><em>Severe 
              Acute Respiratory Syndrome (SARS) Fact Sheet (issued 8/25/03)</em></font></a></div>
          </li>
          <li> 
            <div align="center"><font size="2" face="Arial, Helvetica, sans-serif"><a href="eafrica_announce.html"><i>East 
              Africa Public Announcement (issued 9/12/03, expires 3/13/04)</i></a></font></div>
          </li>
        </ul>
        <ul>
          <li> 
            <div align="center"> 
              <div align="center"> 
                <div align="center"> 
                  <div align="center"><font size="2" face="Arial, Helvetica, sans-serif"><a href="spring_break.html"><i>Travel 
                    Safety Information for Students (February 2003)</i></a></font></div>
                </div>
              </div>
            </div>
          </li>
          <li> 
            <div align="center"> 
              <div align="center"><font size="2" face="Arial, Helvetica, sans-serif"><i><a href="behavior_modification.html">Behavior 
                Modification Facilities Fact Sheet (January 2003)</a></i></font></div>
            </div>
          </li>
          <li> 
            <div align="center"><font size="2" face="Arial, Helvetica, sans-serif"><i><a href="cbw1.html">Chemical 
              - Biological Agents Fact sheet Update (December 2002)</a></i> </font></div>
          </li>
          <li> 
            <div align="center"><font size="2" face="Arial, Helvetica, sans-serif"><a href="deathintro.html"><i>Information 
              on Deaths Abroad of U.S. Citizens (December 2002)</i></a> </font></div>
          </li>
          <li> 
            <div align="center"><font size="2" face="Arial, Helvetica, sans-serif"><i><a href="fmd.html">Foot 
              and Mouth Disease Fact Sheet (September 2002)</a></i></font></div>
          </li>
          <li> 
            <div align="center"><font size="2" face="Arial, Helvetica, sans-serif"><a href="nuclear_incidents.html"><i>Responding 
              to Radiological and Nuclear Incidents (August 2002)</i></a> </font></div>
          </li>
          <li> 
            <div align="center"><font size="2" face="Arial, Helvetica, sans-serif"><i><a href="cbw.html">Chemical 
              - Biological Agents Fact Sheet (October 2001)</a></i> </font></div>
          </li>
        </ul>
        <ul>
          <li> 
            <center>
              <font size="2" face="Arial, Helvetica, sans-serif"><a href="warnings_list.html">List 
              of Current Travel Warnings and Public Announcements </a> </font> 
            </center>
          </li>
          <li> 
            <center>
              <font size="2" face="Arial, Helvetica, sans-serif"><a href="http://www.state.gov/www/listservs_cms.html">Receive 
              Travel Safety Information by E-Mail </a></font> 
            </center>
          </li>
          <li> 
            <center>
              <font size="2" face="Arial, Helvetica, sans-serif"><a href="road_safety.html">Road 
              Safety Overseas </a></font> 
            </center>
          </li>
          <li> 
            <center>
              <font size="2" face="Arial, Helvetica, sans-serif"><a href="euro.html">Information 
              on Conversion to the Euro</a> </font> 
            </center>
          </li>
      </ul></td>
    </tr>
  </table>
</center>
<hr>
Here is the code I am trying to use, located at
http://www.passageinternational.com/travelscrape.php

Code: Select all

<?php

    $url = "http://travel.state.gov/travel_warnings.html/";

    $filepointer = fopen($url, "r");

    if ($filepointer) {

        while (!feof($filepointer)) {

            $buffer = fgets($filepointer, 4096);

            $file .= $buffer;

        }

        fclose($filepointer);

    } else {

        die("Could not create a connection to whatever");

    }

?>

<?php

    preg_match("/<hr width=500 align=CENTER>(.*)<hr>/i", $file, $match);

    $result = $match[1];

    echo $result;

?>
So how do I get it to ignore the line breaks and scrape more than one line?

Thanks
Chris
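
For what it's worth, here is a minimal sketch of one way to match across line breaks. In PHP's PCRE patterns the dot does not match a newline by default, so `(.*)` cannot span lines. Adding the `s` modifier changes that, and `.*?` makes the match non-greedy so it stops at the first closing marker. The helper name `extract_between` and the sample string are made up for illustration, and the type declarations assume PHP 7.1 or later. In the script above, `$file` would take the place of `$sample`.

```php
<?php
// Hypothetical helper (not from the thread): returns whatever sits between
// two literal marker strings, or null when the markers are not found.
function extract_between(string $html, string $start, string $end): ?string
{
    // preg_quote() escapes the markers so they are matched literally.
    // "s" lets "." match newlines; "i" ignores case; ".*?" is non-greedy.
    $pattern = '/' . preg_quote($start, '/') . '(.*?)' . preg_quote($end, '/') . '/is';

    if (preg_match($pattern, $html, $match)) {
        return $match[1];
    }
    return null;
}

// Made-up sample standing in for the downloaded page.
$sample = "<p>before</p>\n<hr width=500 align=CENTER>\n<li>one</li>\n<li>two</li>\n<hr>\n<p>after</p>";

$result = extract_between($sample, '<hr width=500 align=CENTER>', '<hr>');

// Prints the two <li> lines that sit between the markers.
echo trim((string) $result);
```

With the original pattern, the smallest change along these lines would be `/<hr width=500 align=CENTER>(.*?)<hr>/is`.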

Posted: Fri Oct 03, 2003 1:34 pm
by murph
what do you mean by scrape?

Posted: Fri Oct 03, 2003 1:36 pm
by chriswheat
murph wrote:what do you mean by scrape?
Take the information and place it on my website automatically.

Chris

Posted: Fri Oct 03, 2003 1:38 pm
by murph
If by scrape you mean print out all that HTML, then just assign all that HTML to a variable, like this:
$html = <<<EOT
<table>
<tr>
<td>html code you have above</td>
</tr>
</table>

EOT;
With that example you don't have to escape any quotes, and you can print out the HTML with just echo $html;

Posted: Fri Oct 03, 2003 1:52 pm
by chriswheat
murph wrote:if by scrape you mean print out all that html, then just do assign all that html to a variable like
$html = <<<EOT
Thank you for the quick reply, but I am fairly new to PHP and I am still confused. The HTML example I gave above is only a portion of the HTML found at travel.state.gov/travel_warnings.html

How do I isolate the section of html I want and exclude the stuff I do not want?

Many thanks
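
One possible way to keep only the wanted parts of a fragment is to pull out just the links and their titles. This is a rough sketch, not a robust HTML parser: the helper name `extract_links` is hypothetical, it assumes `href` is the first attribute of each `<a>` tag and is double-quoted (as in the chunk above), and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper (not from the thread): collects href/title pairs
// from every <a href="..."> link found in an HTML fragment.
function extract_links(string $html): array
{
    $links = [];

    // "s" lets "." span line breaks, since the link text wraps across lines.
    preg_match_all('/<a\s+href="([^"]*)"[^>]*>(.*?)<\/a>/is', $html, $matches, PREG_SET_ORDER);

    foreach ($matches as $m) {
        // Drop nested tags such as <i>, then collapse runs of whitespace.
        $title = trim(preg_replace('/\s+/', ' ', strip_tags($m[2])));
        $links[] = ['href' => $m[1], 'title' => $title];
    }
    return $links;
}

// One list item copied from the chunk in the first post.
$fragment = <<<EOT
<li>
  <div align="center"><font size="2" face="Arial, Helvetica, sans-serif"><i><a href="meu_announce.html">Middle 
    East and North Africa Public Announcement (issued 9/30/03, expires 
    3/23/04)</a></i></font> </div>
</li>
EOT;

foreach (extract_links($fragment) as $link) {
    echo $link['href'] . ' => ' . $link['title'] . "\n";
}
```

Everything outside the links (the `<div>` and `<font>` wrappers) is simply never copied into the result, which is one way of excluding the unwanted markup.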

Posted: Fri Oct 03, 2003 2:49 pm
by murph
Well, for each new piece of HTML you can make a new variable, but that can get lengthy. Instead, you can just end the PHP code, paste the HTML, then start the PHP code again, like this:

Code: Select all

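<?php
echo "blah";
$blah = false; // loop flag, so the loop below has a defined condition
while ( !$blah )
{
?>
Html here
<?php
    $blah = true; // stop after one pass, otherwise the loop never ends
}
// more PHP here
?>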