Help with reformatting parsed information

catherine878
Forum Newbie
Posts: 6
Joined: Wed Jun 26, 2002 7:04 pm

Help with reformatting parsed information

Post by catherine878 »

Hi all,
I am trying to reformat the weather information from this site: http://cdec.water.ca.gov/cgi-progs/queryF?s=OSO. I was able to retrieve the entire HTML page using:
<?
$theURL = "http://cdec.water.ca.gov/cgi-progs/queryF?s=OSO";
$file = fopen("$theURL", "r");
$rf = fread($file, 20000);
fclose($file);
echo $rf;
?>

However, I encounter two problems.

1. All the links now are appended after my home directory, so I could not go to the correct page.

2. I want to do some formatting with the data on this page. For example, I want to change the font and size of the text, replace the table headings with some other text, and add other contents.

Most importantly, I want to remove the images of the original page. For this, I tried

$printing[1] = str_replace("<img src=...>", "", $rf);

But it won't work. I am a beginner with PHP, and would appreciate any help or suggestion on this subject.

Thank you very much!

Cat
volka
DevNet Evangelist
Posts: 8391
Joined: Tue May 07, 2002 9:48 am
Location: Berlin, ger

Post by volka »

try

Code: Select all

<? 
$theURL = "http://cdec.water.ca.gov/cgi-progs/queryF?s=OSO"; 
$file = fopen("$theURL", "r"); 
$rf = fread($file, 20000); 
fclose($file); 
$rf = preg_replace(array('!</head>!', '/<img[^>]*>/'), array('<base href="http://cdec.water.ca.gov/"></head>',''), $rf);
echo $rf; 
?>
if you're allowed to ;)
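
One caveat about the fread() call above: on a network stream a single fread() may return fewer bytes than requested, so the page can come back truncated. Below is a minimal sketch of reading until end-of-file. readAll() is a hypothetical helper name, and php://memory stands in for the remote URL so the snippet runs without network access.

```php
<?php
// Hypothetical helper: read an open stream until EOF
// instead of relying on a single fread() call.
function readAll($handle, $chunkSize = 8192)
{
    $data = '';
    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false) {
            break; // read error: stop and return what we have so far
        }
        $data .= $chunk;
    }
    return $data;
}

// Stand-in for fopen($theURL, "r"): an in-memory stream holding 30000 bytes.
$handle = fopen('php://memory', 'w+');
fwrite($handle, str_repeat('x', 30000));
rewind($handle);

$rf = readAll($handle);
fclose($handle);

echo strlen($rf), "\n"; // prints 30000
```

Where file_get_contents() is available, file_get_contents($theURL) should give the same result in one call.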
catherine878
Forum Newbie
Posts: 6
Joined: Wed Jun 26, 2002 7:04 pm

Thanks!

Post by catherine878 »

hi volka, thanks for your code, it's working now, although I am a little confused about '/<img[^>]*>/' in
$rf = preg_replace(array('!</head>!', '/<img[^>]*>/'), array('<base href="http://cdec.water.ca.gov/"></head>',''), $rf);

If you could explain it a bit or point me to some resources that would be great.

To follow up, I have another similar question. On http://cdec.water.ca.gov/cgi-progs/queryF?OSO, each of the headings is a link. The Rain link points to a diagram on the http://cdec.water.ca.gov/cgi-progs/quer ... 2002+15:18 page. When the user clicks on this link, I would like to show just the image. This is what I propose to do:

1. Change the link from http://cdec.water.ca.gov/cgi-progs/quer ... 2002+15:18 to rain-diagram.php/queryID?s=1151&end_date=27-Jun-2002+15:18

2. When the user clicks on the above link (modified), the queryID and end_date are passed as parameters to rain-diagram.php page.

3. In rain-diagram.php page, I will reconstruct the link with the two parameters and fetch the page http://cdec.water.ca.gov/cgi-progs/quer ... 2002+15:18 and keep the picture with no other text.

I am not sure if this approach would work. I am having some difficulty replacing "http://cdec.water.ca.gov/cgi-progs/quer ... 2002+15:18" with "rain-diagram.php/queryID?s=1151&end_date=27-Jun-2002+15:18".

I have

$newPage = ereg_replace ("http://cdec.water.ca.gov/cgi-progs/", "rain-diagram.php", $readfile);

But it is not working. Maybe it won't let me replace part of an HTML tag. I have no idea. Can you help??

Thanks!


Cat.
volka
DevNet Evangelist
Posts: 8391
Joined: Tue May 07, 2002 9:48 am
Location: Berlin, ger

Post by volka »

Code: Select all

<?php
if (strlen($_SERVER['QUERY_STRING']) == 0)
{
	$theURL = "http://cdec.water.ca.gov/cgi-progs/queryF?s=OSO";
	$file = fopen($theURL, "r");
	$rf = fread($file, 20000);
	fclose($file);
	$pattern = array('!</head>!',
	                 '/<img[^>]*>/',
	                 '!/cgi-progs/queryID!');
	$replace = array('<base href="http://cdec.water.ca.gov/"></head>',
	                 '',
	                 'http://'.$_SERVER['SERVER_NAME'].$_SERVER['PHP_SELF']);
	$rf = preg_replace($pattern, $replace, $rf);
	echo $rf;
}
else
{
	header('Content-Type: image/gif');
	readfile('http://cdec.water.ca.gov/cgi-progs/queryplot?clock=y&trans=y&id='.$_GET['s'].'&end='.rawurlencode($_GET['end_date']).'&interval=49hours&width=400&height=300');
}
?>
should do the trick
  • '/<img[^>]*>/' : match every substring beginning with '<img' and ending with '>' containing no further '>' (wouldn't like it to match <img src="..."><img src="..."> as one match ;) )
  • 'http://cdec.water.ca.gov/cgi-progs/queryID?...' : you can't replace it, because it isn't in the string. Only the relative path /cgi-progs/queryID?... is.
    And since <base href="http://cdec.water.ca.gov/"> has been set, 'http://'.$_SERVER['SERVER_NAME'] must be included in the new link, otherwise the browser would resolve it against cdec.water.ca.gov.
But just remember: That's hotlinking ;)
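
To make the two points above concrete, here is a small sketch run against a made-up snippet instead of the real page. The sample markup and the http://example.com/rain-diagram.php address are placeholders, not taken from the CDEC site. The example.com address stands in for 'http://'.$_SERVER['SERVER_NAME'].$_SERVER['PHP_SELF'].

```php
<?php
// Made-up markup standing in for the fetched page.
$html = '<p><img src="a.gif"> Rain <img src="b.gif"></p>'
      . '<a href="/cgi-progs/queryID?s=1151">Rain</a>';

// Negated class: [^>]* cannot cross a ">", so each <img ...> tag is matched on its own.
$clean = preg_replace('/<img[^>]*>/', '', $html);

// For comparison: a greedy .* runs to the LAST ">" in the string,
// so everything after the first "<img" is removed as well.
$greedy = preg_replace('/<img.*>/', '', $html);

// Link rewrite: the page only contains the relative path, so that is what gets matched.
// The replacement host is a placeholder for the script's own address.
$linked = preg_replace('!/cgi-progs/queryID!', 'http://example.com/rain-diagram.php', $clean);

echo $clean, "\n";
echo $greedy, "\n";
echo $linked, "\n";
```

The first line of output keeps the text and the link, the second shows how much the greedy variant eats, and the third shows the link pointing at the local script with its query string untouched.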
catherine878
Forum Newbie
Posts: 6
Joined: Wed Jun 26, 2002 7:04 pm

Thanks and one more...

Post by catherine878 »

Hello Volka,

Thanks for your help. I was able to save the query string and reconstruct the URL to fetch the new page. It's not as clean as yours, but I got it to work, so I am happy.

My current problem is on this new page I retrieved
http://cdec.water.ca.gov/cgi-progs/quer ... 2002+17:20

As you can see, the data changes with the current time.

So far I am able to display the entire page. But what I want to accomplish is to display the data starting with "OAKLAND NORTH (ONO)" and ending with "Warning! This has not been reviewed for accuracy." I don't want the image maps and the navigational bar at the bottom, but I do want to keep the image that's in the middle.

I tried quite a few approaches, but none worked. A few of them are:

1. $GrabStart = "<h1>OAKLAND SOUTH <font color=red><em>(OSO)</em></font></h1>";
$GrabEnd = "Warning! This has not been reviewed for accuracy.";
$GrabData = eregi("$GrabStart(.*)$GrabEnd", $rf, $matches);

2. preg_match("/.*<h1>OAKLAND\sNORTH<\/h1>(.')Warning....'.*/", $rf,$match);

3. I used a while-loop to go through the string $matches and collect the <img...> tags in an array. Because there are only 4 images, I know it's the third one I want to display. Then I can just insert the other text because it's static. Unfortunately this did not work either...

I would appreciate any help or suggestion you can offer. Thanks!!

Catherine
volka
DevNet Evangelist
Posts: 8391
Joined: Tue May 07, 2002 9:48 am
Location: Berlin, ger

Post by volka »

my advice (but I'm still a regular expression novice ;) ) is to limit the patterns to the absolute minimum and avoid special characters, i.e. an unquoted (OSO) will never match the literal text because the parentheses are read as a group, while \(OSO\) may.
For your first attempt you may use

Code: Select all

$GrabStart = "<h1>OAKLAND"; 
$GrabEnd = "accuracy";
for the second

Code: Select all

preg_match("/<hr>.*<hr\ssize/s", $rf,...
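
If you would rather keep the full markers, here is a rough sketch of the same idea that lets preg_quote() do the escaping. The page fragment is made up for illustration (only the heading and the warning text come from your post), so treat it as a starting point rather than something tested against the real page.

```php
<?php
// Made-up page fragment standing in for $rf.
$rf = "<img src=\"nav1.gif\">\n"
    . "<h1>OAKLAND SOUTH <font color=red><em>(OSO)</em></font></h1>\n"
    . "<img src=\"plot.gif\">\n"
    . "Warning! This has not been reviewed for accuracy.\n"
    . "<img src=\"nav2.gif\">\n";

$start = '<h1>OAKLAND SOUTH <font color=red><em>(OSO)</em></font></h1>';
$end   = 'Warning! This has not been reviewed for accuracy.';

// preg_quote() escapes ( ) . ! and the "/" delimiter so both markers match literally.
// The s modifier lets . match newlines; .*? stops at the first end marker.
$pattern = '/' . preg_quote($start, '/') . '(.*?)' . preg_quote($end, '/') . '/s';

$section = '';
$images  = array(array());
if (preg_match($pattern, $rf, $matches)) {
    $section = $start . $matches[1] . $end;
    // Collect the <img> tags found inside the extracted section only.
    preg_match_all('/<img[^>]*>/i', $section, $images);
}

echo $section, "\n";
echo count($images[0]), " image(s) kept\n";
```

With this fragment only the plot image between the two markers survives. The navigation images before and after them are dropped, so there is no need to count images by position.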