Hi all,
I am trying to reformat the weather information from this site: http://cdec.water.ca.gov/cgi-progs/queryF?s=OSO. I was able to retrieve the entire HTML page using:
<?
$theURL = "http://cdec.water.ca.gov/cgi-progs/queryF?s=OSO";
$file = fopen("$theURL", "r");
$rf = fread($file, 20000);
fclose($file);
echo $rf;
?>
However, I encounter two problems.
1. All the links now resolve relative to my own home directory, so they no longer lead to the correct pages.
2. I want to do some formatting with the data on this page. For example, I want to change the font and size of the text, replace the table headings with some other text, and add other contents.
Most importantly, I want to remove the images of the original page. For this, I tried
$printing[1] = str_replace("<img src=...>", "", $rf);
But it won't work. I am a beginner with PHP, and would appreciate any help or suggestion on this subject.
Thank you very much!
Cat
Help with reformatting parsed information
-
catherine878
- Forum Newbie
- Posts: 6
- Joined: Wed Jun 26, 2002 7:04 pm
Try this, if you're allowed to:
Code:
<?
$theURL = "http://cdec.water.ca.gov/cgi-progs/queryF?s=OSO";
$file = fopen("$theURL", "r");
$rf = fread($file, 20000);
fclose($file);
$rf = preg_replace(array('!</head>!', '/<img[^>]*>/'), array('<base href="http://cdec.water.ca.gov/"></head>',''), $rf);
echo $rf;
?>
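One caveat with the snippet above: on a network stream, fread() can return fewer bytes than requested, so a single fread($file, 20000) may cut the page short. Below is a minimal sketch of reading until end-of-file. readAll is a hypothetical helper name, and the php://memory stream is only a stand-in for the remote URL so that the example runs offline.

```php
<?php
// Hypothetical helper: read everything from an open stream handle.
function readAll($handle, int $chunkSize = 8192): string
{
    $data = '';
    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false) {
            break; // read error: return what was collected so far
        }
        $data .= $chunk;
    }
    return $data;
}

// Offline stand-in for fopen($theURL, "r"): an in-memory stream
// holding more data than a single 20000-byte read would return.
$file = fopen('php://memory', 'w+');
fwrite($file, str_repeat('x', 50000));
rewind($file);

$rf = readAll($file);
fclose($file);
echo strlen($rf), "\n";
```

With a real URL the same loop applies to the handle returned by fopen($theURL, "r"). file_get_contents($theURL) is a one-call alternative when allow_url_fopen is enabled.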
Thanks!
hi volka, thanks for your code, it's working now, although I am a little confused about '/<img[^>]*>/' in
$rf = preg_replace(array('!</head>!', '/<img[^>]*>/'), array('<base href="http://cdec.water.ca.gov/"></head>',''), $rf);
If you could explain it a bit or point me to some resources that would be great.
To follow up, I have another similar question. On http://cdec.water.ca.gov/cgi-progs/queryF?OSO, there is a link for each of the headings. The Rain link points to a diagram on the http://cdec.water.ca.gov/cgi-progs/quer ... 2002+15:18 page. When the user clicks on this link, I would like to show just the image. This is what I propose to do:
1. Change the link from http://cdec.water.ca.gov/cgi-progs/quer ... 2002+15:18 to rain-diagram.php/queryID?s=1151&end_date=27-Jun-2002+15:18
2. When the user clicks on the above link (modified), the queryID and end_date are passed as parameters to rain-diagram.php page.
3. In rain-diagram.php page, I will reconstruct the link with the two parameters and fetch the page http://cdec.water.ca.gov/cgi-progs/quer ... 2002+15:18 and keep the picture with no other text.
I am not sure if this approach would work. I am having some difficulty replacing "http://cdec.water.ca.gov/cgi-progs/quer ... 2002+15:18" with "rain-diagram.php/queryID?s=1151&end_date=27-Jun-2002+15:18".
I have
$newPage = ereg_replace ("http://cdec.water.ca.gov/cgi-progs/", "rain-diagram.php", $readfile);
But it is not working. Maybe it won't let me replace part of an HTML tag. I have no idea. Can you help??
Thanks!
Cat.
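Steps 2 and 3 of the plan above could be sketched roughly as follows. buildPlotUrl is a hypothetical helper, the queryplot address and its fixed parameters are taken from a later reply in this thread, and the two validation patterns are assumptions about what the s and end_date values look like (plain digits, and a date such as 27-Jun-2002 15:18).

```php
<?php
// Hypothetical helper: turn the two link parameters into the plot URL.
// Returns null when a parameter does not look like the expected input.
function buildPlotUrl(array $query): ?string
{
    $id  = $query['s'] ?? '';
    $end = $query['end_date'] ?? '';

    // Assumption: sensor ids are plain digits, e.g. 1151.
    if (!is_string($id) || !preg_match('/^\d+\z/', $id)) {
        return null;
    }
    // Assumption: dates look like 27-Jun-2002 15:18.
    if (!is_string($end)
        || !preg_match('/^\d{1,2}-[A-Za-z]{3}-\d{4} \d{1,2}:\d{2}\z/', $end)) {
        return null;
    }

    // http_build_query() URL-encodes every value.
    return 'http://cdec.water.ca.gov/cgi-progs/queryplot?' . http_build_query([
        'clock'    => 'y',
        'trans'    => 'y',
        'id'       => $id,
        'end'      => $end,
        'interval' => '49hours',
        'width'    => 400,
        'height'   => 300,
    ], '', '&', PHP_QUERY_RFC3986);
}

// PHP decodes "+" in the query string to a space before filling $_GET,
// so the date arrives here with a space in it.
$url = buildPlotUrl(['s' => '1151', 'end_date' => '27-Jun-2002 15:18']);
echo $url, "\n";
```

In rain-diagram.php the call would be buildPlotUrl($_GET). Checking the values first keeps unexpected input from being pasted into the remote URL.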
Code:
<?php
if (strlen($_SERVER['QUERY_STRING']) == 0)
{
    $theURL = "http://cdec.water.ca.gov/cgi-progs/queryF?s=OSO";
    $file = fopen("$theURL", "r");
    $rf = fread($file, 20000);
    fclose($file);
    $pattern = array( '!</head>!'
                    , '/<img[^>]*>/'
                    , "!/cgi-progs/queryID!" );
    $replace = array( '<base href="http://cdec.water.ca.gov/"></head>'
                    , ''
                    , 'http://'.$_SERVER['SERVER_NAME'].$_SERVER['PHP_SELF']);
    $rf = preg_replace($pattern, $replace, $rf);
    echo $rf;
}
else
{
    header('Content-Type: image/gif');
    readfile('http://cdec.water.ca.gov/cgi-progs/queryplot?clock=y&trans=y&id='.$_GET['s'].'&end='.rawurlencode($_GET['end_date']).'&interval=49hours&width=400&height=300');
}
?>
- '/<img[^>]*>/': matches every substring beginning with '<img' and ending with '>' that contains no further '>' (you wouldn't want it to match <img src="..."><img src="..."> as one match).
- 'http://cdec.water.ca.gov/cgi-progs/queryID?...': you can't replace it, because it isn't in the string. Only /cgi-progs/queryID?... is.
And since <base href="http://cdec.water.ca.gov/"> has been set, 'http://'.$_SERVER['SERVER_NAME'] must be included in the new link.
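Both points can be tried out on a small string. This is only a sketch: the $html sample is made up for illustration and is not the real page source.

```php
<?php
// Made-up sample: a relative link plus two image tags on one line.
$html = '<a href="/cgi-progs/queryID?s=1151"><img src="a.gif"></a><img src="b.gif"> rain';

// [^>]* cannot cross a ">", so each <img ...> tag is matched and removed on its own.
$perTag = preg_replace('/<img[^>]*>/', '', $html);

// .* is greedy: it runs to the LAST ">" on the line and also
// swallows the "</a>" that sits between the two image tags.
$greedy = preg_replace('/<img.*>/', '', $html);

// The absolute URL never occurs in the page source, so nothing is replaced.
$absolute = str_replace('http://cdec.water.ca.gov/cgi-progs/queryID', 'rain-diagram.php', $perTag);

// The relative path does occur, so this replacement takes effect.
$relative = str_replace('/cgi-progs/queryID', 'rain-diagram.php', $perTag);

echo $perTag, "\n", $greedy, "\n", $absolute, "\n", $relative, "\n";
```

For a fixed string like the link path, str_replace() is enough and avoids regular expression special characters altogether.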
Thanks and one more...
Hello Volka,
Thanks for your help. I was able to save the query string and reconstruct the URL to fetch the new page. It's not as clean as yours, but I got it to work, so I am happy.
My current problem is on this new page I retrieved
http://cdec.water.ca.gov/cgi-progs/quer ... 2002+17:20
As you can see, this data changes with the current time.
So far I am able to display the entire page. But what I want to accomplish is to display the data starting with "OAKLAND NORTH (ONO)" and ending with "Warning! This has not been reviewed for accuracy." I don't want the image maps and the navigational bar at the bottom, but I do want to keep the image that's in the middle.
I tried quite a few approaches, but none worked. A few of them are:
1. $GrabStart = "<h1>OAKLAND SOUTH <font color=red><em>(OSO)</em></font></h1>";
$GrabEnd = "Warning! This has not been reviewed for accuracy.";
$GrabData = eregi("$GrabStart(.*)$GrabEnd", $rf, $matches);
2. preg_match("/.*<h1>OAKLAND\sNORTH<\/h1>(.')Warning....'.*/", $rf,$match);
3. I used a while-loop to go through the string $matches and collect the <img...> tags in an array. Because there are only 4 images, I know it's the third one I want to display. Then I can just insert the other text because it's static. Unfortunately this did not work either...
I would appreciate any help or suggestion you can offer. Thanks!!
Catherine
My advice (but I'm still a regular expression novice) is to limit the patterns to the absolute minimum and avoid special characters, i.e. an unquoted (OSO) will never match, while \(OSO\) may.
For your first attempt you may use
Code:
$GrabStart = "<h1>OAKLAND";
$GrabEnd = "accuracy";
and for the second
Code:
preg_match("/<hr>.*<hr\ssize/s", $rf,...
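The advice above (short markers, special characters escaped) can be sketched as a small helper. grabBetween is a hypothetical name, and the $rf sample is made up to stand in for the fetched page. preg_quote() does the escaping so the markers can be written as plain text, and the s modifier lets the dot match across line breaks.

```php
<?php
// Hypothetical helper: return the text from $start through $end,
// or null when the markers are not found in that order.
function grabBetween(string $html, string $start, string $end): ?string
{
    // preg_quote() escapes characters such as ( ) ! . so the markers match literally.
    // .*? is non-greedy, so the match stops at the first occurrence of $end.
    $pattern = '/' . preg_quote($start, '/') . '.*?' . preg_quote($end, '/') . '/s';
    return preg_match($pattern, $html, $m) ? $m[0] : null;
}

// Made-up sample standing in for the fetched page.
$rf = "<map>nav</map>\n<h1>OAKLAND SOUTH <em>(OSO)</em></h1>\n<img src=\"plot.gif\">\n"
    . "Warning! This has not been reviewed for accuracy.\n<hr size=1>footer";

$section = grabBetween($rf, '<h1>OAKLAND', 'accuracy.');
echo $section, "\n";
```

Everything before the heading and after the warning line is dropped, while the image tag in between is kept.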