PHP programming forum. Ask questions or help people concerning PHP code. Don't understand a function? Need help implementing a class? Don't understand a class? Here is where to ask. Remember to do your homework!
Moderator: General Moderators
btfans
Forum Newbie
Posts: 22 Joined: Thu Jun 10, 2004 10:58 am
by btfans » Sun Jun 27, 2004 11:10 am
Hello,
I want to extract some data from an HTML page and reformat the data between the <pre> ... </pre> tags. Please help, as I am a newbie. This PHP code does not work:
Code: Select all
<?
$file = "http://something.htm";
$contents = file($file);
$size = sizeof($contents);
for($i = 0; $i < $size; $i++) {
    $alldata = $contents[$i];
    preg_match("/<pre.*?>(.+)<\/pre>/im",$alldata,$matches);
    print_r($matches);
}
?>
The HTML page (encoded in Big5) is:
Code: Select all
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=big5">
<title>something</title>
</head>
<body bgcolor="#FFFFFF">
<p align="center"><img src="../../images_e/logo_dblue.gif" alt="logo" width="333" height="65">
<h1 align="center">report</h1>
<p><i>report detail</i>
<pre>
line 1:
tag2 : data2
tag3 : data3
tag4 : number n - data4
line 5
line 6
line 7
</pre>
</body>
</html>
Requirement:
The output string should show the following:
Code: Select all
line 1:
tag2 : data2
tag3 : data3
tag4 : data4
Thanks very much.
feyd | Please use the code tags we've provided :: Posting Code in the Forums (http://forums.devnetwork.net/viewtopic.php?t=21171)
Last edited by btfans on Mon Jun 28, 2004 12:36 am, edited 1 time in total.
kettle_drum
DevNet Resident
Posts: 1150 Joined: Sun Jul 20, 2003 9:25 pm
Location: West Yorkshire, England
by kettle_drum » Sun Jun 27, 2004 12:24 pm
If you're not good with regex, use a mixture of substr() and strpos() to strip out the data you don't want, or even explode().
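A minimal sketch of the substr()/strpos() approach, assuming the whole page is already in one string and holds a single <pre> block. The helper name `extractPre` and the sample `$html` are made up for illustration; note that the simple `<pre` search would also match a tag such as `<preface>`.

```php
<?php
// Hypothetical helper: returns the text between the first <pre...> and </pre>,
// or null if no such block is found.
function extractPre(string $html): ?string
{
    $open = stripos($html, '<pre');            // start of the opening tag
    if ($open === false) {
        return null;
    }
    $start = strpos($html, '>', $open);        // end of the opening tag
    if ($start === false) {
        return null;
    }
    $start++;                                  // first character after '>'
    $end = stripos($html, '</pre>', $start);   // start of the closing tag
    if ($end === false) {
        return null;
    }
    return substr($html, $start, $end - $start);
}

// Sample input standing in for the real page (an assumption).
$html = "<p><i>report detail</i>\n<pre>\nline 1:\ntag2 : data2\n</pre>\n</body>";
$block = extractPre($html);

// explode() then splits the extracted block into lines.
$lines = explode("\n", trim($block));
print_r($lines);
```

The case-insensitive stripos() is used so that `<PRE>` would be found as well.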
feyd
Neighborhood Spidermoddy
Posts: 31559 Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA
by feyd » Sun Jun 27, 2004 1:13 pm
You can try switching your regex to:
Code: Select all
preg_match('#<pre[^>]*?'.'>(.*?)</pre>#is',$alldata,$matches);
[edit] Made a slight alteration to get the code to show correctly.
Last edited by feyd on Mon Jun 28, 2004 12:42 am, edited 1 time in total.
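A hedged note on why the pattern alone may not be enough: file() returns the page as an array of lines, so a regex run on one line at a time can never see both <pre> and </pre> when they sit on different lines. The sketch below assumes the page has been read into a single string (for example with file_get_contents(), or by joining the array from file()); the literal `$html` stands in for it. The `s` modifier is what lets `.` match across line breaks.

```php
<?php
// Literal sample standing in for the fetched page (an assumption).
$html = "<body>\n<pre>\nline 1:\ntag2 : data2\n</pre>\n</body>";

// Line by line, as the original loop does: no single line holds both tags.
$perLine = 0;
foreach (explode("\n", $html) as $line) {
    $perLine += preg_match('#<pre[^>]*>(.*?)</pre>#is', $line);
}

// Whole page as one string: the 's' modifier lets '.' cross line breaks.
$whole = preg_match('#<pre[^>]*>(.*?)</pre>#is', $html, $matches);

echo "per-line matches: $perLine\n";
echo "whole-page matches: $whole\n";
echo trim($matches[1]), "\n";
```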
by btfans » Mon Jun 28, 2004 12:34 am
Will try.
Can anyone show me how to reformat
Code: Select all
line 1:
tag2 : data2
tag3 : data3
tag4 : number n - data4
line 5
line 6
line 7
...
to
Code: Select all
line 1:
tag2 : data2
tag3 : data3
tag4 : data4
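One way this reformatting could be sketched, assuming the wanted output is simply the first four lines of the block and that "number n - " is whatever sits between the colon and a " - " separator. The helper name `reformatBlock` and its `$keep` parameter are hypothetical, and the rule would misfire on data that itself contains " - ".

```php
<?php
// Hypothetical helper: keep the first $keep non-empty lines and drop a
// "something - " prefix that follows the colon (assumed to be "number n - ").
function reformatBlock(string $block, int $keep = 4): string
{
    // Split on any newline style and discard blank lines.
    $lines = preg_split('/\R/', trim($block));
    $lines = array_values(array_filter(array_map('trim', $lines), 'strlen'));

    $out = [];
    foreach (array_slice($lines, 0, $keep) as $line) {
        // "tag4 : number n - data4" becomes "tag4 : data4";
        // lines without " - " after the colon are left alone.
        $out[] = preg_replace('/^([^:]+:\s*).*?\s-\s+/', '$1', $line);
    }
    return implode("\n", $out);
}

$block = "line 1:\ntag2 : data2\ntag3 : data3\ntag4 : number n - data4\nline 5\nline 6\nline 7";
echo reformatBlock($block), "\n";
```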
by btfans » Mon Jun 28, 2004 11:52 am
I modified it as follows:
--------------------------------------------------------------------------------
<?
$file = "http://something.htm";
$contents = file($file);
$size = sizeof($contents);
$alldata = '';
for($i = 0; $i < $size; $i++) {
    $alldata .= $contents[$i];
    if (preg_match_all("|<pre.*?>(.*?)</pre>|is",$alldata,$matches));
    {
        $main = implode(' ',$matches[1]);
        echo $main;
    }
}
?>
--------------------------------------------------------------------------------
and the result is now:
--------------------------------------------------------------------------------
line 1: tag2 : data2tag3 : data3tag4 : number n - data4line 5line 6line 7
--------------------------------------------------------------------------------
and this repeats many times.
So my (silly) questions:
1) How do I get only the parts I want?
2) Why can't I see the "\n" line breaks?
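On the repetition and the missing line breaks, a hedged sketch. The output probably repeats because preg_match_all() runs inside the loop on an ever-growing $alldata, and the stray `;` after `if (...)` means the braces below it run on every pass regardless of the match. The "\n" characters are still in the output, but a browser renders a newline in HTML source as ordinary whitespace, so nl2br() (or wrapping the text in <pre>) is needed to show line breaks. The `$contents` array below is a stand-in for what file() would return.

```php
<?php
// Stand-in for file($file): one element per line, newlines included (an assumption).
$contents = ["<body>\n", "<pre>\n", "line 1:\n", "tag2 : data2\n", "</pre>\n", "</body>\n"];

// Join once, outside any loop. file() keeps each line's trailing "\n".
$alldata = implode('', $contents);

// Match once on the complete page.
$main = '';
if (preg_match_all('|<pre[^>]*>(.*?)</pre>|is', $alldata, $matches)) {
    $main = trim(implode("\n", $matches[1]));
}

// Escape the text, then let nl2br() put a <br /> before each "\n" for the browser.
echo nl2br(htmlspecialchars($main)), "\n";
```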
by btfans » Wed Jun 30, 2004 5:21 am
Hi,
Sorry for my inexperience with "\n" and PHP. I have now changed my code to:
<?
$file = "something.htm";
$contents = file($file);
$size = sizeof($contents);
$alldata=implode("\n", $contents);
preg_match_all("|<pre.*?>(.*?)</pre>|ism",$alldata,$matches);
foreach($matches[1] as $match)
{
    $pieces = explode(":", $match);
    echo "$pieces[0] <br>";
    echo "$pieces[1] <br>";
    echo "$pieces[2] <br>";
    echo "$pieces[3] <br>";
    echo nl2br($pieces[4])."<br>\n";
}
?>
result:
line 1
tag2
data2tag3
data3tag4
number n - data4
line 5
How can I change it to:
line 1
tag2data2
tag3data3
tag4data4
and, for tag4, remove "number n -"?
Can I use
$pieces = explode(":\n", $match);
to extract all the parts between ":"?
I don't quite understand how "\n" is handled in IE (is it ignored?).
Any advice is welcome.
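One possible way to get that layout, as a sketch only: split the block into lines first, then split each line at its first ":". By contrast, explode(":\n", $match) splits only where a colon is immediately followed by a newline, which in this sample happens just once, after "line 1". As for "\n" in IE, browsers treat a newline in HTML source as plain whitespace, so the sketch emits <br> for each row. The sample `$match` string and the " - " rule for tag4 are assumptions.

```php
<?php
// Sample of what $matches[1][0] might hold (an assumption based on the page above).
$match = "\nline 1:\ntag2 : data2\ntag3 : data3\ntag4 : number n - data4\nline 5\nline 6\nline 7\n";

$rows = [];
foreach (explode("\n", trim($match)) as $line) {
    if (strpos($line, ':') === false) {
        continue;                       // "line 5" etc. carry no colon, skip them
    }
    // Limit 2: split at the first colon only.
    list($tag, $data) = array_map('trim', explode(':', $line, 2));

    // Drop an assumed "number n - " prefix by keeping what follows " - ".
    $dash = strpos($data, ' - ');
    if ($dash !== false) {
        $data = substr($data, $dash + 3);
    }
    $rows[] = $tag . $data;
}

echo implode("<br>\n", $rows), "<br>\n";

// For comparison: splitting on ":\n" yields only two pieces here.
$pieces = explode(":\n", trim($match));
echo count($pieces), "\n";
```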
by btfans » Thu Jul 01, 2004 1:44 am
[SOLVED]