Parsing an HTML site and using some of its content?


optik
Forum Newbie
Posts: 9
Joined: Sun Nov 21, 2004 12:30 am

Post by optik »

Hello, I've been looking for a sample for a couple of hours now (not to mention a few months before that), but I never found what I needed. I always find more advanced things but can't find the basics, and I'm not too advanced in PHP yet, so hopefully you guys can help.
I'm trying to parse a server status page to tell whether the server is up or down. So I need something that gets the HTML code, finds some string in it, and uses that line and a few lines after it, or, if possible, selects the whole <tr> range. I've done similar things with HTML in batch scripts, so that's where the idea comes from. Maybe PHP is quite different; if so, at least give me an idea.
Thank you all, ideas/suggestions welcome.
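The "find a line, then take the next few lines" approach from batch scripts translates fairly directly to PHP. The sketch below assumes the page has already been fetched (for example with file_get_contents()); a small hard-coded page stands in for it so the example runs offline, and linesAfterMatch() is a hypothetical helper name, not a built-in.

```php
<?php
// Hypothetical helper: return the first line containing $needle plus up to
// $extra lines after it, or an empty array if $needle is not found.
// In a real script $html would come from file_get_contents($url).
function linesAfterMatch(string $html, string $needle, int $extra): array
{
    // Split on any line ending (\r\n, \n or \r).
    $lines = preg_split('/\R/', $html);
    foreach ($lines as $i => $line) {
        if (strpos($line, $needle) !== false) {
            // The matching line plus up to $extra following lines.
            return array_slice($lines, $i, $extra + 1);
        }
    }
    return [];
}

$html = "<table>\n<tr>\n<td>My Server</td>\n<td>online</td>\n</tr>\n</table>";
print_r(linesAfterMatch($html, 'My Server', 1));
```

This only works when the page puts each cell on its own line; if the markup is all on one line, a regular expression (as suggested later in the thread) is the more robust choice.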
rehfeld
Forum Regular
Posts: 741
Joined: Mon Oct 18, 2004 8:14 pm

Post by rehfeld »

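Code:

$doc = file_get_contents('http://foo.org');

// sscanf() applies the format from the first character of the string,
// so first cut the document down to the part starting at "Status:"
$tail = strstr($doc, 'Status:');
if ($tail !== false) {
    sscanf($tail, "Status: %s", $status);
    echo $status;
}
You'll need to learn how to use sscanf(); it's likely going to be the easiest way to get the status. As long as you always know what text comes right in front of the actual status, sscanf() will work well. Note that sscanf() does not search the string: the "Status:" in the format has to match at the very start, which is why strstr() is used first to drop everything before it.

If the document was:

<html>
<table>
<td> blah foo bar Status: online blah blah fooo

</html>

the above code will work, and echo $status would output "online".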
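A self-contained version of the same idea, run against the sample document above instead of a live URL, with a check for the case where "Status:" is missing. statusFromHtml() is a hypothetical name used only for illustration.

```php
<?php
// Hypothetical helper: extract the word after "Status:" using sscanf().
function statusFromHtml(string $html): ?string
{
    // Calling sscanf() on the whole page would fail: the literal "S" in the
    // format does not match the leading "<", so nothing would be assigned.
    // strstr() returns the rest of the string from "Status:" on, or false.
    $tail = strstr($html, 'Status:');
    if ($tail === false) {
        return null;
    }
    // %s reads one run of non-whitespace characters; $n is the number of
    // values sscanf() managed to assign.
    $n = sscanf($tail, 'Status: %s', $status);
    return $n === 1 ? $status : null;
}

$doc = "<html>\n<table>\n<td> blah foo bar Status: online blah blah fooo\n\n</html>";
echo statusFromHtml($doc), "\n"; // online
```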

Post by optik »

Nice, thanks a lot! I'm on it, hehe.
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto

Post by John Cartwright »

You can also use preg_match() and preg_replace(), but they aren't recommended for beginners :P
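For comparison with the sscanf() approach, here is a minimal sketch of both functions on the same sample document: preg_match() finds the status anywhere in the page, while preg_replace() rewrites the text rather than extracting from it. The helper names are hypothetical.

```php
<?php
// preg_match() searches anywhere in the string (unlike sscanf(), which
// starts at the first character). (\S+) captures the word after "Status:".
function statusByRegex(string $html): ?string
{
    return preg_match('/Status:\s*(\S+)/', $html, $m) ? $m[1] : null;
}

// preg_replace() transforms text: here every tag becomes a space and runs of
// whitespace are collapsed, leaving only the visible text.
function visibleText(string $html): string
{
    return trim(preg_replace('/\s+/', ' ', preg_replace('/<[^>]*>/', ' ', $html)));
}

$doc = "<html>\n<table>\n<td> blah foo bar Status: online blah blah fooo\n\n</html>";
echo statusByRegex($doc), "\n"; // online
echo visibleText($doc), "\n";   // blah foo bar Status: online blah blah fooo
```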

Post by optik »

Actually, that helps me a little more. I've been playing with that sscanf for about 30 minutes and can't get any results, but with preg_match I already got one of the things I'm looking for, thanks. I've worked a bit with MySQL, Tcl and the parsing in that area, so those characters aren't scary, heh.

Post by John Cartwright »

Maybe you can be a bit more specific about what you are trying to match, so I can help you with the regex.

Also, you should look into preg_match_all() if you are searching for multiple things.
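A minimal sketch of preg_match_all(), assuming made-up markup where several servers each have a bold name followed by a status. With the PREG_SET_ORDER flag, each element of the result is one match, laid out like the $matches array from preg_match().

```php
<?php
// Made-up sample with two servers on one page.
$html = '<b>Alpha</b> Status: online <b>Beta</b> Status: offline';

// preg_match_all() keeps searching after the first match.
// PREG_SET_ORDER: $rows[0] is the first match, $rows[1] the second, ...
preg_match_all('/<b>(.*?)<\/b>\s*Status:\s*(\S+)/', $html, $rows, PREG_SET_ORDER);

foreach ($rows as $row) {
    // $row[1] is the name group, $row[2] the status group.
    echo $row[1], ' is ', $row[2], "\n";
}
```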

Post by optik »

I am looking to parse http://lobby.soldat.pl:13073/index.html
and get all the details from the row of that table that contains 66.17.183.250:65000: the player count and everything like that. It's also just something I want to learn some PHP with, so if you have time to parse one thing in that format, that would be a great start for me; I've never really parsed anything with PHP so far.
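Nobody in the thread suggests it, but PHP's DOM extension is an alternative to regular expressions for this kind of task: it parses the HTML into a tree, and an XPath query can select "the <tr> that contains a link to soldat://IP:PORT/". The sketch below assumes the row looks like the markup quoted later in the thread (the sample is reconstructed from those posts, not fetched from the live page) and that the dom extension is enabled; findServerRow() is a hypothetical name.

```php
<?php
// Sample markup modeled on the row posted later in this thread.
$html = '<html><body><table>
<tr><td><a href="soldat://66.17.183.250:65000/"><font color="#79E958"><b>|Optik\'s Server|</b></font></a></td>
<td width="37%"></td><td width="8%">CTF</td><td width="14%">ctf_Dropdown</td>
<td width="12%">0/12</td><td width="8%">1.2.1*</td></tr>
</table></body></html>';

// Hypothetical helper: return the text of every cell in the row whose link
// points at soldat://$ipPort/, or null if no such row exists.
function findServerRow(string $html, string $ipPort): ?array
{
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML; collect libxml warnings silently.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Assumes $ipPort contains no quote characters (true for ip:port).
    $query = sprintf('//tr[.//a[@href="soldat://%s/"]]', $ipPort);
    $row = $xpath->query($query)->item(0);
    if ($row === null) {
        return null;
    }
    $cells = [];
    foreach ($xpath->query('./td', $row) as $td) {
        $cells[] = trim($td->textContent);
    }
    return $cells;
}

$cells = findServerRow($html, '66.17.183.250:65000');
print_r($cells);
```

The trade-off: the DOM version does not care about line breaks, attribute order or extra whitespace, whereas a regex has to match the markup almost character for character.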
timvw
DevNet Master
Posts: 4897
Joined: Mon Jan 19, 2004 11:11 pm
Location: Leuven, Belgium

Post by timvw »

Now, if you remove all the \r, \n and \s+ (runs of whitespace), the matching should go quite easily.

To help you find the correct regular expression, you can use:
http://www.samuelfullman.com/team/php/t ... ster_p.php
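A minimal sketch of the whitespace clean-up with preg_replace(). It is a variation on deleting every \s+: whitespace between tags is removed, but other runs are collapsed to a single space, so attribute markup such as "<a href" does not turn into "<ahref". normalizeWhitespace() is a hypothetical name, and the sample row is made up.

```php
<?php
// Hypothetical helper: make markup easier to match with one regex.
function normalizeWhitespace(string $html): string
{
    // Drop whitespace that sits between two tags: "</td>\r\n  <td" -> "</td><td"
    $html = preg_replace('/>\s+</', '><', $html);
    // Collapse any remaining run of whitespace (\r, \n, tabs, spaces) to one space.
    return trim(preg_replace('/\s+/', ' ', $html));
}

$row = "<tr>\r\n  <td width=\"8%\">CTF</td>\r\n  <td width=\"14%\">ctf_Dropdown</td>\r\n</tr>";
echo normalizeWhitespace($row), "\n";
```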

Post by rehfeld »

I can't find "66.17.183.250:65000" anywhere on that page.

Is it only going to appear sometimes?

Could you pick the name of something that's actually on the page, and then give us examples of what you want to parse out of it?

Post by optik »

Sorry, but one more question. By the way, that expression tester is good, but I have

/<a href=\"soldat:\/\/66.17.183.250:65000\/\"><font color=\"#79E958\"><b>\|Optik's Server\|<\/b><\/font><\/a><\/td>/i

and I want to match \|Optik's Server\| without writing in the name, so that it can be dynamic. I've been trying to find a list of specifiers or something I could use, and also tried [a-z''\|], but no luck. I'm kind of lost; is there a list somewhere of what I could be using?

Also, about the earlier post: you said to remove \r, \n and \s+. Would I be using preg_replace for that?

Post by optik »

rehfeld wrote: I can't find "66.17.183.250:65000" anywhere on that page.

Is it only going to appear sometimes?

Could you pick the name of something that's actually on the page, and then give us examples of what you want to parse out of it?
That's in one of the links on the page, so it's only in the source code, not on the visible page. I need to find the info by my server's ip:port and then get the details about it. That way, if I run a few servers it would also work, and I could change the name etc. and it would still be fine. I'm also trying to make one for other users, so it would be universal: you'd only need the ip:port of your server.
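To make the lookup work for any server, the ip:port can be spliced into the pattern at run time. The sketch below assumes the link markup shown in this thread; preg_quote() escapes the dots (and the "/" delimiter) so that "66.17.183.250" only matches those literal characters. serverNameFor() and the second server (192.0.2.10 is a documentation address) are made up for illustration.

```php
<?php
// Hypothetical helper: find the bold server name in the link that points at
// soldat://$ipPort/, or return null if that server is not listed.
function serverNameFor(string $html, string $ipPort): ?string
{
    $pattern = '/<a href="soldat:\/\/' . preg_quote($ipPort, '/')
             . '\/">.*?<b>(.*?)<\/b>/is';
    return preg_match($pattern, $html, $m) ? $m[1] : null;
}

// Made-up page with two servers.
$html = '<tr><td><a href="soldat://66.17.183.250:65000/"><font color="#79E958">'
      . '<b>|Optik\'s Server|</b></font></a></td></tr>'
      . '<tr><td><a href="soldat://192.0.2.10:23073/"><font color="#FFFFFF">'
      . '<b>Another Server</b></font></a></td></tr>';

echo serverNameFor($html, '66.17.183.250:65000'), "\n"; // |Optik's Server|
echo serverNameFor($html, '192.0.2.10:23073'), "\n";    // Another Server
```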

Post by John Cartwright »

Untested (plus I don't know much about regex):

Code:

/<a href="soldat:\/\/66.17.183.250:65000\/"><font color="#79E958"><b>([A-Za-z' |]+)<\/b><\/font><\/a><\/td>/i
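On the character-class question: [a-z''\|] matches exactly one character from the set, and that set has no space, so it cannot match a whole name like |Optik's Server|. Adding a quantifier and widening the class fixes that; [^<]+ ("one or more characters that are not <") is a common way to say "everything up to the next tag". The full list of tokens is in the PHP manual's PCRE pattern syntax pages. A small comparison, assuming the link markup quoted above:

```php
<?php
$link = '<a href="soldat://66.17.183.250:65000/"><font color="#79E958">'
      . '<b>|Optik\'s Server|</b></font></a></td>';

// Too narrow: one character from [a-z'|], then </b> must follow immediately.
$tooNarrow = '/<b>([a-z\'\|])<\/b>/i';
var_dump(preg_match($tooNarrow, $link)); // int(0)

// [^<]+ captures everything up to the next tag, whatever the server is called.
$pattern = '/<a href="soldat:\/\/66\.17\.183\.250:65000\/"><font color="#79E958"><b>([^<]+)<\/b>/i';
if (preg_match($pattern, $link, $m)) {
    echo $m[1], "\n"; // |Optik's Server|
}
```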

Post by timvw »

Code:

/<a href="soldat:\/\/66.17.183.250:65000\/"><font color="#79E958"><b>(.*?)<\/b><\/font><\/a><\/td>/i
With preg_match(), the text captured between ( and ) ends up in $matches[1]; $matches[0] holds the whole match.
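A quick illustration of what $matches contains, using timvw's pattern against the link markup from this thread. Echoing $matches[0] prints the entire matched string; $matches[1] is just the captured name.

```php
<?php
$html = '<a href="soldat://66.17.183.250:65000/"><font color="#79E958">'
      . '<b>|Optik\'s Server|</b></font></a></td>';
$pattern = '/<a href="soldat:\/\/66.17.183.250:65000\/"><font color="#79E958"><b>(.*?)<\/b><\/font><\/a><\/td>/i';

if (preg_match($pattern, $html, $matches)) {
    echo $matches[0], "\n"; // the whole matched text
    echo $matches[1], "\n"; // |Optik's Server|
}
```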
optik wrote: Also, about the earlier post: you said to remove \r, \n and \s+. Would I be using preg_replace for that?
I said that because it would let you match something like

<tr><td>(.*?)</td><td>(.*?)</td>.....</tr>
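A sketch of that idea on a made-up two-cell row: remove the whitespace between tags with preg_replace(), then walk the cells in order with one pattern. Using <td[^>]*> instead of <td> is an assumption added here so that cells with attributes such as width="8%" still match.

```php
<?php
$raw = "<tr>\n  <td width=\"14%\">ctf_Dropdown</td>\n  <td width=\"8%\">CTF</td>\n</tr>";
// Remove whitespace between tags so the row becomes one continuous string.
$row = preg_replace('/>\s+</', '><', $raw);

if (preg_match('/<tr><td[^>]*>(.*?)<\/td><td[^>]*>(.*?)<\/td><\/tr>/', $row, $m)) {
    // Skip $m[0] (the whole row) and name the two captured cells.
    list(, $map, $mode) = $m;
    echo "$map ($mode)\n"; // ctf_Dropdown (CTF)
}
```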

Post by optik »

Yeah, I knew what you meant, I just wasn't sure which function I would use, hehe. I'll try those others in a sec.

Post by optik »

I need some more help, hehe. I figured out how to get the string that contains all the info I need. I thought this would be easier than parsing everything separately: just get the table I want and then parse that part (less CPU usage too, I guess). Anyway, I have the string now, but I need to find out how to parse it and I'm lost once more, so any help would be good.

Code:

/<ahref="soldat:\/\/66.17.183.250:65000\/"><fontcolor="#79E958"><b>(.*?)<\/b><\/font><\/a><\/td><tdwidth="37\%">(.*?)<\/td><tdwidth="8\%">(.*?)<\/td><tdwidth="14\%">(.*?)<\/td><tdwidth="12\%">(.*?)\/(.*?)<\/td><tdwidth="8\%">(.*?)<\/td>/i
It returns something like

Code:

<ahref="soldat://66.17.183.250:65000/"><fontcolor="#79E958"><b>|Optik'sServer|</b></font></a></td><tdwidth="37%"></td><tdwidth="8%">CTF</td><tdwidth="14%">ctf_Dropdown</td><tdwidth="12%">0/12</td><tdwidth="8%">1.2.1*</td>
I'm wondering how I could take individual values out of it, like '|Optik'sServer|' or any of the others. With the approach I tried before I get the whole string, so when I echo it, it just prints the entire string, which isn't really useful. I'd like to pull out the exact values and then format them as I want, but I don't know how to refer to a specific part of the match.
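Each (.*?) group in the pattern gets its own slot in the $matches array, so the values can be pulled out individually instead of echoing $matches[0]. The sketch below reuses the whitespace-stripped row and pattern from the post above (dots in the IP escaped, the unnecessary \% escapes dropped). The variable names are guesses at what each column means; the 37% column is empty in this sample, so its meaning is unknown.

```php
<?php
// The whitespace-stripped row posted above.
$row = '<ahref="soldat://66.17.183.250:65000/"><fontcolor="#79E958"><b>|Optik\'sServer|</b></font></a></td>'
     . '<tdwidth="37%"></td><tdwidth="8%">CTF</td><tdwidth="14%">ctf_Dropdown</td>'
     . '<tdwidth="12%">0/12</td><tdwidth="8%">1.2.1*</td>';

$pattern = '/<ahref="soldat:\/\/66\.17\.183\.250:65000\/"><fontcolor="#79E958"><b>(.*?)<\/b><\/font><\/a><\/td>'
         . '<tdwidth="37%">(.*?)<\/td><tdwidth="8%">(.*?)<\/td><tdwidth="14%">(.*?)<\/td>'
         . '<tdwidth="12%">(.*?)\/(.*?)<\/td><tdwidth="8%">(.*?)<\/td>/i';

if (preg_match($pattern, $row, $m)) {
    // $m[0] is the whole match; $m[1]..$m[7] are the groups, left to right.
    // Field names are guesses: $info is the (empty) 37% column.
    list(, $name, $info, $mode, $map, $players, $maxPlayers, $version) = $m;
    printf("%s | %s on %s | %d/%d players | v%s\n",
           $name, $mode, $map, $players, $maxPlayers, $version);
}
```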