extract html

Posted: Thu May 11, 2006 12:06 pm
by ziggy1621
hello all,
I have a client who wants a script to pull info off of a webpage and update his own page with it. I remember reading somewhere that there are RSS scripts that can search the HTML of a page, pull everything between pointA and pointB, and display it with a PHP RSS viewer script (which I already have).

so my question is does anyone know of a script/tutorial on how to extract the info?

thanks

Posted: Thu May 11, 2006 12:12 pm
by Burrito
I don't know of any RSS tools that will do it, but you could use file_get_contents() to grab the raw HTML, then use a regular expression to pull out what you need.

if you need help with the regular expression, we can provide :D
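
For anyone following along, here is a minimal sketch of that approach, assuming the part you want sits between two literal markers in the page. The helper name extract_between(), the marker comments, and the sample HTML are made up for illustration; in real use the string would come from file_get_contents() on the client's page (fetching a URL that way needs allow_url_fopen enabled).

```php
<?php
// Hypothetical helper: returns the text found between two literal markers,
// or null when the markers are not present in the HTML.
function extract_between(string $html, string $start, string $end): ?string
{
    // preg_quote() escapes any regex metacharacters inside the markers.
    // The "s" modifier lets "." match newlines; ".*?" stops at the first end marker.
    $pattern = '/' . preg_quote($start, '/') . '(.*?)' . preg_quote($end, '/') . '/s';

    if (preg_match($pattern, $html, $matches) === 1) {
        return trim($matches[1]);
    }
    return null;
}

// Sample HTML standing in for the remote page. In real use, something like:
//   $html = file_get_contents('http://example.com/page.html'); // placeholder URL
$html = "<html><body><!-- start -->\n<p>Latest news</p>\n<!-- end --></body></html>";

$fragment = extract_between($html, '<!-- start -->', '<!-- end -->');
echo $fragment, "\n";
```

The non-greedy match matters when the end marker appears more than once on the page; a greedy ".*" would run to the last occurrence instead of the first.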

Posted: Thu May 11, 2006 12:27 pm
by ziggy1621
I'm playing with file_get_contents and it seems to be doing the same as include($file)... please do send me to a good tutorial on regular expressions. I'm pretty fluent in PHP as it relates to MySQL, but outside of that, I'm still learning.

thx

Posted: Thu May 11, 2006 12:43 pm
by Burrito
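file_get_contents is not the same as include. include evaluates the file and sends its output straight to the page, whereas file_get_contents returns the contents of the file (the HTML in your case) as a string that you can assign to a variable and work on.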
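
A small sketch of the difference, using a throwaway temp file so it runs on its own. The file contents and variable names are made up for illustration, and output buffering is used only to capture what include would otherwise print.

```php
<?php
// Write a small throwaway file so the example is self-contained.
$file = tempnam(sys_get_temp_dir(), 'demo');
file_put_contents($file, '<p>Sum: <?php echo 1 + 1; ?></p>');

// file_get_contents() returns the raw file contents as a string;
// nothing is executed and nothing is printed.
$raw = file_get_contents($file);

// include evaluates the file as PHP and sends the result to the output.
// The buffer captures that output so the two results can be compared.
ob_start();
include $file;
$rendered = ob_get_clean();

unlink($file);

// The two differ: $raw still holds the embedded PHP tag, $rendered holds its result.
var_dump($raw === $rendered);
```

Because $raw is an ordinary string, it can be handed to preg_match() or any other string function before anything is displayed.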

the best regex tut on the net is right here: viewtopic.php?t=33147 :D

Posted: Thu May 11, 2006 12:52 pm
by ziggy1621
cool, well, I've got my reading cut out for me... thx for the help