
xpaths

Posted: Mon Jun 02, 2008 8:12 pm
by mxb7642
I'd like to get the HTML from a remote site so that I can retrieve data from it using XPath, so I need it as a valid XML DOM object. How can this be done for a website, say http://www.yahoo.com? Thanks.

Re: xpaths

Posted: Tue Jun 03, 2008 7:21 am
by Chris Corbyn
If the site is not valid XML then you'll struggle. Web browsers go to great lengths to deal with invalid (X)HTML, but most XML libraries want valid XML.

SimpleXML may work, but I doubt it. What do you need to do this for?

Re: xpaths

Posted: Tue Jun 03, 2008 11:55 am
by mxb7642
I'd like to pull a list of locations off of a certain site so that I can keep my database up to date.
I don't believe the site I'm looking at provides regular XML, but could you please explain what makes XML regular? And are there any good packages to convert irregular XML to regular XML? Thank you.

Re: xpaths

Posted: Tue Jun 03, 2008 4:29 pm
by Weirdan
Actually, the DOM extension can load HTML:
php manual wrote: DOMDocument::loadHTML
bool DOMDocument::loadHTML ( string $source )
The function parses the HTML contained in the string source. Unlike loading XML, HTML does not have to be well-formed to load. [...]
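
A minimal sketch of how that might look, assuming PHP's bundled DOM and libxml extensions. The sample markup, the "loc" class name and the XPath expression are made up for illustration; for a real page you would fill $html from the remote site instead (for example with file_get_contents(), which needs allow_url_fopen enabled).

```php
<?php
// Hypothetical sample markup, deliberately not well-formed (unclosed <li> and <p>).
$html = '<html><body><ul><li class="loc">Boston<li class="loc">Denver</ul><p>unclosed</body></html>';

// Collect libxml parse errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);   // tolerant HTML parser, unlike loadXML()
libxml_clear_errors();   // discard the collected parse errors

// Run an XPath query against the repaired document tree.
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//li[@class="loc"]');

foreach ($nodes as $node) {
    echo trim($node->textContent), "\n";
}
```

Calling libxml_use_internal_errors(true) before loading is one way to keep the parser's complaints about sloppy markup out of the output without resorting to the @ operator.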

Re: xpaths

Posted: Tue Jun 03, 2008 7:53 pm
by mxb7642
Thanks, that's exactly what I needed (once I was able to suppress all the annoying warnings).
Do you know if it's possible to make the query command return an array of strings instead of objects? I know it's not necessary, but I think it would be more efficient.
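
For reference, DOMXPath::query() itself returns a DOMNodeList, so getting plain strings means copying the text out of each node. A small sketch of one way to do that; queryStrings() is a hypothetical helper name, not part of the DOM extension, and the sample XML is made up. It is unlikely to be faster than using the node list directly, since the nodes are built either way.

```php
<?php
// Hypothetical helper: run an XPath query and return the matches as plain strings.
function queryStrings(DOMXPath $xpath, string $expression): array
{
    $nodes = $xpath->query($expression);
    if ($nodes === false) {
        return [];   // query() returns false for a malformed expression
    }

    $strings = [];
    foreach ($nodes as $node) {
        $strings[] = trim($node->textContent);
    }
    return $strings;
}

// Made-up, well-formed sample data so the example runs on its own.
$doc = new DOMDocument();
$doc->loadXML('<locations><loc>Boston</loc><loc> Denver </loc></locations>');

$locations = queryStrings(new DOMXPath($doc), '//loc');
print_r($locations);
```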