how to get a subset of an html string/file
Posted: Thu Mar 29, 2012 9:03 pm
by kc11
Hi,
I have an HTML string/page stored in the variable $html. I would like to extract part of this page's HTML: the page has a body with class=foo and multiple divs, including div[id=bar]. So I am trying to get something like:
Code: Select all
$subset=$html->body[class=foo]->div[id=bar];
How can I do this?
Thank you in advance,
KC
Re: how to get a subset of an html string/file
Posted: Fri Mar 30, 2012 6:54 am
by maneetpuri
Hi,
You can do this using regular expressions, but what is the purpose?
Cheers,
~M
Re: how to get a subset of an html string/file
Posted: Fri Mar 30, 2012 6:56 am
by social_experiment
maneetpuri wrote: but what is the purpose.
To use the information would be a good guess.
Re: how to get a subset of an html string/file
Posted: Fri Mar 30, 2012 8:14 am
by Tiancris
Maybe DOM could help?
http://ar.php.net/manual/en/class.domdocument.php
Code: Select all
$html = "<html><body>Test<br></body></html>";
$doc = new DOMDocument();
$doc->loadHTML($html);
And you have functions like:
Code: Select all
$doc->getElementById()
$doc->getElementsByTagName()
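As a rough sketch of how those two calls might be used on markup like the one in the original question (the sample HTML below, with a body of class foo and a div with id bar, is invented for illustration):

```php
<?php
// Hypothetical sample markup modelled on the question: body.foo containing div#bar.
$html = '<html><body class="foo"><div id="other">skip</div>'
      . '<div id="bar"><p>Hello</p></div></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// getElementById() returns a single DOMElement, or null if no element has that id.
$bar = $doc->getElementById('bar');
if ($bar !== null) {
    echo $bar->tagName, "\n";      // the element's tag name
    echo $bar->textContent, "\n";  // text of the element and its descendants
}

// getElementsByTagName() returns a DOMNodeList, which can be iterated with foreach.
$divs = $doc->getElementsByTagName('div');
foreach ($divs as $div) {
    echo $div->getAttribute('id'), "\n";
}
```

Note that the two calls return different things: one element (or null) versus a list of nodes.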
Re: how to get a subset of an html string/file
Posted: Fri Mar 30, 2012 9:07 am
by kc11
Thanks for looking at this, guys,
So far I have:
Code: Select all
$dom = new DOMDocument();
@$dom->loadHTML($html);
// Create the XPath object
$xpath = new DOMXPath($dom);
foreach ($xpath->evaluate("//div[@id='body']//*/text()") as $elt) {
    var_dump($elt);
    $body = trim($elt->wholeText);
}
echo 'body ' . $body;
The problem with this (and also with getElementsByTagName) is that it returns a DOMNodeList.
I just want to get the //div[@id='body'] node and its subtree, as HTML.
KC
Re: how to get a subset of an html string/file
Posted: Fri Mar 30, 2012 11:01 am
by kc11
I think I have it:
I am first getting the node I want with XPath (you could probably also use getElementById or getElementsByTagName) and then importing it into a new DOMDocument.
Code: Select all
$dom = new DOMDocument();
@$dom->loadHTML($html);
// Create the XPath object
$xpath = new DOMXPath($dom);
// item(0) yields a single DOMNode, not a DOMNodeList; importNode() needs a DOMNode
$body = $xpath->evaluate("//div[@id='body']")->item(0);
$subtree = new DOMDocument();
$node2 = $subtree->importNode($body, true); // true imports the node's whole subtree, not just the node itself
$subtree->appendChild($node2); // append the imported node to the new, empty document
print($subtree->saveHTML());
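For comparison, a shorter variant that might work: since PHP 5.3.6, saveHTML() accepts an optional node argument and serializes just that node and its subtree, so the second document would not be needed. This is only a sketch; the sample markup is made up for illustration, and libxml_use_internal_errors() is used here in place of the @ operator to keep parser warnings quiet.

```php
<?php
// Hypothetical sample input; the real $html would come from elsewhere.
$html = '<html><body><div id="header">Top</div>'
      . '<div id="body"><p>Main <b>content</b></p></div></body></html>';

// Collect parser warnings internally instead of suppressing them with @.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// query() returns a DOMNodeList; item(0) is the first match, or null if there is none.
$body = $xpath->query("//div[@id='body']")->item(0);

$subset = null;
if ($body !== null) {
    // Passing a node to saveHTML() dumps only that node and its descendants.
    $subset = $dom->saveHTML($body);
    echo $subset, "\n";
}
```

The null check matters because item(0) returns null when the XPath expression matches nothing.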
I'd be interested if there is a better way.
Thanks,
KC