Posted: Sun Jul 01, 2007 8:35 am
by superdezign
Actually, no it won't. Try it.
The '?' modifier makes the quantifier non-greedy, so the match stops at the first instance of '</div>'. The only way for it to fail would be if there were divs nested within this one, in which case he would need tokenization or some sort of parser. explode() is simply not practical.
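To see the difference in isolation, here is a minimal sketch. The sample markup and the variable names ($html, $nested, $greedy, $lazy, $cut) are made up for illustration and are not taken from the page being discussed.

```php
<?php
// Hypothetical sample markup: two sibling divs, no nesting.
$html = '<div id="a">first</div><p>between</p><div id="b">second</div>';

// Greedy: .* runs on to the LAST '</div>' in the string.
preg_match('#<div id="a">(.*)</div>#s', $html, $greedy);

// Lazy: .*? stops at the FIRST '</div>'.
preg_match('#<div id="a">(.*?)</div>#s', $html, $lazy);

echo $greedy[1], "\n"; // first</div><p>between</p><div id="b">second
echo $lazy[1], "\n";   // first

// The nested case is where the lazy version falls short:
// it stops at the inner div's closing tag.
$nested = '<div id="a">outer <div>inner</div> tail</div>';
preg_match('#<div id="a">(.*?)</div>#s', $nested, $cut);
echo $cut[1], "\n";    // outer <div>inner
```

The last match shows why nested divs call for a parser: the pattern cannot tell which '</div>' closes which div.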
Posted: Sun Jul 01, 2007 9:23 am
by ivanfx
Well, that did happen!
So now I'm asking how to remove it with regex.
A footer div was left over, along with a paragraph containing AdSense ads.

Posted: Sun Jul 01, 2007 9:41 am
by superdezign
What happened? Nested div tags?
What is the HTML that you are trying to extract, exactly?
Posted: Sun Jul 01, 2007 9:50 am
by ivanfx
I'm working on a widget that extracts search results from sekko.
Here is the link to a sample page:
http://www.sekko.nl/?query=hani&page=1
Posted: Sun Jul 01, 2007 9:57 am
by miro_igov
I wonder why you don't use preg_match_all and match the anchors and paragraphs separately. Then put them in an array structure and display them however you want.
Posted: Sun Jul 01, 2007 10:03 am
by superdezign
I don't know of any search engines that allow you to do that. You're basically *stealing* traffic and bandwidth from them.
What part of that do you want? What HTML?
Posted: Sun Jul 01, 2007 10:03 am
by ivanfx
Got an idea how?
I tried everything. I guess I have to brush up on my skills.

Posted: Sun Jul 01, 2007 10:04 am
by superdezign
miro_igov wrote:I wonder why you don't use preg_match_all and match the anchors and paragraphs separately. Then put them in an array structure and display them however you want.
Agreed. That'd make much more sense.
Posted: Sun Jul 01, 2007 10:05 am
by ivanfx
superdezign wrote:I don't know of any search engines that allow you to do that. You're basically *stealing* traffic and bandwidth from them.
What part of that do you want? What HTML?
It's not stealing, I've put their logo on top!
I just want the <div id="serps"></div> part, but only with the link titles, descriptions and links.
I've put it on the previous page.
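If the serps div does contain nested divs, one option is PHP's DOM extension instead of a regex. This is only a sketch under assumptions: the $data string below is a hypothetical stand-in for the fetched page, and the real markup may well differ.

```php
<?php
// Hypothetical stand-in for the fetched page; the real markup may differ.
$data = '<html><body><div id="header">logo</div>'
      . '<div id="serps"><div class="item">'
      . '<h4><a href="http://example.com/">Example</a></h4>'
      . '<p class="result">A description<br></p>'
      . '</div></div>'
      . '<div id="footer">footer</div></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // real-world HTML is rarely valid; collect warnings quietly
$doc->loadHTML($data);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Select the container by id; nested divs are handled by the parser.
$serps = $xpath->query('//div[@id="serps"]')->item(0);

$results = [];
if ($serps !== null) {
    // Only look at title links inside the container, so header and footer are ignored.
    foreach ($xpath->query('.//h4/a', $serps) as $a) {
        $results[] = [
            'title' => trim($a->textContent),
            'url'   => $a->getAttribute('href'),
        ];
    }
}

print_r($results);
```

Because the query is scoped to the serps node, the footer div and anything else outside it never enter the result.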

Posted: Sun Jul 01, 2007 10:07 am
by miro_igov
Code:
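// Title links: [1] = href, [2] = link text (lazy .*? so it stops at the first </a>).
preg_match_all('#<h4><a href="([^"]*)">(.*?)</a></h4>#', $data, $title_links);
// Descriptions: [1] = the text between <p class="result"> and the first <br>.
preg_match_all('#<p class="result">([^<]*)<br#', $data, $description);
// Footer links: [1] = href, [2] = the displayed URL.
preg_match_all('#<a class="url" href="([^"]*)">([^<]*)</a>#', $data, $footer_links);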
print_r($title_links);
print_r($description);
print_r($footer_links);
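As a rough sketch of the "put them in an array structure" step, the matches can be folded into one array per result. The sample $data is hypothetical and only mimics the shape the patterns expect, the title pattern uses a lazy `.*?`, and pairing by index assumes every result has both a title and a description.

```php
<?php
// Hypothetical sample in the shape the patterns expect; not the real page.
$data = <<<HTML
<h4><a href="http://example.com/one">First title</a></h4>
<p class="result">First description<br />
<a class="url" href="http://example.com/one">example.com/one</a></p>
<h4><a href="http://example.com/two">Second title</a></h4>
<p class="result">Second description<br />
<a class="url" href="http://example.com/two">example.com/two</a></p>
HTML;

preg_match_all('#<h4><a href="([^"]*)">(.*?)</a></h4>#', $data, $title_links);
preg_match_all('#<p class="result">([^<]*)<br#', $data, $description);

// Pair titles and descriptions by position (assumes they line up one to one).
$results = [];
foreach ($title_links[1] as $i => $url) {
    $results[] = [
        'url'         => $url,
        'title'       => strip_tags($title_links[2][$i]),
        'description' => isset($description[1][$i]) ? trim($description[1][$i]) : '',
    ];
}

print_r($results);
```

From there, a simple foreach over $results can print each entry in whatever markup the widget needs.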
Posted: Sun Jul 01, 2007 10:09 am
by ivanfx
Thanks, I'll give it a try later!
