Super simple question

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."


superdezign
DevNet Master
Posts: 4135
Joined: Sat Jan 20, 2007 11:06 pm

Post by superdezign »

Actually, no, it won't. Try it.

The '?' after '.*' makes the match non-greedy, so it stops at the first instance of '</div>'. The only way for it to fail would be if there were divs nested within this one, in which case he would need tokenization or some sort of parser. explode() is simply not practical.
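A minimal PHP sketch of the greedy vs. lazy difference, using a made-up one-line HTML string. It also shows the nested-div case, where the lazy version stops too early at the inner '</div>'. On real pages that span several lines, the `s` modifier is needed so that `.` also matches newlines.

```php
<?php
// Two sibling divs on one line (invented sample, not from the real page).
$html = '<div class="a">first</div><div class="b">second</div>';

// Greedy: .* runs to the LAST '</div>' in the subject, then backtracks.
preg_match('#<div class="a">(.*)</div>#', $html, $greedy);

// Lazy: .*? stops at the FIRST '</div>' after the opening tag.
preg_match('#<div class="a">(.*?)</div>#', $html, $lazy);

echo $greedy[1], "\n"; // first</div><div class="b">second
echo $lazy[1], "\n";   // first

// Nested divs: the inner '</div>' ends the lazy match too early.
$nested = '<div class="a">outer <div>inner</div> tail</div>';
preg_match('#<div class="a">(.*?)</div>#', $nested, $m);
echo $m[1], "\n";      // outer <div>inner
```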
ivanfx
Forum Newbie
Posts: 14
Joined: Sun Jul 01, 2007 3:47 am

Post by ivanfx »

Well, that did happen! ;)

So I'm now asking how to remove it, with regex :D

A footer div and a paragraph containing AdSense ads were left over.


:cry:
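One way to strip leftover blocks with preg_replace(), as a sketch only: the `id="footer"` and `class="ads"` names below are hypothetical, not taken from the actual page. The lazy `.*?` plus the `s` modifier stops at the first closing tag even across line breaks, so this only works if the footer div contains no nested divs.

```php
<?php
// Hypothetical leftover markup; the real id/class names on the page may differ.
$chunk = "<h4><a href=\"http://example.com/\">Example</a></h4>\n"
       . "<div id=\"footer\">\n  footer links\n</div>\n"
       . "<p class=\"ads\">sponsored links</p>\n";

$patterns = [
    '#<div id="footer">.*?</div>\s*#s', // lazy + s: first </div>, across newlines
    '#<p class="ads">.*?</p>\s*#s',     // same idea for the ad paragraph
];

// preg_replace() accepts an array of patterns and applies them in order.
$clean = preg_replace($patterns, '', $chunk);

echo $clean; // <h4><a href="http://example.com/">Example</a></h4>
```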

Post by superdezign »

What happened? Nested div tags?

What is the HTML that you are trying to extract, exactly?

Post by ivanfx »

I'm working on a widget that extracts search results from sekko.

Here is the link to a sample page:

http://www.sekko.nl/?query=hani&page=1
miro_igov
Forum Contributor
Posts: 485
Joined: Fri Mar 31, 2006 5:06 am
Location: Bulgaria

Post by miro_igov »

I wonder why you don't use preg_match_all and match the anchors and paragraphs separately. Then put them in an array structure and display them however you want.

Post by superdezign »

I don't know of any search engines that allow you to do that. You're basically *stealing* traffic and bandwidth from them.

What part of that do you want? What HTML?

Post by ivanfx »

Got an idea how?
I tried everything; I guess I have to brush up on my skills :?

Post by superdezign »

miro_igov wrote: I wonder why you don't use preg_match_all and match the anchors and paragraphs separately. Then put them in an array structure and display them however you want.
Agreed. That'd make much more sense.

Post by ivanfx »

superdezign wrote: I don't know of any search engines that allow you to do that. You're basically *stealing* traffic and bandwidth from them.

What part of that do you want? What HTML?

It's not stealing; I've put their logo on top! 8)

I just want the <div id="serps"></div> part, but only the link titles, descriptions and links.

I've put it on the previous page. :lol:
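If the nested-div problem keeps coming back, a parser sidesteps it. This sketch swaps the regex approach for PHP's DOM extension (DOMDocument and DOMXPath). The `h4 > a` structure inside `<div id="serps">` is an assumption based on the patterns discussed in this thread, and the sample HTML is invented.

```php
<?php
// Collect result links from inside <div id="serps"> using a real HTML parser,
// so nested divs inside the container are not a problem.
function serpLinks(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $links = [];
    // Only h4 > a anchors inside the serps container, however deeply nested.
    foreach ($xpath->query('//div[@id="serps"]//h4/a') as $a) {
        $links[] = [
            'url'   => $a->getAttribute('href'),
            'title' => trim($a->textContent),
        ];
    }
    return $links;
}

// Invented sample: one result inside serps, one link in a footer outside it.
$html = '<div id="serps"><div><h4><a href="http://example.com/">Example</a></h4></div></div>'
      . '<div id="footer"><h4><a href="/about">About</a></h4></div>';

print_r(serpLinks($html));
```

The footer link is skipped because the XPath query only looks below the `serps` div.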

Post by miro_igov »


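// $data holds the raw HTML of the results page.
// Titles: [1] = link URLs, [2] = link text. The lazy (.*?) stops at the
// first </a></h4>, even if several results share one line.
preg_match_all('#<h4><a href="([^"]*)">(.*?)</a></h4>#', $data, $title_links);

// Descriptions: [1] = the plain text before the <br.
preg_match_all('#<p class="result">([^<]*)<br#', $data, $description);

// Footer links: [1] = URLs, [2] = displayed URL text.
preg_match_all('#<a class="url" href="([^"]*)">([^<]*)</a>#', $data, $footer_links);

print_r($title_links);
print_r($description);
print_r($footer_links);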
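To get the "array structure" suggested earlier in the thread, the three parallel capture arrays can be zipped into one entry per result. This is a sketch with invented sample markup shaped the way the patterns expect; the keys (`url`, `title`, `description`, `display_url`) are arbitrary names. The title pattern here uses a lazy `(.*?)`: with a greedy `(.*)`, results that sit on one line would be matched from the first `<h4>` to the last `</a></h4>` as a single result.

```php
<?php
// Invented sample shaped like the patterns expect (all on one line).
$data = '<h4><a href="http://a.example/">A title</a></h4>'
      . '<p class="result">First description<br /><a class="url" href="http://a.example/">a.example</a></p>'
      . '<h4><a href="http://b.example/">B title</a></h4>'
      . '<p class="result">Second description<br /><a class="url" href="http://b.example/">b.example</a></p>';

preg_match_all('#<h4><a href="([^"]*)">(.*?)</a></h4>#', $data, $title_links);
preg_match_all('#<p class="result">([^<]*)<br#', $data, $description);
preg_match_all('#<a class="url" href="([^"]*)">([^<]*)</a>#', $data, $footer_links);

// Zip the parallel capture arrays into one entry per result. This assumes
// every result has all three parts in the same order; if one is missing,
// the indexes drift out of sync.
$results = [];
foreach ($title_links[1] as $i => $url) {
    $results[] = [
        'url'         => $url,
        'title'       => $title_links[2][$i],
        'description' => $description[1][$i] ?? '',
        'display_url' => $footer_links[2][$i] ?? '',
    ];
}

foreach ($results as $r) {
    echo $r['title'], ' (', $r['display_url'], ")\n", $r['description'], "\n\n";
}
```

If results can be missing a description, matching each whole result block first and then running the three patterns inside it keeps the fields aligned.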

Post by ivanfx »

Thanks, I'll give it a try later! :)