regular exp

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."

roice
Forum Commoner
Posts: 35
Joined: Tue Mar 02, 2010 9:14 am

Post by roice »

Hello,
I found a website that publishes music news. I would like to use PHP's file_get_contents() to fetch the first 4 news items from it.
The page markup looks like this.
The URL of the full article and the title are here:

Code: Select all

<a class="item-head" href="http://www.mouse.co.il/CM.articles_item,694,209,47949,.aspx">ARTICLE TITLE</a>
The article summary goes here:

Code: Select all

<div class="item-text">ARTICLE SUMMARY</div>
The article image is here:

Code: Select all

<div class="item-right">
	<a href="http://www.mouse.co.il/CM.articles_item,694,209,47949,.aspx">
	    <img src="http://images.mouse.co.il/storage/b/d/IMAGE-NAME.jpg" height="87" width="132" /></a>
</div>
How can I get those contents into 4 variables?

Thank you in advance!
twinedev
Forum Regular
Posts: 984
Joined: Tue Sep 28, 2010 11:41 am
Location: Columbus, Ohio

Re: regular exp

Post by twinedev »

This assumes that all articles have the same three items you gave us, that these classes are used only within those particular tags, that there are no nested div tags inside the summary, and that every article has an image. For the first pattern, class="item-head" must also always come before the href="" attribute. (In other words, when scraping data from a source you don't control, always be prepared for it to change and to tweak the code when it does.)

Code: Select all

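// $subject is assumed to hold the page HTML, e.g. from file_get_contents().
// $aryLinkTitle[1] = article URLs, $aryLinkTitle[2] = article titles
preg_match_all('%<a [^>]*?class="item-head" [^>]*?href="([^"]+)"[^>]*?>(.+?)</a>%si', $subject, $aryLinkTitle, PREG_PATTERN_ORDER);

// $arySummary[1] = article summaries
preg_match_all('%<div [^>]*?class="item-text"[^>]*?>(.*?)</div>%si', $subject, $arySummary, PREG_PATTERN_ORDER);

// $aryFile[1] = image URLs
preg_match_all('%<div [^>]*?class="item-right"[^>]*?>.*?<img src="([^"]+)".*?>.*?</div>%si', $subject, $aryFile, PREG_PATTERN_ORDER);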
Now if you do a var_dump of each of those three arrays ($aryLinkTitle, $arySummary, and $aryFile), you should see how they line up: you can loop through one set using foreach and use the key to grab the summary and file from the others.
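
As a rough sketch of that loop, something like the following could work. The sample markup in $subject is a hypothetical stand-in for the fetched page (in practice it would come from file_get_contents() on the real URL), and the $articles array with its key names is just an illustrative choice. It relies on the same assumptions as above, in particular that the three result arrays line up one-to-one.

```php
<?php
// Hypothetical sample markup standing in for the real page;
// in practice: $subject = file_get_contents(/* the news page URL */);
$subject = <<<'HTML'
<div class="item-right">
    <a href="http://www.mouse.co.il/CM.articles_item,694,209,47949,.aspx">
        <img src="http://images.mouse.co.il/storage/b/d/IMAGE-NAME.jpg" height="87" width="132" /></a>
</div>
<a class="item-head" href="http://www.mouse.co.il/CM.articles_item,694,209,47949,.aspx">ARTICLE TITLE</a>
<div class="item-text">ARTICLE SUMMARY</div>
HTML;

// Same three patterns as above.
preg_match_all('%<a [^>]*?class="item-head" [^>]*?href="([^"]+)"[^>]*?>(.+?)</a>%si', $subject, $aryLinkTitle, PREG_PATTERN_ORDER);
preg_match_all('%<div [^>]*?class="item-text"[^>]*?>(.*?)</div>%si', $subject, $arySummary, PREG_PATTERN_ORDER);
preg_match_all('%<div [^>]*?class="item-right"[^>]*?>.*?<img src="([^"]+)".*?>.*?</div>%si', $subject, $aryFile, PREG_PATTERN_ORDER);

// Loop over the first 4 article URLs and use the key to pick the
// matching title, summary and image from the other result arrays.
$articles = [];
foreach (array_slice($aryLinkTitle[1], 0, 4) as $key => $url) {
    $articles[] = [
        'url'     => $url,
        'title'   => trim(strip_tags($aryLinkTitle[2][$key])),
        // Fall back to an empty string if an article has no summary or image match.
        'summary' => isset($arySummary[1][$key]) ? trim($arySummary[1][$key]) : '',
        'image'   => isset($aryFile[1][$key]) ? $aryFile[1][$key] : '',
    ];
}

print_r($articles);
```

If an article is missing its image or summary, the arrays will no longer line up by key, so it is worth checking that all three have the same count before trusting the result.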

-Greg