PHP Developers Network

A community of PHP developers offering assistance, advice, discussion, and friendship.
 

All times are UTC - 5 hours




PostPosted: Tue Jul 28, 2009 7:18 pm 
Forum Newbie

Joined: Tue Jul 28, 2009 7:16 pm
Posts: 3
 
<TABLE\s+class="used-results\s+maincol"\s+cellSpacing="0">
\s*
<tr\s+class="header\s+maincol">\s*
(?<Header>.*?(?=</tr>))</tr>
 
\s+
(?<Content>
<tr(?:[^>]+?)>
.+
(?=</table>)
 
)
</table>


I apply this regex in Regulator to the zipped file:

http://depositfiles.com/files/ipowahlh5

Why does the "Header" group contain 11111111111111? It's supposed to stop at the first </tr>.

And if I remove

 
(?<Content>
<tr(?:[^>]+?)>
.+
(?=</table>)
 
)
</table>

it becomes OK.


Attachments:
text.zip [9.27 KiB]
 
PostPosted: Wed Jul 29, 2009 1:13 am 
Forum Regular

Joined: Fri Apr 04, 2008 5:51 am
Posts: 779
I, probably like many people answering questions in this forum, am not going to download some file(s) and try to figure out what you're trying to do and/or what your question is. So, if you'd like my help, please describe your problem on the forum without needing to download anything. Thanks.


 
PostPosted: Wed Jul 29, 2009 1:45 am 
Forum Newbie

Joined: Tue Jul 28, 2009 7:16 pm
Posts: 3
OK, let me show the text to be processed here:


<div class="main">
<div class="paging maincol">

<b> 1 </b> |

<a href="http://www.autosite.com.ua/used_results.html?Make=0&Type=1&Model=&PriceFrom=&PriceTo=&Body=any&YearFrom=&YearTo=&Engine=any&MileageFrom=&MileageTo=&Color=any&Price=0&Year=0&Region=0&only_photo=&only_24=&only_dtp=&smod=&sdir=&page=2">2</a> |

<a href="http://www.autosite.com.ua/used_results.html?Make=0&Type=1&Model=&PriceFrom=&PriceTo=&Body=any&YearFrom=&YearTo=&Engine=any&MileageFrom=&MileageTo=&Color=any&Price=0&Year=0&Region=0&only_photo=&only_24=&only_dtp=&smod=&sdir=&page=3">3</a> |

<a href="http://www.autosite.com.ua/used_results.html?Make=0&Type=1&Model=&PriceFrom=&PriceTo=&Body=any&YearFrom=&YearTo=&Engine=any&MileageFrom=&MileageTo=&Color=any&Price=0&Year=0&Region=0&only_photo=&only_24=&only_dtp=&smod=&sdir=&page=4">4</a> |

<a href="http://www.autosite.com.ua/used_results.html?Make=0&Type=1&Model=&PriceFrom=&PriceTo=&Body=any&YearFrom=&YearTo=&Engine=any&MileageFrom=&MileageTo=&Color=any&Price=0&Year=0&Region=0&only_photo=&only_24=&only_dtp=&smod=&sdir=&page=5">5</a> |

<a href="http://www.autosite.com.ua/used_results.html?Make=0&Type=1&Model=&PriceFrom=&PriceTo=&Body=any&YearFrom=&YearTo=&Engine=any&MileageFrom=&MileageTo=&Color=any&Price=0&Year=0&Region=0&only_photo=&only_24=&only_dtp=&smod=&sdir=&page=6">6</a> |

<a href="http://www.autosite.com.ua/used_results.html?Make=0&Type=1&Model=&PriceFrom=&PriceTo=&Body=any&YearFrom=&YearTo=&Engine=any&MileageFrom=&MileageTo=&Color=any&Price=0&Year=0&Region=0&only_photo=&only_24=&only_dtp=&smod=&sdir=&page=7">7</a> |

<a href="http://www.autosite.com.ua/used_results.html?Make=0&Type=1&Model=&PriceFrom=&PriceTo=&Body=any&YearFrom=&YearTo=&Engine=any&MileageFrom=&MileageTo=&Color=any&Price=0&Year=0&Region=0&only_photo=&only_24=&only_dtp=&smod=&sdir=&page=8">8</a> |

<a href="http://www.autosite.com.ua/used_results.html?Make=0&Type=1&Model=&PriceFrom=&PriceTo=&Body=any&YearFrom=&YearTo=&Engine=any&MileageFrom=&MileageTo=&Color=any&Price=0&Year=0&Region=0&only_photo=&only_24=&only_dtp=&smod=&sdir=&page=9">9</a> |

<a href="http://www.autosite.com.ua/used_results.html?Make=0&Type=1&Model=&PriceFrom=&PriceTo=&Body=any&YearFrom=&YearTo=&Engine=any&MileageFrom=&MileageTo=&Color=any&Price=0&Year=0&Region=0&only_photo=&only_24=&only_dtp=&smod=&sdir=&page=10">10</a> |

<span class="hpad15"><a href="http://www.autosite.com.ua/used_results.html?Make=0&Type=1&Model=&PriceFrom=&PriceTo=&Body=any&YearFrom=&YearTo=&Engine=any&MileageFrom=&MileageTo=&Color=any&Price=0&Year=0&Region=0&only_photo=&only_24=&only_dtp=&smod=&sdir=&page=10">Следующие &raquo;</a></span>

</div>
</div>
<div style="clear:both">
<table class="used-results maincol" cellspacing="0">
<tr class="header maincol">
<td class="header" width="118">
<div class="tr-left fl"><b>Цена</b></div>
<div class="tr-right fr"><a href="javascript:Sort('Price', 'asc')"><img src="/img/sort_asc_inactive.gif"></a><a href="javascript:Sort('Price', 'desc')"><img src="/img/sort_desc_inactive.gif"></a></div>
</td>
<td class="header" width="212"><div class="tr-left fl"><b>Марка/модель/модификация</b></div></td>
<td class="header" width="83">
<div class="tr-left fl"><b>Год</b>
выпуска</div>
<div class="tr-right fr"><a href="javascript:Sort('Year', 'asc')"><img src="/img/sort_asc_inactive.gif"></a><a href="javascript:Sort('Year', 'desc')"><img src="/img/sort_desc_inactive.gif"></a></div>
</td>
<td class="header" width="75">
<div class="tr-left fl"><b>Пробег</b></div>
<div class="tr-right fr"><a href="javascript:Sort('Mileage', 'asc')"><img src="/img/sort_asc_inactive.gif"></a><a href="javascript:Sort('Mileage', 'desc')"><img src="/img/sort_desc_inactive.gif"></a></div>
</td>
<td width="105" class="header">
<div class="tr-left fl"><b>Дата</b>
добавления</div>
<div class="tr-right fr"><a href="javascript:Sort('Date', 'asc')"><img src="/img/sort_asc_inactive.gif"></a><a href="javascript:Sort('Date', 'desc')"><img src="/img/sort_desc_inactive.gif"></a></div>
</td>
<td width="58">&nbsp;</td>
</tr>
11111111111111111111111111111
<tr>
<td class="res-left"><a href="http://www.autosite.com.ua/Skoda-Octavia-LS_auto_520576.html"><img src="/pictures/28-6-2009/520576/thumb_image1.jpg" border="0" class="imgborder" ></a><div class="price red11 b lh18">9100 <b>USD</b></div></td>
<td class="res"><div class="c">
<a href="http://www.autosite.com.ua/Skoda-Octavia-LS_auto_520576.html" class="red12"><strong>Skoda Octavia LS</strong></a>


<strong>Закарпатье</strong>

ХОРОШЕЕ СОСТОЯНИЕ.ЧЕШСКАЯ ЗБОРКА. ...&nbsp;&nbsp;

<a href="http://www.autosite.com.ua/Skoda-Octavia-LS_auto_520576.html">Подробнее</a></div>
</td>
<td class="res">2000 г</td>
<td class="res">200000 км </td>
<td class="res">28/06/2009</td>
<td class="res-right" align="center">
<img src="http://i.autosite.com.ua/img/urgent_icon.gif" alt="" /></td>
</tr>


<tr bgcolor="#FCFDFE">
<td class="res-left"><a href="http://www.autosite.com.ua/Mazda-323_auto_543533.html"><img src="/pictures/25-7-2009/543533/thumb_image1.jpg" border="0" class="imgborder" ></a><div class="price red11 b lh18">3800 <b>USD</b></div></td>
<td class="res"><div class="c">
<a href="http://www.autosite.com.ua/Mazda-323_auto_543533.html" class="red12"><strong>Mazda 323 </strong></a>


<strong>АР Крым</strong>

сигнализация, цз., протувотуманки, иммобилайзер, гу рул ...&nbsp;&nbsp;

<a href="http://www.autosite.com.ua/Mazda-323_auto_543533.html">Подробнее</a></div>
</td>
<td class="res">1991 г</td>
<td class="res">80000 км </td>
<td class="res">25/07/2009</td>
<td class="res-right" align="center">
<img src="http://i.autosite.com.ua/img/urgent_icon.gif" alt="" /></td>
</tr>

</table>



After processing, Header contains 11111111111111111111111111111. This is not what I want. I need 11111111111111111111111111111 and all the text below it to end up in the "Content" group.


 
PostPosted: Wed Jul 29, 2009 2:26 am 
Forum Regular

Joined: Fri Apr 04, 2008 5:51 am
Posts: 779
senglory wrote:
 
<TABLE\s+class="used-results\s+maincol"\s+cellSpacing="0">
\s*
<tr\s+class="header\s+maincol">\s*
(?<Header>.*?(?=</tr>))</tr>
 
\s+
(?<Content>
<tr(?:[^>]+?)>
.+
(?=</table>)
 
)
</table>


I apply this regex in Regulator to the zipped file:

http://depositfiles.com/files/ipowahlh5

Why does the "Header" group contain 11111111111111? It's supposed to stop at the first </tr>.


No, look at this part of your regex:
(?<Header>.*?(?=</tr>))</tr>
\s+
(?<Content><tr(?:[^>]+?)>)


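When you have matched your first "</tr>", you tell the regex engine that only white space characters may come between that "</tr>" and the next "<tr", yet there is "11111111111111" between the first "</tr>" and the second "<tr". The match fails at that point, so the engine backtracks and lets the lazy ".*?" in Header grow until it reaches a later "</tr>" that is followed by nothing but white space and a "<tr"; that is why Header ends up containing "11111111111111". Besides, the second "<tr" is followed directly by a ">", making this part of your regex fail as well: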
(?<Content><tr(?:[^>]+?)>)

since you specified that at least one non-'>' character must follow it ([^>]+).
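
To see the backtracking in action, here is a minimal PHP sketch. The sample string is a made-up miniature of the page, and the pattern is a shortened version of the original (table opening tag dropped, the s modifier assumed so that "." crosses line breaks); neither is taken verbatim from the thread.

```php
<?php
// Hypothetical miniature of the page: header row, stray text, two data rows.
$html = "<tr class=\"header\"><td>H</td></tr>\n"
      . "1111\n"
      . "<tr><td>row1</td></tr>\n"
      . "<tr bgcolor=\"#FCFDFE\"><td>row2</td></tr>\n"
      . "</table>";

// Same shape as the original pattern: a lazy Header, then white space only,
// then a <tr that must carry at least one character before the closing >.
$original = '~<tr\s+class="header">'
          . '(?<Header>.*?(?=</tr>))</tr>'
          . '\s+'
          . '(?<Content><tr(?:[^>]+?)>.+(?=</table>))</table>~s';

preg_match($original, $html, $m);

// The lazy .*? was forced past the first </tr>, so Header holds the stray text
// and the whole first data row.
var_dump(strpos($m['Header'], '1111') !== false); // bool(true)
var_dump(strpos($m['Header'], 'row1') !== false); // bool(true)

// Content only starts at the <tr that has an attribute.
echo $m['Content'];
```

Lazy does not mean "stop at the first possible place no matter what"; it means "try the shortest match first, and grow if the rest of the pattern cannot match from there".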


 
PostPosted: Wed Jul 29, 2009 2:39 am 
Forum Regular

Joined: Fri Apr 04, 2008 5:51 am
Posts: 779
Try something like this:

$regex = '#
  <TABLE\s+class="used-results\s+maincol"\s+cellSpacing="0">\s*
  <tr\s+class="header\s+maincol">\s*
  (?<Header>
    (?:(?!</tr>).)*+
  )
  </tr>
  .*?
  (?<Content>
    <tr(?:[^>]*+)>
    (?:(?!</table>).)*+
  )
  </table>
#xsi'
;
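
A usage sketch for the pattern above, run with preg_match() against a shortened, made-up sample (the real page has more cells and rows). The $variant pattern at the end is not from the thread; it is one possible adjustment, assuming the goal stated earlier is to get the stray "1111" text into Content as well.

```php
<?php
// Hypothetical, shortened sample of the table markup.
$html = <<<'HTML'
<table class="used-results maincol" cellspacing="0">
<tr class="header maincol">
<td class="header">Price</td>
</tr>
1111
<tr>
<td class="res">2000</td>
</tr>
</table>
HTML;

// The pattern as posted: x = free-spacing, s = dot matches newlines,
// i = case-insensitive (so <TABLE and cellSpacing match the lowercase markup).
$regex = '#
  <TABLE\s+class="used-results\s+maincol"\s+cellSpacing="0">\s*
  <tr\s+class="header\s+maincol">\s*
  (?<Header>
    (?:(?!</tr>).)*+
  )
  </tr>
  .*?
  (?<Content>
    <tr(?:[^>]*+)>
    (?:(?!</table>).)*+
  )
  </table>
#xsi';

if (preg_match($regex, $html, $m)) {
    echo "Header:\n", trim($m['Header']), "\n";   // header cells only
    echo "Content:\n", trim($m['Content']), "\n"; // starts at the first data <tr>
}

// Assumed variant: Content begins right after the header's </tr>,
// so the stray text between the rows is captured too.
$variant = '#
  <tr\s+class="header\s+maincol">\s*
  (?<Header>(?:(?!</tr>).)*+)
  </tr>
  (?<Content>(?:(?!</table>).)*+)
  </table>
#xsi';

preg_match($variant, $html, $v);
var_dump(strpos($v['Content'], '1111') !== false); // bool(true)
```

With the posted pattern the text between the header row and the first data row is consumed by the ".*?" and lands in neither group. The possessive "*+" loops stop at the first closing tag and cannot be backtracked into, which is what keeps Header from growing past its own "</tr>".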


 