A very hard one, but almost there!

Frexuz
Forum Newbie
Posts: 5
Joined: Mon Oct 22, 2007 8:55 am

Post by Frexuz »

feyd | Please use the forum's code tags and [syntax="..."] tags where appropriate when posting code. Your post has been edited to reflect how we'd like it posted. Please read: Posting Code in the Forums (http://forums.devnetwork.net/viewtopic.php?t=21171) to learn how to do it too.


Here's the HTML I want to grab values from (this block repeats several times on the page):

[syntax="html"]
      <tr height="30">
        <td colspan="3" class="tier1" align="center" style="border-right: none 1px #000;">Starting Out Small (<a class="tierlink" href="rankings.php?game=5&diff=4&song=-36&page=16&highlight=778">Rank: 778th</a>) </td>
        <td colspan="7" class="tier2" style="border-left: none 1px #000;">&nbsp;</td>
      </tr>
      <tr height="25">
        <td>
          <table cellspacing="0" cellpadding="0" align="center">
            <tr><td align="center"><a href="javascript:openWindow('view_scores.php?user=22553&song=1412')"><span style="font-size: 0.7em">View All</span></a></td></tr>
          </table>
        </td>
        <td align="center">3</td><td align="center">Slow Ride</td><td align="center"><a href="rankings.php?game=5&diff=4&song=1412&page=18&highlight=892" class="rank3">892nd</a></td>
        <td align="center">192,601</td>
        <td align="center"><img src="/images/rating_6.gif" /> (6.7)</td>
        <td align="center"><span class="percent2">99%</span></td>
        <td align="center">267</td>
        <td align="center">Nov. 18, 2007, 3:19PM</td>
        <td align="center"><span class="gray">N/A</span>
    </tr>
      <tr height="25">
        <td>
          <table cellspacing="0" cellpadding="0" align="center">
            <tr><td align="center"><a href="javascript:openWindow('view_scores.php?user=22553&song=1428')"><span style="font-size: 0.7em">View All</span></a></td></tr>
          </table>
        </td>
        <td align="center">3</td><td align="center">Talk Dirty to Me</td><td align="center"><a href="rankings.php?game=5&diff=4&song=1428&page=15&highlight=709" class="rank3">709th</a></td>
        <td align="center">298,732</td>
        <td align="center"><img src="/images/rating_6.gif" /> (6.7)</td>
        <td align="center"><span class="percent2">99%</span></td>
        <td align="center">358</td>
        <td align="center">Nov. 18, 2007, 3:32PM</td>
        <td align="center"><span class="gray">N/A</span>
    </tr>
      <tr height="25">
        <td>
          <table cellspacing="0" cellpadding="0" align="center">
            <tr><td align="center"><a href="javascript:openWindow('view_scores.php?user=22553&song=1444')"><span style="font-size: 0.7em">View All</span></a></td></tr>
          </table>
        </td>
        <td align="center">3</td><td align="center">Hit Me With Your Best Shot</td><td align="center"><a href="rankings.php?game=5&diff=4&song=1444&page=18&highlight=868" class="rank3">868th</a></td>
        <td align="center">166,643</td>
        <td align="center"><img src="/images/rating_6.gif" /> (6.6)</td>
        <td align="center"><span class="percent2">97%</span></td>
        <td align="center">181</td>
        <td align="center">Nov. 18, 2007, 3:39PM</td>
        <td align="center"><span class="gray">N/A</span>
    </tr>
      <tr height="25">
        <td>
          <table cellspacing="0" cellpadding="0" align="center">
            <tr><td align="center"><a href="javascript:openWindow('view_scores.php?user=22553&song=1460')"><span style="font-size: 0.7em">View All</span></a></td></tr>
          </table>
        </td>
        <td align="center">2</td><td align="center">Story of My Life</td><td align="center"><a href="rankings.php?game=5&diff=4&song=1460&page=19&highlight=914" class="rank3">914th</a></td>
        <td align="center">357,933</td>
        <td align="center"><img src="/images/rating_6.gif" /> (6.5)</td>
        <td align="center"><span class="percent2">99%</span></td>
        <td align="center">211</td>
        <td align="center">Nov. 10, 2007, 4:44PM</td>
        <td align="center"><span class="gray">N/A</span>
    </tr>
      <tr height="25">
        <td>
          <table cellspacing="0" cellpadding="0" align="center">
            <tr><td align="center"><a href="javascript:openWindow('view_scores.php?user=22553&song=1476')"><span style="font-size: 0.7em">View All</span></a></td></tr>
          </table>
        </td>
        <td align="center">2</td><td align="center">Rock and Roll All Nite</td><td align="center"><a href="rankings.php?game=5&diff=4&song=1476&page=17&highlight=815" class="rank3">815th</a></td>
        <td align="center">162,133</td>
        <td align="center"><img src="/images/rating_6.gif" /> (6.0)</td>
        <td align="center"><span class="percent2">94%</span></td>
        <td align="center">161</td>
        <td align="center">Nov. 8, 2007, 12:11PM</td>
        <td align="center"><span class="gray">N/A</span>
    </tr>
Explained:
The HTML contains 5 similar chunks, one per song. I have already solved matching those with this regex:

C#

Code: Select all

new Regex(@"\('view_scores.php\?user=.*?&song=.*?'\)"">[\S\s]*?<td align=""center"">.*?</td>[\S\s]*?<td align=""center"">(.*?)</td>[\S\s]*?<td align=""center"">(.*?)</td>[\S\s]*?<td align=""center"">(.*?)</td>[\S\s]*?<td align=""center"">(.*?)</td>[\S\s]*?<td align=""center"">(.*?)</td>[\S\s]*?<td align=""center"">(.*?)</td>", RegexOptions.Multiline);
This gives me, for example, these values:[/syntax]

Code: Select all

Rock and Roll All Nite
815th
162,133
<img src="/images/rating_6.gif" /> (6.0)
<span class="percent2">94%</span>
161
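The rating and percentage captures above still carry markup, while the goal is plain values such as 6.0 and 94%. A minimal sketch of one way to clean each captured cell, written in Python for illustration (the same two substitutions should carry over to .NET's Regex.Replace); the helper name clean_cell is made up for this example:

```python
import re

def clean_cell(raw):
    """Strip HTML tags from a captured cell, then unwrap a value like '(6.0)'."""
    # Remove anything that looks like a tag, e.g. <img ... /> or <span ...>.
    text = re.sub(r"<[^>]+>", "", raw).strip()
    # If the whole remaining text is parenthesised, keep only the inside.
    wrapped = re.fullmatch(r"\((.*)\)", text)
    return wrapped.group(1) if wrapped else text

captured = [
    "Rock and Roll All Nite",
    "815th",
    "162,133",
    '<img src="/images/rating_6.gif" /> (6.0)',
    '<span class="percent2">94%</span>',
    "161",
]
print([clean_cell(value) for value in captured])
```

Plain cells pass through unchanged, so the helper can be applied to every capture without checking which column it came from.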
-------------------------------------------------------------------------------------------------------
Now to my problem.
I also want to grab "Starting Out Small" from this (the first piece of the HTML above):

Code: Select all

      <tr height="30">
        <td colspan="3" class="tier1" align="center" style="border-right: none 1px #000;">Starting Out Small (<a class="tierlink" href="rankings.php?game=5&diff=4&song=-36&page=16&highlight=778">Rank: 778th</a>) </td>
        <td colspan="7" class="tier2" style="border-left: none 1px #000;">&nbsp;</td>
      </tr>
and I want to join it with my current regex pattern. (This is the hard part) :)

I don't think there is another way, BECAUSE there are NOT always 5 chunks of code between each of these "headers".
So I can't really grab these headers separately.
-------------------------------------------------------------------------------------------------------

If you don't understand, I'll try to explain better :)

PS: this is of course not my site, so I can't change the code or anything.


superdezign
DevNet Master
Posts: 4135
Joined: Sat Jan 20, 2007 11:06 pm

Post by superdezign »

Why does it all need to be done by one pattern...?

Also, since you are dealing with an HTML document, I'd find a way to simplify the data you are retrieving, or use the DOM.

By simplify the data you are getting, I mean something like this:

Code: Select all

// [^>]* stops at the first ">", so each tag on a line is trimmed on its own
$htmlData = preg_replace('/<(table|tr|td)\b[^>]*>/', '<\\1>', $htmlData);
That would make the markup easier to read, and you might then find a way to simplify the data further with regex, to the point where you basically have what you want without needing to extract it.
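For anyone not working in PHP, here is a rough sketch of the same attribute-stripping idea in Python. The function name strip_table_attributes is invented for this example, and the character class [^>]* is used so that a line holding several cells is handled one tag at a time:

```python
import re

def strip_table_attributes(html):
    """Reduce <table ...>, <tr ...> and <td ...> opening tags to bare tags."""
    # \b keeps the alternation from matching longer tag names;
    # [^>]* consumes the attributes up to the first closing '>'.
    return re.sub(r"<(table|tr|td)\b[^>]*>", r"<\1>", html)

sample = '<td align="center">3</td><td align="center">Slow Ride</td>'
print(strip_table_attributes(sample))  # prints <td>3</td><td>Slow Ride</td>
```

Closing tags and other elements such as the links and spans are left alone, so the values themselves are untouched.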
Frexuz
Forum Newbie
Posts: 5
Joined: Mon Oct 22, 2007 8:55 am

Post by Frexuz »

The reason I want them as one is that I want my MatchCollection to look something like this:

Group[0] (the header, only 1 value)
Starting Out Small

Group[1] (song-item, 6 values)
{Slow Ride , 892nd , 192,601 , 6.7 , 99% , 267}

Group[2] (song-item, 6 values)
{Talk Dirty to Me , 709th , 298,732 , 6.7 , 99% , 358}

Group[3] (song-item, 6 values)
{Hit Me With Your Best Shot , 868th , 166,643 , 6.6 , 97% , 181}

Group[4] (song-item, 6 values)
{Story of My Life , 914th , 357,933 , 6.5 , 99% , 211}

Group[5] (song-item, 6 values)
{Rock and Roll All Nite , 815th , 162,133 , 6.0 , 94% , 161}

........... then it continues with a new header, followed by a couple of song-items.
The problem is, if I grab the headers separately, I don't know what position they have, or to put it another way, how many songs each one holds.

Basically, I want to reproduce this page:
http://www.scorehero.com/scores.php?use ... e=5&diff=4

I cannot use static values for the headers, because the lists change from user to user and from game to game.

Maybe this will clear it up a little?
I'll try the DOM, but right now I don't understand it :)
superdezign
DevNet Master
Posts: 4135
Joined: Sat Jan 20, 2007 11:06 pm

Post by superdezign »

Just create an array of headers and an array of sections, and then print them out according to their index, which should correspond to one another.
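One possible way to line the two arrays up is to record where each match starts in the document and attach every song to the nearest header before it. The sketch below is in Python for illustration; in .NET the equivalent offset should be available as Match.Index. Both patterns are guesses based on the HTML quoted in this thread, and the second tier and its song in the sample are invented so the grouping has something to show:

```python
import re
from bisect import bisect_right

# Illustrative patterns, simplified from the HTML posted in this thread.
HEADER_RE = re.compile(r'class="tier1"[^>]*>\s*([^<]*?)\s*\(<a class="tierlink"')
SONG_RE = re.compile(
    r'view_scores\.php\?user=\d+&song=\d+.*?'
    r'<td align="center">\d+</td>\s*<td align="center">([^<]+)</td>',
    re.S,
)

def group_by_position(html):
    """Return [(header, [song titles])] using match offsets to pair them up."""
    headers = [(m.start(), m.group(1)) for m in HEADER_RE.finditer(html)]
    offsets = [offset for offset, _ in headers]
    sections = [(name, []) for _, name in headers]
    for m in SONG_RE.finditer(html):
        # Index of the last header that starts before this song.
        idx = bisect_right(offsets, m.start()) - 1
        if idx >= 0:
            sections[idx][1].append(m.group(1))
    return sections

# Shortened sample; "Example Tier Two" and "Example Song" are made up.
SAMPLE = """
<td colspan="3" class="tier1" align="center">Starting Out Small (<a class="tierlink" href="rankings.php">Rank: 778th</a>) </td>
<a href="javascript:openWindow('view_scores.php?user=22553&song=1412')">View All</a>
<td align="center">3</td><td align="center">Slow Ride</td>
<a href="javascript:openWindow('view_scores.php?user=22553&song=1428')">View All</a>
<td align="center">3</td><td align="center">Talk Dirty to Me</td>
<td colspan="3" class="tier1" align="center">Example Tier Two (<a class="tierlink" href="rankings.php">Rank: 1st</a>) </td>
<a href="javascript:openWindow('view_scores.php?user=22553&song=9999')">View All</a>
<td align="center">2</td><td align="center">Example Song</td>
"""

for header, songs in group_by_position(SAMPLE):
    print(header, songs)
```

Because the pairing relies only on offsets, it does not matter how many songs sit under each header.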
Frexuz
Forum Newbie
Posts: 5
Joined: Mon Oct 22, 2007 8:55 am

Post by Frexuz »

But I have no idea where a group of songs starts or ends.
I'm just matching each song right now.
superdezign
DevNet Master
Posts: 4135
Joined: Sat Jan 20, 2007 11:06 pm

Post by superdezign »

Well, you do know that you can match more than one pattern with a pipe, right?

Code: Select all

/(pattern1|pattern2)/
I assumed you'd want some sort of organization and hierarchy to your data, but if it's just the data that you want and that's it, then get it.
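To make the pipe idea concrete, here is a hedged sketch in Python; named groups work similarly in .NET. A single pattern matches either a header or a song, the matches come back in document order, and each song is filed under the most recent header. The group names header and song, the simplified sub-patterns, and the second tier in the sample are all assumptions made for this example:

```python
import re

# One pattern, two alternatives: a tier header OR a song row.
ROW_RE = re.compile(
    r'class="tier1"[^>]*>\s*(?P<header>[^<]*?)\s*\(<a class="tierlink"'
    r'|'
    r'view_scores\.php\?user=\d+&song=\d+.*?'
    r'<td align="center">\d+</td>\s*<td align="center">(?P<song>[^<]+)</td>',
    re.S,
)

def group_rows(html):
    """Walk the matches in order, starting a new group at every header."""
    groups = []
    for m in ROW_RE.finditer(html):
        if m.group("header") is not None:
            groups.append((m.group("header"), []))
        elif groups:
            # A song belongs to the header seen most recently.
            groups[-1][1].append(m.group("song"))
    return groups

# Shortened sample; "Example Tier Two" and "Example Song" are made up.
SAMPLE = """
<td colspan="3" class="tier1" align="center">Starting Out Small (<a class="tierlink" href="rankings.php">Rank: 778th</a>) </td>
<a href="javascript:openWindow('view_scores.php?user=22553&song=1412')">View All</a>
<td align="center">3</td><td align="center">Slow Ride</td>
<a href="javascript:openWindow('view_scores.php?user=22553&song=1428')">View All</a>
<td align="center">3</td><td align="center">Talk Dirty to Me</td>
<td colspan="3" class="tier1" align="center">Example Tier Two (<a class="tierlink" href="rankings.php">Rank: 1st</a>) </td>
<a href="javascript:openWindow('view_scores.php?user=22553&song=9999')">View All</a>
<td align="center">2</td><td align="center">Example Song</td>
"""

for header, songs in group_rows(SAMPLE):
    print(header, songs)
```

Checking which named group is set tells you whether a given match is a header or a song, which is what gives the flat match list its hierarchy.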
Frexuz
Forum Newbie
Posts: 5
Joined: Mon Oct 22, 2007 8:55 am

Post by Frexuz »

Code: Select all

/(pattern1|pattern2)/
Thank you, that works nicely!