regular expression help

Posted: Mon Aug 02, 2004 5:49 pm
by Swede78
I'm trying to pull out the song title and artist from a Billboard Top 100 type of list. Instead of going through the songs manually, I'm trying to strip the junk with PHP. I've been messing with preg_replace for some time now and have only had partial success.

Here's the format of the text I'm grabbing:

8 3 22 Burn, Usher
LaFace | ALBUM CUT | Zomba 1
9 7 21 The Reason, Hoobastank
Island | ALBUM CUT | IDJMG 2

The "odd" lines have 3 sets of numbers, then the title, then artist. I've been able to get this part stripped of the junk with preg_replace with this code:

Code:

$SongList = explode("\n", $SongList);
$SongList = preg_replace('/^[0-9]* [0-9]* [0-9]* /', '', $SongList);
This is working, but I haven't tested it thoroughly. My problem is getting the "even" lines out; I don't need that information at all. I know I can just go through the array I've created and remove them that way, but I think a replacement statement would be quicker (and I'd like to learn more about this). The even lines have 3 sets of characters separated by the "|" character.

Here's my non-working code:

Code:

$SongList = preg_replace('/^[\w]* | [\w]* | [\w]*\n/', '', $SongList);
Any help is appreciated. Any links to good sample/tutorial sites would be nice as well. I wasn't able to view the samples on php.net, but will try the site again later.
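
One likely reason the pattern above never matches: an unescaped | means "or" in a Perl-style pattern, and \w does not match the spaces inside fields such as "ALBUM CUT". Here is a minimal sketch of one way around that. It assumes the list is already split into lines and that only the label lines contain pipes. The names $lines, $labelLine and $songs are made up for illustration.

```php
<?php
// Sample lines in the format from the post (assumed to be already split).
$lines = array(
    '8 3 22 Burn, Usher',
    'LaFace | ALBUM CUT | Zomba 1',
    '9 7 21 The Reason, Hoobastank',
    'Island | ALBUM CUT | IDJMG 2',
);

// \| matches a literal pipe; [^|]+ matches any run of non-pipe characters,
// including the spaces that \w would reject.
$labelLine = '/^[^|]+\|[^|]+\|[^|]+$/';

// PREG_GREP_INVERT keeps the lines that do NOT match, i.e. the song lines.
$songs = array_values(preg_grep($labelLine, $lines, PREG_GREP_INVERT));

// Strip the three leading chart numbers, as in the working code above.
$songs = preg_replace('/^\d+ \d+ \d+ /', '', $songs);
```

Filtering with preg_grep instead of replacing with an empty string also avoids leaving empty array elements behind where the label lines used to be.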

Posted: Mon Aug 02, 2004 6:03 pm
by Swede78
Actually, removing the "even" line won't work with a regular expression replacement function, because that code will also affect the "odd" lines. This would ruin that code too. So, I'm just going to loop through the array and only use the "odd" lines.

But, I'd still appreciate any tips or suggestions.

Posted: Mon Aug 02, 2004 6:31 pm
by Buddha443556
Do them both at the same time.

Code:

s/(\d+) (\d+) (\d+) ([^,]+), ([^\r\n]+)\s+([^\|]+)\|([^\|]+)\|([^\r\n]+)\s+/\1 \2 \3 \4, \5\n/gis
This works in Perl. As for PHP? I'll let you play with it.

EDIT to add the PHP...

Code:

<?php
$SongList = explode("\n", $SongList);
$SongList = preg_replace("/(\d+) (\d+) (\d+) ([^,]+), ([^\n]+)\s+([^\|]+)\|([^\|]+)\|([^\n]+)\s+/", "\\1 \\2 \\3 \\4, \\5\n", $SongList);
?>
Think that should work. And sorry for not supplying the PHP up front - customer called.
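
One thing worth checking with the PHP above: the pattern spans two lines of input, but explode() splits the text first, so each array element holds a single line and a two-line pattern has nothing to match against. Below is a rough sketch that runs a similar pattern on the unsplit string instead. It assumes the sample format from the first post and drops the chart numbers as well as the label line. The names $raw, $pattern, $clean and $songs are made up for illustration.

```php
<?php
// Raw text as a single string (not exploded), in the format from the first post.
$raw = "8 3 22 Burn, Usher\nLaFace | ALBUM CUT | Zomba 1\n"
     . "9 7 21 The Reason, Hoobastank\nIsland | ALBUM CUT | IDJMG 2\n";

// First line: three numbers, then title and artist (captured as groups 1 and 2).
// Second line: three pipe-separated fields, matched and thrown away.
// \r?\n accepts both Unix and Windows line endings.
$pattern = '/^\d+ \d+ \d+ ([^,\r\n]+), ([^\r\n]+)\r?\n[^|\r\n]+\|[^|\r\n]+\|[^\r\n]+(?:\r?\n|$)/m';

// Keep only "title, artist" and one newline per song.
$clean = preg_replace($pattern, "\$1, \$2\n", $raw);

// Split the cleaned text into one entry per song.
$songs = explode("\n", trim($clean));
```

The m modifier makes ^ match at the start of every line, so each song/label pair is replaced in turn. A title that itself contains a comma would be split in the wrong place, so treat this as a starting point rather than a finished parser.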

Posted: Tue Aug 03, 2004 10:52 am
by Swede78
Thanks Buddha,

It didn't work as is. From what I've read, preg_replace is supposed to be Perl-based. I haven't tried messing with it yet, so it may just need some minor tweaking. Unfortunately, I don't understand much of what's there, but I'll continue trying.

BTW, I'm on a Windows based host w/ PHP 4.3. Not sure if that makes a difference.

I appreciate the help!

Posted: Wed Aug 04, 2004 3:04 pm
by Buddha443556
Most likely it's the \n and \s you'll need to tweak. You might try switching \s+ to \s* or just removing the \s+ altogether. If that doesn't work, try changing \n to \r\n. It's hard to tell what your end of line is, even if you're using Windows.

When I copied your sample data from the post I actually got Unix text, which is why you see the \r\n in the original Perl expression. Whitespace can really be a pain in the butt when using regular expressions.
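
One way to take the end-of-line guesswork out of the picture is to normalize the line endings before any pattern runs. A small sketch, assuming the input may mix Windows (\r\n), old Mac (\r) and Unix (\n) endings. The sample string and the names $raw, $normalized and $lines are made up for illustration.

```php
<?php
// Sample input with mixed line endings (an assumption for illustration).
$raw = "8 3 22 Burn, Usher\r\nLaFace | ALBUM CUT | Zomba 1\r\n"
     . "9 7 21 The Reason, Hoobastank\nIsland | ALBUM CUT | IDJMG 2";

// Convert \r\n first, then any remaining lone \r, to a plain \n.
$normalized = str_replace(array("\r\n", "\r"), "\n", $raw);

// Every line break is now a single \n, so a plain explode is safe.
$lines = explode("\n", $normalized);
```

After this step a pattern only has to deal with \n, whatever the host or the copy and paste produced.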

There's a great little summary of Perl Regular Expressions here. Once you learn to use regular expressions you'll wonder how you got along so long without them.

Posted: Wed Aug 04, 2004 4:16 pm
by Swede78
I ended up using the working code I had and only using every other line when looping through the array - it works for now :)
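
For anyone finding this later, the every-other-line loop looks roughly like this. It is only a sketch: it assumes the lines strictly alternate (song, label, song, label) with no blank lines, and that each song line has the form "title, artist". The $songs structure and the other variable names are made up for illustration.

```php
<?php
// Lines as produced by explode("\n", ...) on the sample data.
$lines = array(
    '8 3 22 Burn, Usher',
    'LaFace | ALBUM CUT | Zomba 1',
    '9 7 21 The Reason, Hoobastank',
    'Island | ALBUM CUT | IDJMG 2',
);

$songs = array();

// Step by 2 so only indexes 0, 2, 4, ... (the song lines) are visited.
for ($i = 0; $i < count($lines); $i += 2) {
    // Remove the three leading chart numbers; trim() drops a stray \r if present.
    $entry = preg_replace('/^[0-9]+ [0-9]+ [0-9]+ /', '', trim($lines[$i]));

    // Split at the first ", " only, so commas in the artist name survive.
    list($title, $artist) = explode(', ', $entry, 2);

    $songs[] = array('title' => $title, 'artist' => $artist);
}
```

A title that contains ", " would still be split too early, so that case needs a manual check.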

But, I took a look at the site you provided a link for - very nice!

Thanks a lot, Buddha.