preg_match[_all]($regex) question

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."

tr3online
Forum Newbie
Posts: 16
Joined: Sat Jan 24, 2009 8:02 pm

preg_match[_all]($regex) question

Post by tr3online »

Hey guys, I'm pretty new to regex and could use some help.
I tried to find some examples I could use before coming here to ask, but they all seem far too verbose for me to grasp.
In any event, here's my issue:

I am trying to strip some data from a pretty long data string of the following form (it's in another language, so I'll just include random English):

KASI=<p align="center"><b>Hi there / What's up</b><br><br>kakusi: this / sakkyoku: that<br><br>lots of data

Ideally, I want to grab that data and pull out "Hi there", "What's up", "this", "that", and "lots of data", which runs to the end of the string with no line break.
I was trying to use a preg_match with regex to strip it. I imagine preg_match_all may be more suitable?

In any event, I'd like some help coming up with the regex to help me isolate things from a string like that.

Thanks!

Code snippet:

$page = file_get_contents($url);
$pattern = '@regex here@'; // placeholder for the actual pattern
preg_match($pattern, $page, $match);
var_dump($match);
echo $match[1];
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: preg_match[_all]($regex) question

Post by prometheuzz »

You need to be more precise about your requirements before being able to construct a regex (or get help constructing one). What happened to the word "sakkyoku"? Why did you leave it out?
tr3online
Forum Newbie
Posts: 16
Joined: Sat Jan 24, 2009 8:02 pm

Re: preg_match[_all]($regex) question

Post by tr3online »

Sakkyoku is more like a tag.
Ideally, the regex would search along the lines of:

Starts with : KASI=<p align="center"><b> , ends with /
So (*) in KASI=<p align="center"><b>(*) / would be selected

then starts with / and ends with </b><br><br>
so (*) in / (*)</b><br><br> would be selected

kakusi: (*) /
sakkyoku: (*) /
<br><br>(*) [end]
would be selected

Most of the data in (*) won't be ANSI, if that matters. It will be UTF-8 chars as it's a foreign language.

Does that help at all?
tr3online
Forum Newbie
Posts: 16
Joined: Sat Jan 24, 2009 8:02 pm

Re: preg_match[_all]($regex) question

Post by tr3online »

Or is regex not the best way to go about doing that?
semlar
Forum Commoner
Posts: 61
Joined: Fri Feb 20, 2009 10:45 pm

Re: preg_match[_all]($regex) question

Post by semlar »

Code:

preg_match_all('@KASI=<p align="center"><b>([^/]+)/@', $page, $matches);
Matches all occurrences of [KASI=<p align="center"><b>] followed by stuff and then [/] in $page, returns array as $matches.

I have no idea how regex handles foreign characters (hiragana/kanji).
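On the foreign-character question, a small sketch may help. PHP's PCRE functions match byte by byte unless the pattern carries the `u` modifier, which makes both pattern and subject be treated as UTF-8. The sample string below is made up for illustration; it only assumes the page really is UTF-8, as stated earlier in the thread.

```php
<?php
// Made-up subject in the same shape as the thread's example.
$page = 'KASI=<p align="center"><b>ふたり / そら</b><br><br>rest';

// Without "u", PCRE works on bytes. That is still safe for this pattern,
// because the "/" byte never occurs inside a multi-byte UTF-8 sequence.
preg_match('@KASI=<p align="center"><b>([^/]+)/@', $page, $byBytes);

// With "u", pattern and subject are treated as UTF-8, so "." and
// character classes operate on whole characters.
preg_match('@KASI=<p align="center"><b>([^/]+)/@u', $page, $byChars);

$title     = trim($byChars[1]);               // drop the space before "/"
$charCount = preg_match_all('/./u', $title);  // counts characters
$byteCount = strlen($title);                  // counts bytes
```

For this particular pattern both calls capture the same text; the `u` modifier starts to matter once the pattern uses `.`, quantified character classes, or literal non-ASCII characters.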

I don't know if your example is supposed to be one continuous string or not. If it is, you would do something like this:

Code:

$pattern = '@KASI=<p align="center"><b>([^/]+)/((?:[^<]|<(?!/b><br><br>))+)</b><br><br>(.*)@';
preg_match($pattern, $page, $match);
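As a rough check, here is that pattern run against the English sample from the first post, followed by one possible way to split the remaining tail into the "kakusi", "sakkyoku", and trailing parts. The second pattern is not from the thread; it is a sketch that assumes the labels `kakusi:` and `sakkyoku:` appear literally, which may not hold for the real Japanese data.

```php
<?php
// The English sample string from the original question.
$page = 'KASI=<p align="center"><b>Hi there / What\'s up</b><br><br>'
      . 'kakusi: this / sakkyoku: that<br><br>lots of data';

// Pattern from the post above: title part, second part, then everything else.
$pattern = '@KASI=<p align="center"><b>([^/]+)/((?:[^<]|<(?!/b><br><br>))+)</b><br><br>(.*)@';
preg_match($pattern, $page, $m);

$first  = trim($m[1]);   // text before the "/"
$second = trim($m[2]);   // text between "/" and "</b><br><br>"

// Hypothetical follow-up pattern for the tail captured in $m[3].
// "s" lets "." span newlines; "u" treats the input as UTF-8.
$tailPattern = '@kakusi:\s*(.*?)\s*/\s*sakkyoku:\s*(.*?)<br><br>(.*)@su';
preg_match($tailPattern, $m[3], $t);

// $t[1] = kakusi value, $t[2] = sakkyoku value, $t[3] = the remaining data
```

Splitting the job into two smaller patterns keeps each one readable; a single pattern with five groups would also work but is harder to debug.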
tr3online
Forum Newbie
Posts: 16
Joined: Sat Jan 24, 2009 8:02 pm

Re: preg_match[_all]($regex) question

Post by tr3online »

Thanks for the input. I tried to run what you said with the following code:

Code:

 
<?php
$page = file_get_contents('$url');
preg_match_all('@LYRICS=<p align="center"><b>([^/]+)/@', $page, $matches);
var_dump(matches);
echo $matches;
?>
 
which returns
string(7) "matches" Array
Maybe this isn't working right?
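That output looks consistent with three small slips in the snippet rather than a regex problem: `'$url'` in single quotes is the literal text `$url`, `var_dump(matches)` is missing the `$`, and `echo` on an array prints only `Array`. A minimal corrected sketch follows; the URL and the inline page text are made-up stand-ins so that it runs without a network call.

```php
<?php
$url = 'http://example.com/lyrics';   // hypothetical URL, illustration only

// Single quotes do not interpolate variables: this is the literal text "$url".
$literal = '$url';
// Pass the variable itself to get its value.
$actual = $url;

// Stand-in for file_get_contents($url), using made-up data.
$page = 'LYRICS=<p align="center"><b>ふたり / そら</b><br><br>...';

preg_match_all('@LYRICS=<p align="center"><b>([^/]+)/@', $page, $matches);

// Without the "$", older PHP versions treat the bare word as the string
// "matches" (PHP 8 raises an Error instead). Dump the variable itself:
var_dump($matches);

// echo cannot print an array; address one capture as
// $matches[group][occurrence] instead.
echo trim($matches[1][0]), "\n";
```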

An exact example with unicode of what I'm trying to parse is:
LYRICS=<p align="center"><b>????/????</b><br><br>???????/???????<br><br>??????????<br>???????<br>??????
Where blue marks the spots I want to strip.

Thanks in advance!
tr3online
Forum Newbie
Posts: 16
Joined: Sat Jan 24, 2009 8:02 pm

Re: preg_match[_all]($regex) question

Post by tr3online »

Oh,
I tried running print_r($matches); instead of the var_dump, which returned:
Array
(
    [0] => Array
        (
            [0] => LYRICS=<p align="center"><b>ふたり /
        )

    [1] => Array
        (
            [0] => ふたり 
        )

)
So I guess that worked ;) Thanks a lot. I just need to grab the other info now :)
semlar
Forum Commoner
Posts: 61
Joined: Fri Feb 20, 2009 10:45 pm

Re: preg_match[_all]($regex) question

Post by semlar »

If you know it's only going to be on the page once, use the preg_match function, since it stops after the first match.

If you need to match the same pattern multiple times on the page use preg_match_all.
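A short sketch of the difference, using a made-up subject that contains the same pattern twice:

```php
<?php
// Made-up subject with two occurrences of the pattern.
$page    = '<b>one / x</b> ... <b>two / y</b>';
$pattern = '@<b>([^/]+)/@';

// preg_match stops at the first match; $first[1] is a single string.
$found = preg_match($pattern, $page, $first);

// preg_match_all collects every match; $all[1] is an array with one
// entry per occurrence.
$count = preg_match_all($pattern, $page, $all);
```

Note the different shapes: with preg_match the capture is `$first[1]`, while with preg_match_all each capture is `$all[1][0]`, `$all[1][1]`, and so on.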

I've been trying to learn how to read Japanese online (basically started last week), and katakana and hiragana are pretty simple, but kanji are really confusing for me -.-
tr3online
Forum Newbie
Posts: 16
Joined: Sat Jan 24, 2009 8:02 pm

Re: preg_match[_all]($regex) question

Post by tr3online »

If you need any help with Japanese let me know ;)

Appreciate the regex help. I'm so lost with it :| I'm new to scripting.