preg_match[_all]($regex) question
Posted: Wed Apr 01, 2009 8:21 pm
by tr3online
Hey guys, I'm pretty new to regex and could use some help.
I've tried to find some examples of what I can use before coming here to ask, but they all seem far too verbose for me to grasp.
In any event, here's my issue:
I am trying to extract some data from a pretty long string in the following form (it's in another language, so I'll just include random English):
KASI=<p align="center"><b>Hi there / What's up</b><br><br>kakusi: this / sakkyoku: that<br><br>lots of data
Ideally, I want to grab that data and pull out "Hi there", "What's up", "this", "that", and "lots of data", which runs to the end of the string with no line break.
I was trying to use a preg_match with regex to strip it. I imagine preg_match_all may be more suitable?
In any event, I'd like some help coming up with the regex to help me isolate things from a string like that.
Thanks!
code snippet:
$page = file_get_contents($url);
$pattern = '...'; // regex here
preg_match($pattern, $page, $match);
var_dump($match);
echo $match[1];
Re: preg_match[_all]($regex) question
Posted: Thu Apr 02, 2009 3:48 am
by prometheuzz
You need to be more precise about your requirements before being able to construct a regex (or get help constructing one). What happened to the word "sakkyoku"? Why did you leave it out?
Re: preg_match[_all]($regex) question
Posted: Thu Apr 02, 2009 1:56 pm
by tr3online
Sakkyoku is more like a tag.
Ideally the regex would search along the lines of:
Starts with : KASI=<p align="center"><b> , ends with /
So (*) in KASI=<p align="center"><b>(*) / would be selected
then starts with / and ends with </b><br><br>
so (*) in / (*)</b><br><br> would be selected
kakusi: (*) /
sakkyoku: (*) /
<br><br>(*) [end]
would be selected
Most of the data in (*) won't be ASCII, if that matters. It will be UTF-8 characters, as it's a foreign language.
Does that help at all?
Re: preg_match[_all]($regex) question
Posted: Thu Apr 02, 2009 1:57 pm
by tr3online
or is regex not the best way to go about doing that?
Re: preg_match[_all]($regex) question
Posted: Thu Apr 02, 2009 6:49 pm
by semlar
Code:
preg_match_all('@KASI=<p align="center"><b>([^/]+)/@', $page, $matches);
Matches all occurrences of [KASI=<p align="center"><b>] followed by stuff and then [/] in $page, returns array as $matches.
I have no idea how regex handles foreign characters (hiragana/kanji).
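On the foreign-character point: PCRE in PHP works on bytes by default, so the pattern above should still match UTF-8 text, because the delimiters it looks for (`<b>` and `/`) are plain ASCII. The `u` modifier makes the engine treat pattern and subject as UTF-8, which matters once `.` or a character class has to count whole characters. A minimal sketch, assuming the page is valid UTF-8; the sample string is made up for illustration:

```php
<?php
// Hypothetical sample; the real page content is not shown in the thread.
$page = 'KASI=<p align="center"><b>ふたり / うた</b>';

// Without /u the dot matches a single byte; with /u it matches a whole character.
preg_match('@<b>(.)@',  $page, $byteMatch);
preg_match('@<b>(.)@u', $page, $charMatch);

$byteLength = strlen($byteMatch[1]); // one byte of a three-byte character
$charLength = strlen($charMatch[1]); // the whole character, three bytes in UTF-8

// The negated class is safe either way here: "/" is a single ASCII byte
// and never occurs inside a multi-byte UTF-8 sequence.
preg_match('@<b>([^/]+)/@u', $page, $m);
$title = trim($m[1]);
```

If the subject is not valid UTF-8, a pattern with `u` fails to match, so it is worth checking the encoding of the fetched page first.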
I don't know if your example is supposed to be one continuous string or not. If it is, you would do something like this:
Code:
$pattern = '@KASI=<p align="center"><b>([^/]+)/((?:[^<]|<(?!/b><br><br>))+)</b><br><br>(.*)@';
preg_match($pattern, $page, $match);
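For the full list of fields described earlier in the thread, one possible extension is a single pattern with named groups. This is only a sketch: the group names (`first`, `second`, `kakusi`, `sakkyoku`, `rest`) are invented for illustration, and the labels `kakusi:` and `sakkyoku:` are taken literally from the English sample, so they would need replacing with whatever the real page uses.

```php
<?php
// Sample string from the first post.
$page = 'KASI=<p align="center"><b>Hi there / What\'s up</b><br><br>'
      . 'kakusi: this / sakkyoku: that<br><br>lots of data';

// s: let "." match newlines in the trailing text; u: treat input as UTF-8.
$pattern = '@KASI=<p align="center"><b>'
         . '(?<first>[^/]+)/(?<second>.+?)</b><br><br>'
         . 'kakusi:(?<kakusi>[^/]+)/\s*sakkyoku:(?<sakkyoku>.+?)<br><br>'
         . '(?<rest>.*)@su';

$fields = [];
if (preg_match($pattern, $page, $match)) {
    // Keep only the named groups and trim the spaces around the slashes.
    foreach (['first', 'second', 'kakusi', 'sakkyoku', 'rest'] as $name) {
        $fields[$name] = trim($match[$name]);
    }
}
print_r($fields);
```

The lazy `.+?` groups stop at the first following `</b><br><br>` or `<br><br>`, which avoids the lookahead construction at the cost of assuming those markers never appear inside the captured text.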
Re: preg_match[_all]($regex) question
Posted: Thu Apr 02, 2009 7:05 pm
by tr3online
Thanks for the input. I tried to run what you said with the following code:
Code:
<?php
$page = file_get_contents('$url');
preg_match_all('@LYRICS=<p align="center"><b>([^/]+)/@', $page, $matches);
var_dump(matches);
echo $matches;
?>
which returns
string(7) "matches" Array
Maybe this isn't working right?
An exact example with unicode of what I'm trying to parse is:
LYRICS=<p align="center"><b>????/????</b><br><br>???????/???????<br><br>??????????<br>???????<br>??????
Where blue marks the spots I want to strip.
Thanks in advance!
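The `string(7) "matches" Array` output in this post most likely comes from the debugging lines rather than from the regex. `var_dump(matches)` lacks the `$`, so older PHP versions treated `matches` as a bare string (PHP 8 raises an error instead), and `echo` on an array prints only the word `Array`. Note also that `'$url'` in single quotes is the literal text `$url`, not the variable's value. A corrected sketch, using an inline sample string in place of the fetched page (the sample content is an assumption):

```php
<?php
// Stand-in for file_get_contents($url); note there are no quotes around $url.
$page = 'LYRICS=<p align="center"><b>ふたり / うた</b><br><br>';

$count = preg_match_all('@LYRICS=<p align="center"><b>([^/]+)/@', $page, $matches);

// print_r() and var_dump() show array contents; echo would only print "Array".
print_r($matches);

// $matches[0] holds the full matches, $matches[1] the first capture group.
$firstTitle = $count > 0 ? trim($matches[1][0]) : null;
```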
Re: preg_match[_all]($regex) question
Posted: Thu Apr 02, 2009 7:11 pm
by tr3online
Oh, I tried running print_r($matches); without the var_dump, which returned:
Array
(
    [0] => Array
        (
            [0] => LYRICS=<p align="center"><b>ふたり /
        )
    [1] => Array
        (
            [0] => ふたり
        )
)
So I guess that worked.
Thanks a lot. I just need to grab the other info now.
Re: preg_match[_all]($regex) question
Posted: Thu Apr 02, 2009 7:14 pm
by semlar
If you know it's only going to be on the page once, use the preg_match function, since it stops after the first match.
If you need to match the same pattern multiple times on the page use preg_match_all.
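A small sketch of the difference, assuming a hypothetical page that contains two KASI blocks (the sample string and variable names are made up for illustration):

```php
<?php
// Hypothetical page with two KASI blocks.
$page = 'KASI=<p align="center"><b>One / A</b> ... '
      . 'KASI=<p align="center"><b>Two / B</b>';
$pattern = '@KASI=<p align="center"><b>([^/]+)/@';

// preg_match() stops at the first match and returns 1 or 0.
$found = preg_match($pattern, $page, $single);

// preg_match_all() keeps scanning and returns the number of matches.
$total = preg_match_all($pattern, $page, $all);

// $single[1] is the capture group of the one match;
// $all[1] lists that capture group for every match.
$firstOnly = trim($single[1]);
$everyTitle = array_map('trim', $all[1]);
```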
I've been trying to learn how to read Japanese online (basically started last week), and katakana and hiragana are pretty simple, but kanji are really confusing for me -.-
Re: preg_match[_all]($regex) question
Posted: Thu Apr 02, 2009 7:20 pm
by tr3online
If you need any help with Japanese, let me know.
I appreciate the regex help. I'm so lost with it.
New to scripting.