preg_match[_all]($regex) question
Hey guys, I'm pretty new to regex and could use some help.
I tried to find some examples I could use before coming here to ask, but they all seem far too verbose for me to grasp.
In any event, here's my issue:
I am trying to extract some data from a pretty long string in the following form (it's in another language, so I'll just include random English):
KASI=<p align="center"><b>Hi there / What's up</b><br><br>kakusi: this / sakkyoku: that<br><br>lots of data
Ideally, I want to grab that data and pull out "Hi there", "What's up", "this", "that", and "lots of data", which runs to the end of the string with no line break.
I was trying to use a preg_match with regex to strip it. I imagine preg_match_all may be more suitable?
In any event, I'd like some help coming up with the regex to help me isolate things from a string like that.
Thanks!
Code snippet:
$page = file_get_contents($url);
$pattern = 'regex here'; // placeholder for the pattern I need
preg_match($pattern, $page, $match);
var_dump($match);
echo $match[1];
- prometheuzz
- Forum Regular
- Posts: 779
- Joined: Fri Apr 04, 2008 5:51 am
Re: preg_match[_all]($regex) question
You need to be more precise about your requirements before being able to construct a regex (or get help constructing one). What happened to the word "sakkyoku"? Why did you leave it out?
Re: preg_match[_all]($regex) question
Sakkyoku is more like a tag.
Ideally the regex would search along the lines of:
Starts with : KASI=<p align="center"><b> , ends with /
So (*) in KASI=<p align="center"><b>(*) / would be selected
then starts with / and ends with </b><br><br>
so (*) in / (*)</b><br><br> would be selected
kakusi: (*) /
sakkyoku: (*) /
<br><br>(*) [end]
would be selected
Most of the data in (*) won't be ANSI, if that matters. It will be UTF-8 chars as it's a foreign language.
Does that help at all?
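The spec above can be sketched as a single pattern with one capture group per field. This is only a sketch under assumptions: the KASI= prefix and the kakusi: and sakkyoku: labels are the placeholder spellings from the post (the real page will use different text), and the u and s modifiers are additions for UTF-8 input and for a body that spans several lines.

```php
<?php
// Sample string copied from the first post; the real page holds UTF-8 Japanese text.
$data = 'KASI=<p align="center"><b>Hi there / What\'s up</b><br><br>'
      . 'kakusi: this / sakkyoku: that<br><br>lots of data';

// One capture group per field.
// "u" treats pattern and subject as UTF-8; "s" lets "." match newlines,
// so the final (.*) runs to the end of the string.
$pattern = '@KASI=<p align="center"><b>(.*?)\s*/\s*(.*?)</b><br><br>'
         . 'kakusi:\s*(.*?)\s*/\s*sakkyoku:\s*(.*?)<br><br>(.*)$@us';

if (preg_match($pattern, $data, $m)) {
    // $m[0] is the whole match; $m[1] to $m[5] are the five fields.
    list(, $title, $subtitle, $kakusi, $sakkyoku, $body) = $m;
    print_r(array($title, $subtitle, $kakusi, $sakkyoku, $body));
}
```

Since the whole string is matched once, preg_match is enough here; preg_match_all would only matter if the same block appeared several times on the page.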
Re: preg_match[_all]($regex) question
Or is regex not the best way to go about doing that?
Re: preg_match[_all]($regex) question
Code:
preg_match_all('@KASI=<p align="center"><b>([^/]+)/@', $page, $matches);
I have no idea how regex handles foreign characters (hiragana/kanji).
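On the foreign-character question, a small sketch may help. It assumes the page and the script are both UTF-8 encoded, and the sample string is a shortened stand-in built from the word that shows up in the output later in this thread. Without the u modifier PCRE works byte by byte, which still suits a class like [^/]+ because the "/" byte never occurs inside a multibyte UTF-8 sequence. The difference shows with ".", which matches one byte without u and one whole character with it.

```php
<?php
// Shortened stand-in for the real page (assumed UTF-8).
$page = 'LYRICS=<p align="center"><b>ふたり / ...';

// Byte mode (no "u"): [^/]+ still captures the Japanese text intact.
preg_match('@<b>([^/]+)/@', $page, $bytes);
echo trim($bytes[1]), "\n";

// "." matches a single byte without "u" and a full character with "u".
preg_match('@<b>(.)@', $page, $oneByte);
preg_match('@<b>(.)@u', $page, $oneChar);
echo strlen($oneByte[1]), ' byte vs ', $oneChar[1], "\n";
```

Adding u is the safer default whenever the pattern uses ".", quantifiers on single characters, or character classes containing non-ASCII text.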
I don't know if your example is supposed to be one continuous string or not; if it is, you would do something like this:
Code:
$pattern = '@KASI=<p align="center"><b>([^/]+)/((?:[^<]|<(?!/b><br><br>))+)</b><br><br>(.*)@';
preg_match($pattern, $page, $match);
Re: preg_match[_all]($regex) question
Thanks for the input. I tried to run what you said with the following code:
Code:
<?php
$page = file_get_contents('$url');
preg_match_all('@LYRICS=<p align="center"><b>([^/]+)/@', $page, $matches);
var_dump(matches);
echo $matches;
?>
which returns
string(7) "matches" Array
Maybe this isn't working right?
An exact example with unicode of what I'm trying to parse is:
LYRICS=<p align="center"><b>????/????</b><br><br>???????/???????<br><br>??????????<br>???????<br>??????
Where blue marks the spots I want to strip.
Thanks in advance!
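The output string(7) "matches" Array points at the dump calls rather than the regex. A corrected sketch follows, using an inline stand-in string instead of the fetched page so it runs on its own. Two assumptions: that '$url' in single quotes was only a redaction (as written, PHP would not substitute the variable), and that the page contains the word seen in the later output.

```php
<?php
// Stand-in for the fetched page; the post used file_get_contents($url).
$page = 'LYRICS=<p align="center"><b>ふたり / ...';

preg_match_all('@LYRICS=<p align="center"><b>([^/]+)/@', $page, $matches);

// var_dump(matches) lacks the "$": older PHP versions read the bare word as
// the string "matches" (PHP 8 raises an error), hence string(7) "matches".
var_dump($matches);

// echo on an array only prints the word "Array"; print_r shows the contents.
print_r($matches);

// With preg_match_all's default ordering, $matches[1] lists every group-1 capture.
echo trim($matches[1][0]), "\n";
```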
Re: preg_match[_all]($regex) question
Oh,
I tried running print_r($matches); without the var_dump, which returned:
Array
(
[0] => Array
(
[0] => LYRICS=<p align="center"><b>ふたり /
)
[1] => Array
(
[0] => ふたり
)
)
So I guess that worked.
Thanks a lot. I just need to grab the other info now.
Re: preg_match[_all]($regex) question
If you know it's only going to be on the page once, use the preg_match function, since it stops after the first match.
If you need to match the same pattern multiple times on the page use preg_match_all.
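The difference between the two functions can be seen in a minimal sketch. The subject string here is made up purely for illustration, with the same kind of marker appearing twice.

```php
<?php
// Made-up subject containing two matches, for illustration only.
$page = '<b>one / a</b> ... <b>two / b</b>';
$pattern = '@<b>([^/]+)/@';

// preg_match stops at the first match and returns 1 (or 0 if nothing matches).
$found = preg_match($pattern, $page, $first);

// preg_match_all keeps scanning and returns the number of matches found.
$count = preg_match_all($pattern, $page, $all);

echo $found, ' ', trim($first[1]), "\n";
echo $count, ' ', implode(',', array_map('trim', $all[1])), "\n";
```

Note the different result shapes: preg_match fills a flat array of groups, while preg_match_all (in its default ordering) fills one sub-array per group.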
I've been trying to learn how to read Japanese online (basically started last week), and katakana and hiragana are pretty simple, but kanji are really confusing for me -.-
Re: preg_match[_all]($regex) question
If you need any help with Japanese, let me know.
Appreciate the regex help. I'm so lost with it.
New to scripting.