I'm trying to parse text for tags similar to Twitter's hashtags. I think the best way to illustrate this is with an example. I'm using the code below to test, but it's the pattern that matters.
<pre><?php
//An array of example text.
$a = array(
    '#XXX', //A hashtag by itself.
    'llll #XXX', //A hashtag at the end of a string.
    '#XXX llll', //A hashtag at the beginning of a string.
    '#XXX#OOOO', //A hashtag with another directly appended.
    '#XX-XX', //A hashtag with hyphen(s) in it.
    '#gäb', //A hashtag with unicode character(s).
    'llll [#XXX OOOO] llll', //A hashtag with spaces (but not linebreaks) in it, surrounded by square brackets.
);
//For each of the example texts, test to see if the desired output is produced.
foreach ($a as $b) {
    $pattern = '%(\A#(\w|(\p{L}\p{M})|-)+\b)|((?<=\s)#(\w|(\p{L}\p{M})|-)+\b)|((?<=\[)#.+(?=\]))%Uu';
    preg_match_all($pattern, $b, $matches);
    echo "$b: " . implode(', ', $matches[0]) . "\n";
}
?></pre>
Actual output:
#XXX: #XXX
llll #XXX: #XXX
#XXX llll: #XXX
#XXX#OOOO: #XXX
#XX-XX: #XX
#gäb: #g
llll [#XXX OOOO] llll: #XXX OOOO
Desired output (only the #XX-XX and #gäb lines differ):
#XXX: #XXX
llll #XXX: #XXX
#XXX llll: #XXX
#XXX#OOOO: #XXX
#XX-XX: #XX-XX
#gäb: #gäb
llll [#XXX OOOO] llll: #XXX OOOO
The part of the pattern that matches the tag characters:
(\w|(\p{L}\p{M})|-)
A character-class version of the same part:
[\w(\p{L}\p{M})-]
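A few things seem to combine here. The `U` modifier makes `+` lazy, so the match stops at the first position where `\b` succeeds, which is right before the hyphen in `#XX-XX`. `\p{L}\p{M}` requires a letter followed by a combining mark, so a precomposed `ä` (a single letter with no mark) does not fit it, and the output above suggests `\w` is ASCII-only in this setup. Inside a character class, `(` and `)` are literal characters, so `[\w(\p{L}\p{M})-]` would also accept parentheses. Below is a minimal sketch of one pattern that gives the desired output for the seven test strings. The function name `extract_tags` is hypothetical, and allowing digits and underscores in a tag is an assumption carried over from `\w`.

```php
<?php
// Hypothetical helper (not from the original code): returns every tag found in $text.
function extract_tags(string $text): array
{
    // Alternative 1: "#" at the start of the text or right after whitespace,
    //   followed by letters, combining marks, digits, underscores or hyphens.
    // Alternative 2: "#" right after "[", running up to (not including) the
    //   next "]", so it may contain spaces but not line breaks.
    // No U modifier, so the quantifiers stay greedy.
    $pattern = '%(?:\A|(?<=\s))#[\p{L}\p{M}\p{N}_-]+|(?<=\[)#[^\]\r\n]+(?=\])%u';
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

// Usage: print the tags found in two of the problem strings.
foreach (array('#XX-XX', 'llll [#XXX OOOO] llll') as $b) {
    echo "$b: " . implode(', ', extract_tags($b)) . "\n";
}
```

This sketch also accepts a trailing hyphen (for example `#XXX-`), which may or may not be wanted.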