Twitter-style Hashtags
Posted: Thu Oct 22, 2009 10:45 pm
Hello,
I'm trying to parse some text for something similar to Twitter's hashtags. I think the best way to illustrate this is with an example. I'm using this code to test, but it's really the pattern that's important.

<pre><?php
//An array of example text.
$a = array(
    '#XXX', //A hashtag by itself.
    'llll #XXX', //A hashtag at the end of a string.
    '#XXX llll', //A hashtag at the beginning of a string.
    '#XXX#OOOO', //A hashtag with another directly appended.
    '#XX-XX', //A hashtag with hyphen(s) in it.
    '#gäb', //A hashtag with Unicode character(s).
    'llll [#XXX OOOO] llll', //A hashtag with spaces (but not linebreaks) in it, surrounded by square brackets.
);
//For each of the example texts, test to see if the desired output is produced.
foreach ($a as $b) {
    $pattern = '%(\A#(\w|(\p{L}\p{M})|-)+\b)|((?<=\s)#(\w|(\p{L}\p{M})|-)+\b)|((?<=\[)#.+(?=\]))%Uu';
    preg_match_all($pattern, $b, $matches);
    echo "$b: ". implode(', ', $matches[0]) ."\n";
}
?></pre>

This code currently produces this output:

#XXX: #XXX
llll #XXX: #XXX
#XXX llll: #XXX
#XXX#OOOO: #XXX
#XX-XX: #XX
#gäb: #g
llll [#XXX OOOO] llll: #XXX OOOO

I want it to produce this output (the hyphen and Unicode lines are the ones that differ):

#XXX: #XXX
llll #XXX: #XXX
#XXX llll: #XXX
#XXX#OOOO: #XXX
#XX-XX: #XX-XX
#gäb: #gäb
llll [#XXX OOOO] llll: #XXX OOOO

The part that's not working is the attempt to match Unicode characters and hyphens after a hash:

(\w|(\p{L}\p{M})|-)

I've also tried doing it as a character class, with the same result:

[\w(\p{L}\p{M})-]

Any assistance is greatly appreciated.
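
One possible direction, offered as a sketch rather than a confirmed fix: the `U` modifier makes `+` lazy, so the match stops at the first position where `\b` holds. That position falls between `X` and `-` in `#XX-XX`, and between `g` and `ä` in `#gäb` if `\w` is ASCII-only in the PHP/PCRE build being used (an assumption based on the output shown). Separately, `(\p{L}\p{M})` requires a letter immediately followed by a combining mark, which a precomposed `ä` is not, and inside a character class the parentheses are literal characters. The helper name `find_hashtags` and the rewritten pattern below are illustrative, not from the original code, and the input is assumed to be UTF-8.

```php
<?php
// Hypothetical helper (not part of the original post): returns all hashtags in $text.
function find_hashtags($text) {
    // Branch 1: '#' at the start of the string or right after whitespace, followed by
    //           letters, combining marks, digits, underscores, or hyphens.
    // Branch 2: '#' right after '[', running up to (but not including) the closing ']'
    //           on the same line.
    // The U modifier is dropped so that + stays greedy; u treats pattern and
    // subject as UTF-8.
    $pattern = '~(?<!\S)#[\p{L}\p{M}\p{N}_-]+|(?<=\[)#[^\]\r\n]+(?=\])~u';
    if (preg_match_all($pattern, $text, $matches) === false) {
        return array(); // e.g. the subject was not valid UTF-8
    }
    return $matches[0];
}

// Same example strings as in the test script.
$examples = array(
    '#XXX',
    'llll #XXX',
    '#XXX llll',
    '#XXX#OOOO',
    '#XX-XX',
    '#gäb',
    'llll [#XXX OOOO] llll',
);
foreach ($examples as $text) {
    echo "$text: " . implode(', ', find_hashtags($text)) . "\n";
}
```

With a UTF-8 source file, this should print the desired output for all seven examples. Two caveats with this sketch: a trailing hyphen (as in `#XX-`) would be included in the match, and the bracketed form assumes the closing `]` is on the same line as the `#`.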