unicode unprintable character for regex
Posted: Fri Jul 06, 2007 11:11 am
Hi,
I need to extract the Norwegian interwiki link from a Wikipedia page.
So I made a pattern to use with preg_match().
At first I used this:
$no_pattern='/<li class="interwiki-no"><a href="(.*)">Norsk \(b/u';
But it doesn't work: there is an unprintable UTF-8 character between ">" and "Norsk" in Wikipedia's markup.
Its bytes are E2 80 AA in hex, so three bytes, which correspond to Left-to-Right Embedding, U+202A.
I don't know how to complete my pattern so that it matches.
Thanks.
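
For reference, here is a minimal sketch of one way the pattern might be completed. It assumes the page really contains the bytes E2 80 AA at that spot. The sample markup and the URL in $html are made up for illustration. With the /u modifier, PCRE accepts \x{202A} as an escape for the code point U+202A, and the ? after it keeps the pattern working if the character is absent.

```php
<?php
// Hypothetical sample markup modelled on the post; the URL is invented.
// "\xE2\x80\xAA" is U+202A (Left-to-Right Embedding) written as raw UTF-8 bytes.
$html = '<li class="interwiki-no"><a href="http://no.wikipedia.org/wiki/Eksempel">'
      . "\xE2\x80\xAA" . "Norsk (bokm\xC3\xA5l)" . '</a></li>';

// In /u mode, \x{202A} matches the single code point U+202A; the ? makes it optional.
// [^"]* is used instead of the original (.*) so the capture stops at the closing quote.
$no_pattern = '/<li class="interwiki-no"><a href="([^"]*)">\x{202A}?Norsk \(b/u';

if (preg_match($no_pattern, $html, $matches)) {
    echo $matches[1], "\n"; // the captured href value
}
```

If other invisible formatting characters can show up in the same place, \p{Cf}* in place of \x{202A}? would be a looser alternative, since U+202A belongs to the Unicode "format" category.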
