Regarding this thread:
viewtopic.php?f=50&t=90557
It occurred to me that those functions would not be very internationally friendly with the characters [0-9] hardcoded into the expression.
Sure, I could use \d (or whatever digit in RE is), but what about + or - and periods or commas???
Internationlized regex
Moderator: General Moderators
-
alex.barylski
- DevNet Evangelist
- Posts: 6267
- Joined: Tue Dec 21, 2004 5:00 pm
- Location: Winnipeg
- prometheuzz
- Forum Regular
- Posts: 779
- Joined: Fri Apr 04, 2008 5:51 am
Re: Internationlized regex
PCSpectra wrote:...
Sure, I could use \d (or whatever digit in RE is), but what about + or - and periods or commas???
\d is almost always the ASCII character set [0-9]. I say almost because it depends on how the regex engine is compiled. Also see this thread about it:
viewtopic.php?f=38&t=90050
So, IMO, if you want to be sure to match more than just [0-9], don't rely on \d, but specify exactly what you want to match (using Unicode code points).
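To illustrate how much \d can vary between engines, here is a minimal sketch in Python (not the engine discussed in this thread, just a convenient stand-in). In Python 3, \d on a str pattern matches any Unicode decimal digit by default and only [0-9] when the re.ASCII flag is set. The sample string is an assumption made up for the demo.

```python
import re

# Sample text (made up for this demo): ASCII digits, a space,
# then the Arabic-Indic digits U+0661..U+0663.
text = "123 \u0661\u0662\u0663"

# Default for str patterns in Python 3: \d matches any Unicode decimal digit.
unicode_digits = re.findall(r"\d", text)

# With re.ASCII, \d is restricted to [0-9].
ascii_digits = re.findall(r"\d", text, flags=re.ASCII)

# An explicit character class matches the same characters regardless of flags.
explicit_digits = re.findall(r"[0-9]", text)

print(len(unicode_digits), len(ascii_digits), len(explicit_digits))
```

So the same \d gives different results depending on a flag, which is why spelling out exactly what you want to match is the safer choice.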
Re: Internationlized regex
So, don't forget to apply the Unicode modifier: u.
I recommend \p{Nd} (Number, decimal digit). Just using \p{N} (Number) will probably match too many characters. For example, it also includes many dingbats like ❶ ② ➌.
A very practical tool to look up this kind of thing is UniView: http://people.w3.org/rishida/scripts/uniview/
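The difference between the two categories can also be checked in code. This is a rough sketch in Python, whose built-in re module has no \p{...} syntax, so it looks up the Unicode general category directly instead. The helper names is_decimal_digit and is_any_number are hypothetical, made up for this example.

```python
import unicodedata

# Hypothetical helpers: they mimic what \p{Nd} and \p{N} select by
# checking the Unicode general category of a single character.
def is_decimal_digit(ch):
    # General category "Nd" is what \p{Nd} selects.
    return unicodedata.category(ch) == "Nd"

def is_any_number(ch):
    # Categories "Nd", "Nl" and "No" together are what \p{N} selects.
    return unicodedata.category(ch).startswith("N")

# ASCII 7, Arabic-Indic 3, dingbat 1, circled 2, Roman numeral 8.
samples = ["7", "\u0663", "\u2776", "\u2461", "\u2167"]
for ch in samples:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch),
          is_decimal_digit(ch), is_any_number(ch))
```

The dingbat and circled digits fall under "No" (Number, other) and the Roman numeral under "Nl" (Number, letter), so only the first two samples count as decimal digits.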
-
alex.barylski
- DevNet Evangelist
- Posts: 6267
- Joined: Tue Dec 21, 2004 5:00 pm
- Location: Winnipeg
Re: Internationlized regex
prometheuzz wrote:So, IMO, if you want to be sure to match more than just [0-9], don't rely on \d, but specify exactly what you want to match (using Unicode code points).
That is what I was afraid of...
GeertDD wrote:So, don't forget to apply the Unicode modifier: u.
I recommend \p{Nd} (Number, decimal digit). Just using \p{N} (Number) will probably match too many characters. For example, it also includes many dingbats like ❶ ② ➌.
A very practical tool to look up this kind of thing is UniView: http://people.w3.org/rishida/scripts/uniview/
Cool