REGEX prob for catching three ways of input
- prometheuzz
- Forum Regular
- Posts: 779
- Joined: Fri Apr 04, 2008 5:51 am
Re: REGEX prob for catching three ways of input
I suspect there are a few more exceptions you haven't mentioned. So, before I advise anything else, let me ask you again: could you give a detailed explanation of what a valid string and what an invalid string would be? Also, please give a good number of example input strings that should be accepted and a good number that should be rejected.
- prometheuzz
- Forum Regular
- Posts: 779
- Joined: Fri Apr 04, 2008 5:51 am
Re: REGEX prob for catching three ways of input
Okay.
DavidTheSlayer wrote: Yes sorry I realised that long afterwards.
sometext, sometext is also allowed along with...
somenumber.somenumber sometext, sometext (also applies vice-versa).
So....
Valid...
sometext
sometext,sometext
sometext, sometext
somenumber.somenumber, somenumber
Invalid...
sometext, , sometext
sometext, , , ,
sometext,
sometext, sometext
1.5 sometext, , sometext
,,
.
Please answer the following questions:
Question 1 - a number doesn't have to contain a decimal point. So, "3" is valid? What about numbers that span more than one digit, like "333", and what about "0.1234"?
Question 2 - so an acceptable string can start with either a NUMBER or a WORD token?
Question 3 - a NUMBER token can be followed by a NUMBER token? Example: "1.2, 3.4" is valid?
Question 4 - two tokens (either a NUMBER or a WORD) are separated by one or more white spaces, or by exactly one comma optionally surrounded by white spaces? Example: "aaa , bbbb" and "1.1 cccc" are both valid?
Question 5 - I see "sometext, sometext" is in your Invalid-list. Why?
- prometheuzz
- Forum Regular
- Posts: 779
- Joined: Fri Apr 04, 2008 5:51 am
Re: REGEX prob for catching three ways of input
I think the case in question 5 is a mistake on your part. If that is correct, it looks like the following regex will do:
Code: Select all
<?php
$tests = array(
'sometext',
'sometext,sometext',
'sometext, sometext',
'7.7, 7',
'sometext, , sometext',
'sometext, , , ,',
'sometext,',
'sometext, sometext',
'1.5 sometext, , sometext',
',,'
);
$token = "([a-z]+|(\d\.)?\d)";
foreach($tests as $t) {
if(preg_match("/^$token((\s*,\s*|\s+)$token)*$/i", $t)) {
echo "Accepted: $t\n";
} else {
echo "Rejected: $t\n";
}
}
?>
- DavidTheSlayer
- Forum Newbie
- Posts: 15
- Joined: Fri Aug 22, 2008 2:17 am
Re: REGEX prob for catching three ways of input
To answer the previous questions...
1) Thinking about future uses, I need just a number as well as a number followed by another number as a valid example. E.g. 5 or 5.6 are both valid.
2) Correct. E.g. 3, 1.1, sometext
3) Correct
4) Partially correct, "aaa , bbbb" = invalid (extra space) and "1.1 cccc" = valid. A word or number no matter what order cannot have more than one whitespace between them, they have to be comma delimited and then separated by an optional space. E.g. "sometext, sometext" or sometext,sometext or 1.1,sometext, sometext, 131.82
5) Yes, that's correct: "sometext, sometext" in the invalid list is simply in the wrong place (a copy-and-paste job). Once your REGEX is extended, this is a valid input.
I'll test the new code out on Tuesday, as it's a bank holiday Monday over here in the UK. I'll also do a quick test over the weekend at home, as I don't have access from outside the network. I'll post back on Tuesday.
Will your regex catch Unicode characters (UTF-8)? Otherwise I'll add a "u" and try swapping the "\s" for "\t" if possible.
The end result will be a public, restricted (hence REGEX) small tagging ability for a company I work at where the tags are stored in a MySQL DB.
Thank you very much for your time and patience again....
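On the Unicode question above, a minimal sketch may help. The "u" modifier makes PCRE treat the pattern and the subject as UTF-8, but a class like [a-z] would still not match accented letters such as "é". The example below assumes a WORD token may contain any Unicode letter and uses \p{L} for that; the function name is_valid_unicode_tag_list is hypothetical, and the number part is copied unchanged from the regex posted above. Note that swapping "\s" for "\t" would allow only tabs as whitespace, which is probably not what is wanted here.

```php
<?php
// Hypothetical helper: true when $input is a valid token list whose
// WORD tokens may contain any Unicode letter. Assumes $input is UTF-8.
function is_valid_unicode_tag_list($input)
{
    // Assumption: \p{L} (any Unicode letter) replaces [a-z].
    // The number part is the one from the regex posted above.
    $token = '(\p{L}+|(\d\.)?\d)';

    // The "u" modifier tells PCRE that pattern and subject are UTF-8.
    return preg_match("/^$token((\s*,\s*|\s+)$token)*$/u", $input) === 1;
}

$tests = array('café, naïve', 'straße', 'sometext, , sometext');
foreach ($tests as $t) {
    echo (is_valid_unicode_tag_list($t) ? 'Accepted' : 'Rejected') . ": $t\n";
}
```

The source file itself has to be saved as UTF-8 for the literal test strings to be valid subjects.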

- prometheuzz
- Forum Regular
- Posts: 779
- Joined: Fri Apr 04, 2008 5:51 am
Re: REGEX prob for catching three ways of input
DavidTheSlayer wrote: ...
4) Partially correct, "aaa , bbbb" = invalid (extra space) and "1.1 cccc" = valid. A word or number no matter what order cannot have more than one whitespace between them, they have to be comma delimited and then separated by an optional space. E.g. "sometext, sometext" or sometext,sometext or 1.1,sometext, sometext, 131.82
...
The examples from your original post indicate that the comma is also optional.
If the comma is mandatory, there's only a small change to be made to the regex I last posted.
Good luck!
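For readers following along, the "small change" could look roughly like this. It is only a sketch, under the assumption that "mandatory comma" means the whitespace-only separator alternative (\s+) is dropped from the first regex posted in this thread; the function name is_comma_separated is hypothetical.

```php
<?php
// Hypothetical helper: true when $input is a single token, or several
// tokens separated by a comma (whitespace around the comma is allowed).
function is_comma_separated($input)
{
    // Token copied from the regex posted earlier in this thread.
    $token = "([a-z]+|(\d\.)?\d)";

    // Assumption: the \s+ alternative is removed, so a comma is required
    // between any two tokens.
    return preg_match("/^$token(\s*,\s*$token)*$/i", $input) === 1;
}

foreach (array('sometext', '7.7, 7', '1.5 sometext', 'sometext,') as $t) {
    echo (is_comma_separated($t) ? 'Accepted' : 'Rejected') . ": $t\n";
}
```

A single token still passes because the repeated group may occur zero times; only the separator between tokens changes.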
- DavidTheSlayer
- Forum Newbie
- Posts: 15
- Joined: Fri Aug 22, 2008 2:17 am
Re: REGEX prob for catching three ways of input
Yes, the comma is optional only for one word, e.g. sometext or 1.1; otherwise it has to be in there for the explode to work.
Can you give me a quick walkthrough of the bottom regex with the top $token in it? I can't quite understand what it's doing.
The top one, from what I gather, says: one or more characters ranging from a-z, OR a (number followed by a dot (optional)) followed by another number.
- prometheuzz
- Forum Regular
- Posts: 779
- Joined: Fri Apr 04, 2008 5:51 am
Re: REGEX prob for catching three ways of input
DavidTheSlayer wrote: Yes, the comma is optional only for one word, e.g. sometext or 1.1; otherwise it has to be in there for the explode to work.
Can you give me a quick walkthrough of the bottom regex with the top $token in it? I can't quite understand what it's doing.
The top one, from what I gather, says: one or more characters ranging from a-z, OR a (number followed by a dot (optional)) followed by another number.
Taking the last constraints into account, this is probably the regex:
Code: Select all
<?php
$tests = array(
'sometext',
'sometext,sometext',
'sometext, sometext',
'7.7, 7',
'sometext, , sometext',
'sometext, , , ,',
'sometext,',
'sometext, sometext',
'1.5 sometext, , sometext',
',,'
);
$token = "([a-zA-Z]+|(\d\.)?\d)";
foreach($tests as $t) {
if(preg_match("/^$token((,\s?|\s)$token)*$/", $t)) {
echo "Accepted: $t\n";
} else {
echo "Rejected: $t\n";
}
}
/* output:
Accepted: sometext
Accepted: sometext,sometext
Accepted: sometext, sometext
Accepted: 7.7, 7
Rejected: sometext, , sometext
Rejected: sometext, , , ,
Rejected: sometext,
Accepted: sometext, sometext
Rejected: 1.5 sometext, , sometext
Rejected: ,,
*/
?>
Code: Select all
TOKEN = (
[a-zA-Z]+ // one or more letters
| // OR
(\d\.)?\d // a digit followed by a dot (optional as a pair), followed by a digit
)
Code: Select all
^ // the start of the input string
$token // followed by a TOKEN
( // ( open group 1
( // ( open group 2
,\s? // followed by a comma with an optional space
| // OR
\s // one space
) // ) close group 2
$token // followed by a TOKEN
) // ) close group 1
* // group one can occur zero or more times
$ // followed by the end of the input string
- DavidTheSlayer
- Forum Newbie
- Posts: 15
- Joined: Fri Aug 22, 2008 2:17 am
Re: REGEX prob for catching three ways of input
Got it working & it works very well, thank you very much. Much simpler too! 
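One thing worth checking for anyone reusing this: the number part of the token, (\d\.)?\d, matches only a single digit on either side of the dot, so a value like 131.82 (listed as valid earlier in the thread) would be rejected. The sketch below assumes multi-digit numbers should pass and uses \d+(\.\d+)? instead. The function name parse_tags is hypothetical, and splitting with preg_split rather than explode is a choice made here so that the single-space separator is handled as well.

```php
<?php
// Hypothetical helper: returns the list of tags, or false if $input is invalid.
function parse_tags($input)
{
    // Assumption: numbers may have several digits on both sides of the dot.
    $token = '(?:[a-zA-Z]+|\d+(?:\.\d+)?)';

    // Same separators as the final regex in this thread:
    // a comma with an optional space, or one space.
    $sep = '(?:,\s?|\s)';

    if (preg_match("/^$token(?:$sep$token)*$/", $input) !== 1) {
        return false;
    }

    // Split on the same separator to get the individual tags.
    return preg_split("/$sep/", $input);
}

var_dump(parse_tags('1.1,sometext, sometext, 131.82'));
var_dump(parse_tags('sometext, , sometext'));
```

Validating first and splitting second keeps the split simple, since by that point every separator is known to be well formed.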
- prometheuzz
- Forum Regular
- Posts: 779
- Joined: Fri Apr 04, 2008 5:51 am
Re: REGEX prob for catching three ways of input
DavidTheSlayer wrote: Got it working & it works very well, thank you very much. Much simpler too!
Good to hear it, and you're welcome!