REGEX prob for catching three ways of input

prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: REGEX prob for catching three ways of input

Post by prometheuzz »

I suspect there are a few more exceptions you haven't mentioned. So, before I advise anything else, let me ask you again: could you give a detailed explanation of what a valid and what an invalid string would be? Also, please give a good number of example input strings that should be accepted and a good number that should be rejected.
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: REGEX prob for catching three ways of input

Post by prometheuzz »

DavidTheSlayer wrote:Yes sorry I realised that long afterwards.

sometext, sometext is also allowed along with...
somenumber.somenumber sometext, sometext (also applies vice-versa).

So....

Valid...

sometext
sometext,sometext
sometext, sometext
somenumber.somenumber, somenumber

Invalid...

sometext, , sometext
sometext, , , ,
sometext,
sometext, sometext
1.5 sometext, , sometext
,,
.
Okay.

Please answer the following questions:

Question 1 - a number doesn't have to contain a decimal point. So, "3" is valid? What about numbers that span more than one digit: "333" and what about "0.1234"?

Question 2 - so an acceptable string can start with a NUMBER or a WORD token?

Question 3 - a NUMBER token can be followed by a NUMBER token? Example: "1.2, 3.4" is valid?

Question 4 - two tokens (either a NUMBER or a WORD) are separated by one or more white spaces, or by exactly one comma optionally surrounded by white space? Example: "aaa , bbbb" and "1.1 cccc" are both valid?

Question 5 - I see "sometext, sometext" is in your Invalid-list. Why?
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: REGEX prob for catching three ways of input

Post by prometheuzz »

I think the case in question 5 is a mistake on your part. If that is correct, it looks like the following regex will do:

Code:

<?php
$tests = array(
    'sometext',
    'sometext,sometext',
    'sometext, sometext',
    '7.7, 7',
    'sometext, , sometext',
    'sometext, , , ,',
    'sometext,',
    'sometext, sometext',
    '1.5 sometext, , sometext',
    ',,'
);
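// Note: (\d\.)?\d matches a single digit, optionally preceded by one digit and a dot (e.g. "7" or "7.7").
$token = "([a-z]+|(\d\.)?\d)";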
foreach($tests as $t) {
  if(preg_match("/^$token((\s*,\s*|\s+)$token)*$/i", $t)) {
    echo "Accepted: $t\n";
  } else {
    echo "Rejected: $t\n";
  }
}
?>
DavidTheSlayer
Forum Newbie
Posts: 15
Joined: Fri Aug 22, 2008 2:17 am

Re: REGEX prob for catching three ways of input

Post by DavidTheSlayer »

To answer the previous questions...

1) Thinking about future uses, I need a plain number as well as a number followed by a dot and another number to be valid. E.g. 5 and 5.6 are both valid.

2) Correct. E.g. 3, 1.1, sometext

3) Correct

4) Partially correct, "aaa , bbbb" = invalid (extra space) and "1.1 cccc" = valid. A word or number no matter what order cannot have more than one whitespace between them, they have to be comma delimited and then separated by an optional space. E.g. "sometext, sometext" or sometext,sometext or 1.1,sometext, sometext, 131.82

5) Yes that's correct where "sometext, sometext" in the invalid list is simply in the wrong place...(copy and paste job :roll: ). Upon extending your REGEX this is a valid input.

I'll test the new code out on Tuesday, as it's a bank holiday Monday over here in the UK, and I'll do a quick test over the weekend at home, as I don't have access from outside the network. I'll post back on Tuesday.

Will your regex catch Unicode characters (UTF-8)? Otherwise I'll add a "u" and try swapping the "\s" for "\t" if possible.
The end result will be a public, restricted (hence REGEX) small tagging ability for a company I work at where the tags are stored in a MySQL DB.
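
On the Unicode question, here is a minimal sketch rather than a definitive answer. It assumes the tags arrive as UTF-8 and that "letters" should mean letters in any script: `[a-z]` only covers ASCII, so the sketch swaps it for `\p{L}` and adds the "u" modifier. The function name is hypothetical and the rest of the pattern is the one posted above.

```php
<?php
// Hypothetical helper name, used only for this sketch.
function isValidUnicodeTagList(string $input): bool
{
    // \p{L}+ is one or more letters in any script; it is meant to be used with the "u" modifier.
    $token = '(\p{L}+|(\d\.)?\d)';
    // With "u", preg_match() returns false on malformed UTF-8, so "=== 1" also rejects such input.
    return preg_match("/^$token((\s*,\s*|\s+)$token)*$/u", $input) === 1;
}

// "\u{00E9}" and "\u{00EF}" are PHP 7+ escapes for e-acute and i-diaeresis.
$samples = ["caf\u{00E9}, na\u{00EF}ve", 'sometext, sometext', 'sometext, , sometext'];
foreach ($samples as $s) {
    echo (isValidUnicodeTagList($s) ? 'Accepted' : 'Rejected') . ": $s\n";
}
```

Swapping "\s" for "\t" would only allow tabs as white space, so plain spaces after a comma would then be rejected.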

Thank you very much for your time and patience again.... :bow: :bow:
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: REGEX prob for catching three ways of input

Post by prometheuzz »

DavidTheSlayer wrote:...
4) Partially correct, "aaa , bbbb" = invalid (extra space) and "1.1 cccc" = valid. A word or number no matter what order cannot have more than one whitespace between them, they have to be comma delimited and then separated by an optional space. E.g. "sometext, sometext" or sometext,sometext or 1.1,sometext, sometext, 131.82
...
The examples from your original post indicate that the comma is also optional.
If the comma is mandatory, there's only a small change to be made to the regex I last posted.
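
A rough sketch of what that small change could look like, assuming the separator must be exactly one comma followed by at most one white space character (as in answer 4). The function name is made up for illustration; the TOKEN part is unchanged from the regex posted above.

```php
<?php
// Hypothetical helper name, used only for this sketch.
function isValidCommaList(string $input): bool
{
    // TOKEN as posted above: a run of letters, or a single digit with an optional "digit dot" prefix.
    $token = '([a-z]+|(\d\.)?\d)';
    // Assumption: the only separator is one comma, optionally followed by one whitespace character.
    return preg_match("/^$token(,\s?$token)*$/i", $input) === 1;
}

$samples = ['sometext', 'sometext,sometext', 'sometext, sometext', '1.1 cccc', 'aaa , bbbb', 'sometext,'];
foreach ($samples as $s) {
    echo (isValidCommaList($s) ? 'Accepted' : 'Rejected') . ": $s\n";
}
```

With this variant the first three samples are accepted and the last three are rejected, because a bare space no longer counts as a separator.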

Good luck!
DavidTheSlayer
Forum Newbie
Posts: 15
Joined: Fri Aug 22, 2008 2:17 am

Re: REGEX prob for catching three ways of input

Post by DavidTheSlayer »

Yes, the comma is optional only for a single word, e.g. sometext or 1.1; otherwise it has to be in there for the explode to work.

Can you give me a quick walkthrough of the bottom regex with the top $token in it? I can't quite understand what it's doing.

The top one, from what I gather, says: one or more characters ranging from a-z, OR a (number followed by a dot (optional)) followed by another number.
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: REGEX prob for catching three ways of input

Post by prometheuzz »

DavidTheSlayer wrote:Yes, the comma is optional only for a single word, e.g. sometext or 1.1; otherwise it has to be in there for the explode to work.

Can you give me a quick walkthrough of the bottom regex with the top $token in it? I can't quite understand what it's doing.

The top one, from what I gather, says: one or more characters ranging from a-z, OR a (number followed by a dot (optional)) followed by another number.
Taking the last constraints into account, this probably is the regex:

Code:

<?php
$tests = array(
    'sometext',
    'sometext,sometext',
    'sometext, sometext',
    '7.7, 7',
    'sometext, , sometext',
    'sometext, , , ,',
    'sometext,',
    'sometext, sometext',
    '1.5 sometext, , sometext',
    ',,'
);
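// Note: (\d\.)?\d matches a single digit, optionally preceded by one digit and a dot (e.g. "7" or "7.7").
$token = "([a-zA-Z]+|(\d\.)?\d)";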
foreach($tests as $t) {
  if(preg_match("/^$token((,\s?|\s)$token)*$/", $t)) {
    echo "Accepted: $t\n";
  } else {
    echo "Rejected: $t\n";
  }
}
 
/* output:
    Accepted: sometext
    Accepted: sometext,sometext
    Accepted: sometext, sometext
    Accepted: 7.7, 7
    Rejected: sometext, , sometext
    Rejected: sometext, , , ,
    Rejected: sometext,
    Accepted: sometext, sometext
    Rejected: 1.5 sometext, , sometext
    Rejected: ,,
*/
?>
And a (short) explanation of the regex involved:

Code:

TOKEN = (
  [a-zA-Z]+    // one or more letters
  |            // OR
  (\d\.)?\d    // an optional group of one digit and a dot, followed by one digit
)
And now the regex "/^$token((,\s?|\s)$token)*$/" really means:

Code:

^              // the start of the input string
$token         // followed by a TOKEN
(              // ( open group 1
  (            //   ( open group 2
    ,\s?       //     followed by a comma and an optional white space character
    |          //     OR
    \s         //     exactly one white space character
  )            //   ) close group 2
  $token       //   followed by a TOKEN
)              // ) close group 1
*              // group 1 can occur zero or more times
$              // followed by the end of the input string
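
One thing the walkthrough makes visible: `(\d\.)?\d` only covers single digits, so an input such as 131.82 from earlier in the thread would be rejected. Below is a hedged sketch of one way to allow longer numbers and then split the validated string into tags. It assumes numbers look like `\d+(\.\d+)?`; the function name `parseTags` is made up, and `preg_split()` stands in for `explode()` because the separator can be ",", ", " or a single white space character.

```php
<?php
// Hypothetical helper: returns the list of tags, or null when the input is invalid.
function parseTags(string $input): ?array
{
    // Assumption: a number is one or more digits with an optional fractional part.
    $token = '(?:[a-zA-Z]+|\d+(?:\.\d+)?)';
    // Same separator rule as in the walkthrough: comma plus optional whitespace, or one whitespace.
    $separator = '(?:,\s?|\s)';

    if (preg_match("/^$token(?:$separator$token)*$/", $input) !== 1) {
        return null;
    }
    // The input is known to be valid here, so splitting on the separator yields only tokens.
    return preg_split("/$separator/", $input);
}

print_r(parseTags('1.1,sometext, sometext, 131.82'));
var_dump(parseTags('sometext, , sometext'));
```

Validating first and splitting second keeps the two steps simple, at the cost of scanning the string twice.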
DavidTheSlayer
Forum Newbie
Posts: 15
Joined: Fri Aug 22, 2008 2:17 am

Re: REGEX prob for catching three ways of input

Post by DavidTheSlayer »

Got it working & it works very well, thank you very much. Much simpler too! :bow:
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: REGEX prob for catching three ways of input

Post by prometheuzz »

DavidTheSlayer wrote:Got it working & it works very well, thank you very much. Much simpler too! :bow:
Good to hear it, and you're welcome!