Regular Expression to check a valid IP

devarishi
Forum Contributor
Posts: 101
Joined: Fri Feb 05, 2010 7:15 pm

Regular Expression to check a valid IP

Post by devarishi »

Here is a question I was asked in an interview I attended today:

Write a regular expression to check for a valid IP address.
ridgerunner
Forum Contributor
Posts: 214
Joined: Sun Jul 05, 2009 10:39 pm
Location: SLC, UT

Re: Regular Expression to check a valid IP

Post by ridgerunner »

What was *your* answer?

Re: Regular Expression to check a valid IP

Post by devarishi »

A blank line on the answer sheet!
ridgerunner wrote:What was *your* answer?
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: Regular Expression to check a valid IP

Post by prometheuzz »

They probably meant IPv4, because matching IPv6 addresses using regex is plain madness.

So, IPv4 can be matched using this pattern:

Code: Select all

\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5](\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])){3}
A short breakdown of the most important part of the pattern:

Code: Select all

\d         # match 0..9
|          # OR
[1-9]\d    # 10..99
|          # OR
1\d\d      # 100..199
|          # OR
2[0-4]\d   # 200..249
|          # OR
25[0-5]    # 250..255
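A quick way to sanity-check the breakdown is to group and anchor the alternation and run it over every candidate number. This is only a sketch: the `OCTET` constant, the `octet_ok()` helper, the anchors and the `D` modifier are additions for illustration and are not part of the pattern above.

```php
<?php
// The five alternatives from the breakdown, wrapped in a group and anchored
// so that the whole string must be exactly one octet. The D modifier stops
// "$" from also matching just before a trailing newline.
const OCTET = '/^(?:\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])$/D';

// Hypothetical helper: true when $s is a decimal octet (0..255, no leading zeros).
function octet_ok(string $s): bool
{
    return preg_match(OCTET, $s) === 1;
}

// Every integer from 0 to 999 should be accepted exactly when it is <= 255.
$mismatches = [];
for ($n = 0; $n <= 999; $n++) {
    if (octet_ok((string) $n) !== ($n <= 255)) {
        $mismatches[] = $n;
    }
}
echo count($mismatches), " mismatches\n"; // prints: 0 mismatches
```

Note that this only exercises a single octet in isolation; it says nothing about how the alternation behaves once it is embedded in the full dotted pattern.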

Re: Regular Expression to check a valid IP

Post by ridgerunner »

prometheuzz wrote:... IPv4 can be matched using this pattern:

Code: Select all

\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5](\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])){3}
...
@prometheuzz - your solution has several problems:
  • It fails to match valid IP octets having leading zeroes e.g. 192.168.001.020
  • The first portion of the regex (which matches the first octet) has several options separated by the | "or" operator. This subexpression needs to be enclosed in parentheses to make the alternation work. Otherwise the regex as a whole matches any single digit anywhere!
  • The order of the alternatives needs to be re-arranged so that the single digit option is last (and the two digit option second to last). Otherwise when given the IP: 192.168.100.100, your regex fails to grab all the digits on the last octet, and erroneously matches the IP address as: 192.168.100.1 (missing trailing digits!)
This one works a bit better:

Code: Select all

if (preg_match(
    '/# free-spacing mode regex for URI component:  IPv4address
    (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}  # dec-octet "." dec-octet "." dec-octet "."
       (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)        # dec-octet/x', 
    $contents)) {
    # Successful match
} else {
    # Match attempt failed
}
:)
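To see the grouping and ordering problems in action, the sketch below runs three variants against the same address. The `first_match()` helper and the array keys are names made up for this illustration; the octet alternations are copied from the patterns in this thread, with only the grouping and the order changed. The leading-zero issue is a separate matter and is not covered here.

```php
<?php
// Octet alternatives in the original order and in longest-first order.
$shortFirst = '\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]';
$longFirst  = '25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d';

$patterns = [
    // As originally posted: the first octet's alternation is not grouped.
    'ungrouped'         => '/' . $shortFirst . '(\.(' . $shortFirst . ')){3}/',
    // First octet grouped, alternatives still shortest-first.
    'grouped'           => '/(?:' . $shortFirst . ')(?:\.(?:' . $shortFirst . ')){3}/',
    // Grouped, with the longest alternatives tried first.
    'grouped+reordered' => '/(?:' . $longFirst . ')(?:\.(?:' . $longFirst . ')){3}/',
];

// Hypothetical helper: the text the pattern matched, or null if no match.
function first_match(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
}

$results = [];
foreach ($patterns as $name => $pattern) {
    $results[$name] = first_match($pattern, '192.168.100.100');
}
print_r($results);
// ungrouped         => "1"                (the bare \d alternative wins)
// grouped           => "192.168.100.1"    (the last octet stops after one digit)
// grouped+reordered => "192.168.100.100"
```

None of these variants is anchored, so they are searching inside the subject rather than validating it as a whole.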
Last edited by ridgerunner on Mon Aug 09, 2010 10:52 pm, edited 2 times in total.

Re: Regular Expression to check a valid IP

Post by ridgerunner »

Here's one to match an IPv6 address...

Code: Select all

if (preg_match(
    '/# free-spacing mode regex for URI component:  IPv6address
    (?:(?:                                                    (?:[0-9A-Fa-f]{1,4}:){6}
       |                                                   :: (?:[0-9A-Fa-f]{1,4}:){5}
       | (?:                            [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){4}
       | (?: (?:[0-9A-Fa-f]{1,4}:){0,1} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){3}
       | (?: (?:[0-9A-Fa-f]{1,4}:){0,2} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){2}
       | (?: (?:[0-9A-Fa-f]{1,4}:){0,3} [0-9A-Fa-f]{1,4})? ::    [0-9A-Fa-f]{1,4}:
       | (?: (?:[0-9A-Fa-f]{1,4}:){0,4} [0-9A-Fa-f]{1,4})? ::
       ) (?: [0-9A-Fa-f]{1,4} : [0-9A-Fa-f]{1,4}              # ls32 - factored out
         | (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}  # from first 7 lines
              (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) )      # of ABNF rule above
    | (?:    (?:[0-9A-Fa-f]{1,4}:){0,5} [0-9A-Fa-f]{1,4})? ::    [0-9A-Fa-f]{1,4}
    | (?:    (?:[0-9A-Fa-f]{1,4}:){0,6} [0-9A-Fa-f]{1,4})? :: )/x', 
    $contents)) {
    # Successful match
} else {
    # Match attempt failed
}
This really does work correctly.
And yes, I must be mad! :)

Re: Regular Expression to check a valid IP

Post by prometheuzz »

@ridgerunner, didn't know address-blocks in octet notation were valid. Thanks for the heads up.
MichaelR
Forum Contributor
Posts: 148
Joined: Sat Jan 03, 2009 3:27 pm

Re: Regular Expression to check a valid IP

Post by MichaelR »

prometheuzz wrote:They probably meant IPv4, because matching IPv6 addresses using regex, is plain madness.
It's actually not that complicated:

Code: Select all


// IPv4

'/^(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))(?:\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))){3}$/i'

// IPv6

'/^(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){7})|(?:(?!(?:.*[a-f0-9](?::|$)){7,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?))$/i'

// IPv4-mapped IPv6

'/^(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){5}:)|(?:(?!(?:.*[a-f0-9]:){5,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3}:)?))(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))(?:\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))){3}$/i'

// IPv4, IPv6, or IPv4-mapped IPv6

'/^(?:(?:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){7})|(?:(?!(?:.*[a-f0-9](?::|$)){7,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?)))|(?:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){5}:)|(?:(?!(?:.*[a-f0-9]:){5,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3}:)?))?(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))(?:\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))){3}))$/i'

Okay, it might be a little complicated...

Re: Regular Expression to check a valid IP

Post by ridgerunner »

@MichaelR - Your regexes have some problems and fail to match some valid addresses. First off, the IPv4 regex:
MichaelR wrote:

Code: Select all

// IPv4

'/^(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))(?:\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))){3}$/i'
This regex has one error and a couple of possible simplifications:
  • It fails to match valid IP octets having leading zeroes e.g. 192.168.001.020
  • Each set of digits is surrounded by an unnecessary non-capturing group. e.g. '(?:25[0-5])|(?:2[0-4][0-9])' can be written more simply as: '25[0-5]|2[0-4][0-9]'.
  • The 'i' ignorecase modifier is unnecessary as there are no alpha chars.
Concerning the IPv6 regexes: first off, let's look at what they should be matching. Here is a list of test IPv6 example addresses having valid syntax:

Code: Select all

ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff
   ::ffff:ffff:ffff:ffff:ffff:ffff:ffff
        ::ffff:ffff:ffff:ffff:ffff:ffff
    ffff::ffff:ffff:ffff:ffff:ffff:ffff
             ::ffff:ffff:ffff:ffff:ffff
         ffff::ffff:ffff:ffff:ffff:ffff
    ffff:ffff::ffff:ffff:ffff:ffff:ffff
                  ::ffff:ffff:ffff:ffff
              ffff::ffff:ffff:ffff:ffff
         ffff:ffff::ffff:ffff:ffff:ffff
    ffff:ffff:ffff::ffff:ffff:ffff:ffff
                       ::ffff:ffff:ffff
                   ffff::ffff:ffff:ffff
              ffff:ffff::ffff:ffff:ffff
         ffff:ffff:ffff::ffff:ffff:ffff
    ffff:ffff:ffff:ffff::ffff:ffff:ffff
                            ::ffff:ffff
                        ffff::ffff:ffff
                   ffff:ffff::ffff:ffff
              ffff:ffff:ffff::ffff:ffff
         ffff:ffff:ffff:ffff::ffff:ffff
    ffff:ffff:ffff:ffff:ffff::ffff:ffff
                                 ::ffff
                             ffff::ffff
                        ffff:ffff::ffff
                   ffff:ffff:ffff::ffff
              ffff:ffff:ffff:ffff::ffff
         ffff:ffff:ffff:ffff:ffff::ffff
    ffff:ffff:ffff:ffff:ffff:ffff::ffff
                                     ::
                                 ffff::
                            ffff:ffff::
                       ffff:ffff:ffff::
                  ffff:ffff:ffff:ffff::
             ffff:ffff:ffff:ffff:ffff::
        ffff:ffff:ffff:ffff:ffff:ffff::
   ffff:ffff:ffff:ffff:ffff:ffff:ffff::

  ffff:ffff:ffff:ffff:ffff:ffff:255.255.255.255
     ::ffff:ffff:ffff:ffff:ffff:255.255.255.255
          ::ffff:ffff:ffff:ffff:255.255.255.255
      ffff::ffff:ffff:ffff:ffff:255.255.255.255
               ::ffff:ffff:ffff:255.255.255.255
           ffff::ffff:ffff:ffff:255.255.255.255
      ffff:ffff::ffff:ffff:ffff:255.255.255.255
                    ::ffff:ffff:255.255.255.255
                ffff::ffff:ffff:255.255.255.255
           ffff:ffff::ffff:ffff:255.255.255.255
      ffff:ffff:ffff::ffff:ffff:255.255.255.255
                         ::ffff:255.255.255.255
                     ffff::ffff:255.255.255.255
                ffff:ffff::ffff:255.255.255.255
           ffff:ffff:ffff::ffff:255.255.255.255
      ffff:ffff:ffff:ffff::ffff:255.255.255.255
                              ::255.255.255.255
                          ffff::255.255.255.255
                     ffff:ffff::255.255.255.255
                ffff:ffff:ffff::255.255.255.255
           ffff:ffff:ffff:ffff::255.255.255.255
      ffff:ffff:ffff:ffff:ffff::255.255.255.255
A correct IPv6 regex (such as the one I provided in my previous post) should match every one of these IPv6 example addresses. Note that the first section consists purely of colon-separated 16-bit numbers, while the second section uses IPv4 syntax (dot-separated 8-bit values) for the least significant 32 bits. In other words, the least significant 32 bits of an IPv6 address can be written either as two 16-bit values (each up to four hexadecimal digits, separated by a colon) or as four 8-bit values (each a decimal number below 256 of up to three digits, separated by dots). An optional double colon acts as a wildcard for one or more zeroed 16-bit values.
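
The two lists above follow a regular pattern, so they can be generated instead of typed. The following is a rough sketch under that assumption; the function names `compressed_examples()` and `rejected()` are invented here and do not come from any RFC or library.

```php
<?php
// Hypothetical generator: every way of placing "::" between at most $maxHex
// groups of "ffff", optionally followed by a dotted-quad tail.
function compressed_examples(int $maxHex, string $tail = ''): array
{
    $out = [];
    for ($right = $maxHex; $right >= 0; $right--) {
        for ($left = 0; $left + $right <= $maxHex; $left++) {
            $rightGroups = array_fill(0, $right, 'ffff');
            if ($tail !== '') {
                $rightGroups[] = $tail;
            }
            $out[] = implode(':', array_fill(0, $left, 'ffff'))
                   . '::' . implode(':', $rightGroups);
        }
    }
    return $out;
}

// Hypothetical harness: the addresses that $pattern fails to match.
function rejected(string $pattern, array $addresses): array
{
    return array_values(array_filter(
        $addresses,
        function (string $a) use ($pattern): bool {
            return preg_match($pattern, $a) !== 1;
        }
    ));
}

// First section: the full 8-group form plus every "::" form with up to 7 groups.
$hexOnly = array_merge(
    [implode(':', array_fill(0, 8, 'ffff'))],
    compressed_examples(7)
);
// Second section: 6 groups plus IPv4, then every "::" form with up to 5 groups.
$withIPv4 = array_merge(
    [implode(':', array_fill(0, 6, 'ffff')) . ':255.255.255.255'],
    compressed_examples(5, '255.255.255.255')
);
printf("%d hex-only, %d with IPv4 tail\n", count($hexOnly), count($withIPv4));
```

Calling `rejected($pattern, $hexOnly)` or `rejected($pattern, $withIPv4)` then lists whichever example addresses a candidate pattern misses.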

Here is your first IPv6 regex which attempts to match the first category:
MichaelR wrote:

Code: Select all

// IPv6

'/^(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){7})|(?:(?!(?:.*[a-f0-9](?::|$)){7,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?))$/i'
This regex fails to match the following:

Code: Select all

   ::ffff:ffff:ffff:ffff:ffff:ffff:ffff
    ffff::ffff:ffff:ffff:ffff:ffff:ffff
    ffff:ffff::ffff:ffff:ffff:ffff:ffff
    ffff:ffff:ffff::ffff:ffff:ffff:ffff
    ffff:ffff:ffff:ffff::ffff:ffff:ffff
    ffff:ffff:ffff:ffff:ffff::ffff:ffff
    ffff:ffff:ffff:ffff:ffff:ffff::ffff
   ffff:ffff:ffff:ffff:ffff:ffff:ffff::
Here is your second IPv6 regex which attempts to match the second category:
MichaelR wrote:

Code: Select all

// IPv4-mapped IPv6

'/^(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){5}:)|(?:(?!(?:.*[a-f0-9]:){5,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3}:)?))(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))(?:\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))){3}$/i'
This regex fails to match the following:

Code: Select all

     ::ffff:ffff:ffff:ffff:ffff:255.255.255.255
      ffff::ffff:ffff:ffff:ffff:255.255.255.255
      ffff:ffff::ffff:ffff:ffff:255.255.255.255
      ffff:ffff:ffff::ffff:ffff:255.255.255.255
      ffff:ffff:ffff:ffff::ffff:255.255.255.255
      ffff:ffff:ffff:ffff:ffff::255.255.255.255
Lastly, here is your final regex which should match all IPv6 addresses:
MichaelR wrote:

Code: Select all

// IPv4, IPv6, or IPv4-mapped IPv6

'/^(?:(?:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){7})|(?:(?!(?:.*[a-f0-9](?::|$)){7,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?)))|(?:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){5}:)|(?:(?!(?:.*[a-f0-9]:){5,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3}:)?))?(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))(?:\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))){3}))$/i'
And as expected, this regex fails to match the following:

Code: Select all

   ::ffff:ffff:ffff:ffff:ffff:ffff:ffff
    ffff::ffff:ffff:ffff:ffff:ffff:ffff
    ffff:ffff::ffff:ffff:ffff:ffff:ffff
    ffff:ffff:ffff::ffff:ffff:ffff:ffff
    ffff:ffff:ffff:ffff::ffff:ffff:ffff
    ffff:ffff:ffff:ffff:ffff::ffff:ffff
    ffff:ffff:ffff:ffff:ffff:ffff::ffff
   ffff:ffff:ffff:ffff:ffff:ffff:ffff::
     ::ffff:ffff:ffff:ffff:ffff:255.255.255.255
      ffff::ffff:ffff:ffff:ffff:255.255.255.255
      ffff:ffff::ffff:ffff:ffff:255.255.255.255
      ffff:ffff:ffff::ffff:ffff:255.255.255.255
      ffff:ffff:ffff:ffff::ffff:255.255.255.255
      ffff:ffff:ffff:ffff:ffff::255.255.255.255
I'm not sure where you got these IPv6 regexes from but, simply put, they do not work correctly. Sorry to be the messenger here, but you really need to test your regexes before posting them!

For IP matching regexes which actually work (for both IPv4 and IPv6), please refer to my previous posts. Here they are again in compressed format:

Code: Select all

// IPv4
$IPv4address = '/(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)/';

// IPv6
$IPv6address = '/(?:(?:(?:[0-9A-Fa-f]{1,4}:){6}|::(?:[0-9A-Fa-f]{1,4}:){5}|(?:[0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){4}|(?:(?:[0-9A-Fa-f]{1,4}:){0,1}[0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){3}|(?:(?:[0-9A-Fa-f]{1,4}:){0,2}[0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){2}|(?:(?:[0-9A-Fa-f]{1,4}:){0,3}[0-9A-Fa-f]{1,4})?::[0-9A-Fa-f]{1,4}:|(?:(?:[0-9A-Fa-f]{1,4}:){0,4}[0-9A-Fa-f]{1,4})?::)(?:[0-9A-Fa-f]{1,4}:[0-9A-Fa-f]{1,4}|(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))|(?:(?:[0-9A-Fa-f]{1,4}:){0,5}[0-9A-Fa-f]{1,4})?::[0-9A-Fa-f]{1,4}|(?:(?:[0-9A-Fa-f]{1,4}:){0,6}[0-9A-Fa-f]{1,4})?::)/';
If anyone is curious where I came up with this IPv6 regex, I got it directly from the horse's mouth: i.e. from RFC3986 - Uniform Resource Identifier (URI): Generic Syntax. I simply took the ABNF syntax for IPv6address from Appendix A and converted it into regex format. If you look at the commented version in my previous post, you can immediately see the resemblance.

Hope this helps :)
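
One caveat is worth illustrating: as written, the compressed patterns carry no anchors, so `preg_match()` reports success whenever a valid address appears anywhere inside the subject. That suits searching `$contents`, but for checking that a single string is an address and nothing else, anchors are needed. The sketch below assumes that use case; `$IPv4strict`, `$octet` and `$body` are names introduced here, and the `D` modifier is an addition.

```php
<?php
// Same octet and body as the compressed IPv4 pattern above.
$octet = '(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)';
$body  = '(?:' . $octet . '\.){3}' . $octet;

$IPv4address = '/' . $body . '/';     // unanchored: finds an address inside text
$IPv4strict  = '/^' . $body . '$/D';  // assumption: the whole string must be an address

// Unanchored search on an out-of-range first octet still finds something:
preg_match($IPv4address, '256.1.1.1', $m);
echo $m[0], "\n"; // prints: 56.1.1.1

// The anchored form rejects it, and still accepts leading zeros.
var_dump(preg_match($IPv4strict, '256.1.1.1') === 1);       // bool(false)
var_dump(preg_match($IPv4strict, '192.168.001.020') === 1); // bool(true)
```

The same wrapping (`^`, `$` and `D`) can be applied to the IPv6 pattern.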

Re: Regular Expression to check a valid IP

Post by MichaelR »

All those IPv6 addresses which my regex does not allow are wrong. Which is why they are not allowed. In an IPv6 address there are 8 groups. A double colon represents (at least) two groups. Those addresses failed because there are 9 groups. Which is one group too many.

The i modifier in the IPv4 address was an oversight: I was splitting the entire regex (IPv4 and IPv6 addresses) into separate components and didn't think properly. The unnecessary non-capturing groups were probably there because I was having trouble with OR priority; I can't remember. I'll have a look at it without them. As for the leading zeroes, yes, that may be a mistake.

Re: Regular Expression to check a valid IP

Post by ridgerunner »

I stand by my previous post.

According to section 2.2 of: RFC4291 IP Version 6 Addressing Architecture:
RFC4291 wrote:... The use of "::" indicates one or more groups of 16 bits of zeros. ...
The ABNF formula for an IPv6 address in RFC3986 - Uniform Resource Identifier (URI): Generic Syntax also allows a double colon to replace only one set of zeroes:
RFC3986 wrote:

Code: Select all

   IPv6address   =                            6( h16 ":" ) ls32
                 /                       "::" 5( h16 ":" ) ls32
                 / [               h16 ] "::" 4( h16 ":" ) ls32
                 / [ *1( h16 ":" ) h16 ] "::" 3( h16 ":" ) ls32
                 / [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32
                 / [ *3( h16 ":" ) h16 ] "::"    h16 ":"   ls32
                 / [ *4( h16 ":" ) h16 ] "::"              ls32
                 / [ *5( h16 ":" ) h16 ] "::"              h16
                 / [ *6( h16 ":" ) h16 ] "::"

   h16           = 1*4HEXDIG
   ls32          = ( h16 ":" h16 ) / IPv4address
   IPv4address   = dec-octet "." dec-octet "." dec-octet "." dec-octet
So, no, I'm afraid you are mistaken about the double-colon. However, if your regex could be fixed, I would be very interested to see a commented version of it for analysis. It is shorter than the one I derived from RFC3986 and may be faster as well.
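
To make the correspondence between the ABNF and the regex concrete, here is a sketch that assembles the pattern one ABNF rule at a time. The variable and closure names (`$h16`, `$ls32`, `$exactly`, `$upTo`, `$IPv6strict`) are invented for this example, and the anchors and `D` modifier are additions. The octet sub-pattern is the leading-zero-tolerant one used earlier in this thread, which is looser than the dec-octet rule in RFC 3986 itself.

```php
<?php
$h16      = '[0-9A-Fa-f]{1,4}';
// Octet as used earlier in this thread (tolerates leading zeros).
$decOctet = '(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)';
$ipv4     = '(?:' . $decOctet . '\.){3}' . $decOctet;
$ls32     = '(?:' . $h16 . ':' . $h16 . '|' . $ipv4 . ')';

// n( h16 ":" )  ->  exactly $n groups, each followed by a colon
$exactly = function (int $n) use ($h16): string {
    return '(?:' . $h16 . ':){' . $n . '}';
};
// [ *n( h16 ":" ) h16 ]  ->  optionally, between 1 and $n + 1 groups
$upTo = function (int $n) use ($h16): string {
    return '(?:(?:' . $h16 . ':){0,' . $n . '}' . $h16 . ')?';
};

// One array entry per line of the IPv6address rule, in the same order.
$alternatives = [
    $exactly(6) . $ls32,
    '::' . $exactly(5) . $ls32,
    $upTo(0) . '::' . $exactly(4) . $ls32,
    $upTo(1) . '::' . $exactly(3) . $ls32,
    $upTo(2) . '::' . $exactly(2) . $ls32,
    $upTo(3) . '::' . $exactly(1) . $ls32,
    $upTo(4) . '::' . $ls32,
    $upTo(5) . '::' . $h16,
    $upTo(6) . '::',
];
$IPv6strict = '/^(?:' . implode('|', $alternatives) . ')$/D';

var_dump(preg_match($IPv6strict, 'ffff::ffff:ffff:ffff:ffff:ffff:ffff') === 1); // bool(true)
```

Because the pattern is anchored at both ends, the order of the alternatives does not affect which strings are accepted.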

Re: Regular Expression to check a valid IP

Post by MichaelR »

That's interesting. Because RFC 5321 has this to say:
   IPv6-addr      = IPv6-full / IPv6-comp / IPv6v4-full / IPv6v4-comp

   IPv6-hex       = 1*4HEXDIG

   IPv6-full      = IPv6-hex 7(":" IPv6-hex)

   IPv6-comp      = [IPv6-hex *5(":" IPv6-hex)] "::"
                    [IPv6-hex *5(":" IPv6-hex)]
                    ; The "::" represents at least 2 16-bit groups of
                    ; zeros. No more than 6 groups in addition to the
                    ; "::" may be present.

   IPv6v4-full    = IPv6-hex 5(":" IPv6-hex) ":" IPv4-address-literal

   IPv6v4-comp    = [IPv6-hex *3(":" IPv6-hex)] "::"
                    [IPv6-hex *3(":" IPv6-hex) ":"]
                    IPv4-address-literal
                    ; The "::" represents at least 2 16-bit groups of
                    ; zeros. No more than 4 groups in addition to the
                    ; "::" and IPv4-address-literal may be present.
I'm tempted to side with RFC 5321 simply because it was written 2 years and 8 months after 4291, and so should be more up to date. But if there's some other source to adjudicate, I'd love to know (and update my code if necessary).

Re: Regular Expression to check a valid IP

Post by ridgerunner »

That is interesting.

I would tend to go with the RFC that specifically deals with the subject of IPv6 addressing (RFC4291), which is in agreement with the one dealing with URI addressing in general (RFC3986), rather than one which deals with the specific topic of SMTP (RFC5321), regardless of the dates.

However, that said, I would still like to see a commented version of your regex. It may be easy to modify it to handle either case.

Re: Regular Expression to check a valid IP

Post by MichaelR »

Code: Select all


  /* A single group of between one and four hexadecimal digits followed by seven groups which start with a colon and end with between one and four hexadecimal digits. */

  $ipv6_full       = '[a-f0-9]{1,4}' . '(?::[a-f0-9]{1,4}){7}';

  /* Firstly, seven or more groups of between one and four hexadecimal digits followed by a colon are not allowed. Then, optionally, a single group of between one and four hexadecimal digits followed by between zero and five groups which start with a colon and end with between one and four hexadecimal digits. Then a double colon. Then, optionally, a single group of between one and four hexadecimal digits followed by between zero and five groups which start with a colon and end with between one and four hexadecimal digits. */

  $ipv6_comp       = '(?!(?:.*[a-f0-9](?::|$)){7,})' . '(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?' . '::' . '(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?';

  /* Either a full IPv6 address or a compressed IPv6 address. */

  $ipv6            = '(?:' . $ipv6_full . ')|(?:' . $ipv6_comp . ')';

  /* A single group of between one and four hexadecimal digits followed by five groups which start with a colon and end with between one and four hexadecimal digits, followed by a colon. */

  $ipv6v4_full     = '[a-f0-9]{1,4}' . '(?::[a-f0-9]{1,4}){5}:';

  /* Firstly, five or more groups of between one and four hexadecimal digits followed by a colon are not allowed. Then, optionally, a single group of between one and four hexadecimal digits followed by between zero and three groups which start with a colon and end with between one and four hexadecimal digits. Then a double colon. Then, optionally, a single group of between one and four hexadecimal digits followed by between zero and three groups which start with a colon and end with between one and four hexadecimal digits. Then a colon. */

  $ipv6v4_comp     = '(?!(?:.*[a-f0-9]:){5,})' . '(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3})?' . '::' . '(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3}:)?';

  /* Either a full IPv4-mapped IPv6 address or a compressed IPv4-mapped IPv6 address. */

  $ipv6v4          = '(?:' . $ipv6v4_full . ')|(?:' . $ipv6v4_comp . ')';

  /* One group of any number between 0 and 255 inclusive followed by three groups which begin with a full stop (period) and end with any number between 0 and 255 inclusive. */

  $ipv4            = '(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])' . '(?:\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}';

  /* Either an IPv6 address or both an optional IPv6-portion of an IPv4-mapped IPv6 address followed by an IPv4 address. */

  $ip_address  = '(?:(?:' . $ipv6 . ')|(?:(?:' . $ipv6v4 . ')?' . $ipv4 . '))';

Might not be too clear; I've never been good at commenting. And I've taken your advice and removed the unnecessary non-capturing groups.

As you can see, it is easy to modify if you are correct that "::" can represent a single 16-bit group of zeroes. Some software interprets leading zeros in IPv4 octets as octal numbers, so perhaps it's best not to allow them, but the following accepts them and should validate every address you provided above (assuming I haven't made any typos, most likely a missing grouping):

Code: Select all

'/^(?:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){7})|(?:(?!(?:.*[a-f0-9](?::|$)){8,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,6})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,6})?))|(?:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){5}:)|(?:(?!(?:.*[a-f0-9]:){6,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,4})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,4}:)?))?(?:25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9])(?:\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9])){3}))$/i'
As an extra note, I have submitted an erratum to RFC 5321 suggesting "::" represents at least one group, and another to RFC 4291 suggesting "::" represents at least two groups. Hopefully one will be accepted and one rejected (rather than both rejected or both accepted, which would be very unhelpful).
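The whole disagreement comes down to one number: the minimum count of 16-bit groups that "::" stands for. The sketch below makes that number a parameter. It is a plain group-counting check rather than a regex, it handles only the hex-only form (no IPv4 tail), and the function name `ipv6_hex_valid()` and the `$minElided` parameter are invented here.

```php
<?php
// Hypothetical helper: validates a hex-only IPv6 address by counting groups.
// $minElided is the smallest number of zero groups "::" may stand for:
// 1 for the RFC 4291 / RFC 3986 reading, 2 for the RFC 5321 reading.
function ipv6_hex_valid(string $addr, int $minElided = 1): bool
{
    $parts = explode('::', $addr);
    if (count($parts) > 2) {
        return false; // "::" may appear at most once
    }

    $groups = 0;
    foreach ($parts as $side) {
        if ($side === '') {
            continue; // nothing on this side of "::"
        }
        foreach (explode(':', $side) as $group) {
            if (preg_match('/^[0-9A-Fa-f]{1,4}$/D', $group) !== 1) {
                return false; // empty or malformed group
            }
            $groups++;
        }
    }

    if (count($parts) === 1) {
        return $groups === 8;          // no "::", so all 8 groups are required
    }
    return $groups <= 8 - $minElided;  // leave room for the elided groups
}

$addr = 'ffff::ffff:ffff:ffff:ffff:ffff:ffff'; // 7 explicit groups plus "::"
var_dump(ipv6_hex_valid($addr, 1)); // bool(true)
var_dump(ipv6_hex_valid($addr, 2)); // bool(false)
```

Addresses with at most six explicit groups are accepted under either reading; only the seven-group forms depend on the choice.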
Last edited by MichaelR on Tue Aug 17, 2010 4:45 pm, edited 3 times in total.

Re: Regular Expression to check a valid IP

Post by devarishi »

Hi All,


It is fantastic to see how you all try to provide help.

Thanks to all of you!

Dev.