Anyone have a URL String Validation Routine ?

Posted: Sun Aug 18, 2002 6:55 pm
by HUWUWA
Hi guys, this question is similar to my other one about validating Email addresses.

To tell you the truth, I can only half-understand the:

$pattern = "/^([a-zA-Z0-9])+([\.a-zA-Z0-9_-])*@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-]+)+/";

in the preg_match() routine.

I'm trying to decipher it but it is so cryptic.

How about a routine that can see if a string starts with 'http://', and has at least one '.' in it (and not at the very end or right after the 'http://') ?

Then I think I can figure it all out.

Thanks.

Re: Anyone have a URL String Validation Routine ?

Posted: Sun Aug 18, 2002 8:30 pm
by chiefmonkey
Something like
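<?php
$url = "http://www.evilwalrus.com";
// Requires "http://", then URL characters, then a dot and a 2-3 character ending.
// A comma inside [...] is a literal comma, so the class is [0-9a-zA-Z], not [0-9,a-z,A-Z].
// Note: eregi() was removed in PHP 7; this only runs on older PHP versions.
if (!eregi('^http://[A-Za-z0-9%?_:~/.-]+[.][0-9a-zA-Z]{2,3}$', $url)) {
    print("This is not a valid URL");
} else {
    print("URL Valid");
}
?>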

Posted: Sun Aug 18, 2002 8:59 pm
by phpPete
What does this all mean?

"/^([a-zA-Z0-9])+([\.a-zA-Z0-9_-])*@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-]+)+/";

Reg-expr  | Description
----------+---------------------------------------------------
.         | Matches any character except newline
[a-z0-9]  | Matches any single character of the set
[^a-z0-9] | Matches any single character not in the set
\d        | Matches a digit, i.e., [0-9]
\w        | Matches a word character, i.e., [a-zA-Z0-9_]
\W        | Matches a non-word character, i.e., [^a-zA-Z0-9_]
\metachar | Matches the character itself, e.g., \|, \*, \+
x?        | Matches 0 or 1 x's, where x is any of the above
x*        | Matches 0 or more x's
x+        | Matches 1 or more x's
x{m,n}    | Matches at least m x's but no more than n
foo|bar   | Matches one of foo or bar
(x)       | Brackets a regular expression (this is a bit of a lie :-)
\b        | Matches a word boundary


relevant link: http://www.dcs.qmul.ac.uk/publications/ ... egexp.html
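
As a rough illustration of how the table applies, the email pattern quoted above can be spread over several lines with the x modifier, which lets whitespace and # comments sit inside the pattern. The sample inputs below are made up for the example; this is only a sketch for reading the pattern, not a complete email validator.

```php
<?php
// The same email pattern as above, split up with the x (extended) modifier.
$pattern = '/
    ^                       # start of the string
    ([a-zA-Z0-9])+          # one or more letters or digits
    ([\.a-zA-Z0-9_-])*      # then any number of dots, letters, digits, _ or -
    @                       # a literal @ sign
    ([a-zA-Z0-9_-])+        # one or more letters, digits, _ or - (start of the domain)
    (\.[a-zA-Z0-9_-]+)+     # one or more ".something" groups, such as ".co.uk"
/x';

// Hypothetical sample inputs, chosen only for illustration.
$samples = array('someone@example.com', 'no-at-sign.example.com', '@example.com');
foreach ($samples as $s) {
    echo $s, ' => ', preg_match($pattern, $s) ? 'match' : 'no match', "\n";
}
```

Only the first sample matches. Note that the pattern is anchored with ^ at the start but has no $ at the end, so anything may follow the last ".something" group.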

Posted: Mon Aug 19, 2002 11:37 am
by HUWUWA
Thanks phpPete, but it still looks like Chinese to me. There are forward slashes, backslashes, caret signs, + signs, * signs; man, it's hard to figure out. And I know C(++), so I'm no idiot.

I like using the preg_match() function so I can't use the other code above.

In my form textbox it defaults to 'http://' but some guys paste a link and it becomes 'http://http://...' and also some guys just put in, say, 'mysite' so it becomes 'http://mysite'.

Basically I need a way to check for the 'http://' at the beginning, make sure it isn't followed with another 'http://' and make sure there is at least one '.' in it.
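
One possible way to express those three checks with preg_match() is sketched below. The function name looks_like_http_url is made up for this example, and the rules are deliberately loose: it only checks the prefix and the position of a dot, not whether the host is real.

```php
<?php
// Hypothetical helper, not from the thread. Returns true when $url
//  - starts with "http://",
//  - is not followed straight away by a second "http://",
//  - contains a "." that is neither right after "http://" nor the last character.
function looks_like_http_url($url)
{
    // #...# is used as the delimiter so the slashes need no escaping.
    // (?!http://)  : negative lookahead, rejects a doubled prefix
    // [^.\s]+      : at least one character that is not a dot or whitespace
    // \.           : a literal dot
    // \S*[^.\s]\z  : the rest, ending in something other than a dot
    $pattern = '#^http://(?!http://)[^.\s]+\.\S*[^.\s]\z#i';
    return preg_match($pattern, $url) === 1;
}

var_dump(looks_like_http_url('http://www.evilwalrus.com'));        // bool(true)
var_dump(looks_like_http_url('http://http://www.evilwalrus.com')); // bool(false)
var_dump(looks_like_http_url('http://mysite'));                    // bool(false)
var_dump(looks_like_http_url('http://mysite.'));                   // bool(false)
```

The dot may appear anywhere after the first character, so a string like 'http://a/b.c' also passes; tightening that would need a stricter host pattern.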

I think if I can do that I can figure out how to test for a 'www' without the 'http://' using a conditional so that the address can still be valid. I can also test for the double 'http://' and remove one if need be.
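
A minimal sketch of that clean-up step, assuming it runs before any validation. The helper name normalize_http_url is hypothetical, and it only handles the two cases described here (a doubled 'http://' and a bare 'www.' address).

```php
<?php
// Hypothetical helper, not from the thread: tidies up what users type into the box.
function normalize_http_url($input)
{
    $url = trim($input);
    // Collapse any run of repeated "http://" prefixes into a single one.
    $url = preg_replace('#^(?:http://)+#i', 'http://', $url);
    // If the text starts with "www." it has no scheme, so add one.
    if (preg_match('#^www\.#i', $url)) {
        $url = 'http://' . $url;
    }
    return $url;
}

echo normalize_http_url('http://http://www.example.com'), "\n"; // http://www.example.com
echo normalize_http_url('www.example.com'), "\n";               // http://www.example.com
echo normalize_http_url('http://mysite'), "\n";                 // http://mysite
```

The last input comes back unchanged, so it still has to be rejected by whatever validation runs afterwards.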

Will you help me out ? Thanks.

Posted: Mon Aug 19, 2002 12:29 pm
by llimllib
I suggest you read this to begin understanding regular expressions.

Posted: Mon Aug 19, 2002 1:03 pm
by phpPete

if (!preg_match('/^(http|https|ftp):\/\/(([A-Z0-9][A-Z0-9_-]*)(\.[A-Z0-9][A-Z0-9_-]*)+)/i', $url)) {
    echo "BAD URL";
} else {
    echo "GOOD URL";
}
Found this at Zend.
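
A quick way to study the pattern is to run it against the inputs mentioned earlier in the thread. The wrapper name passes_posted_pattern is made up for this sketch; the pattern itself is the one posted above.

```php
<?php
// The pattern posted above, wrapped in a helper whose name is made up here.
function passes_posted_pattern($url)
{
    $pattern = '/^(http|https|ftp):\/\/(([A-Z0-9][A-Z0-9_-]*)(\.[A-Z0-9][A-Z0-9_-]*)+)/i';
    return preg_match($pattern, $url) === 1;
}

$inputs = array(
    'http://www.evilwalrus.com',        // plain URL from earlier in the thread
    'http://http://www.evilwalrus.com', // pasted on top of the default text
    'http://mysite',                    // no dot anywhere
    'www.evilwalrus.com',               // no scheme
);
foreach ($inputs as $url) {
    echo $url, ' => ', passes_posted_pattern($url) ? 'GOOD URL' : 'BAD URL', "\n";
}
```

Only the first input passes. The pattern has no $ at the end, so it checks the scheme and host but not whatever follows them.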

Posted: Mon Aug 19, 2002 1:13 pm
by HUWUWA
Thanks, I think that is exactly what I needed. I'm going to study it now.