
regular expression

Posted: Thu Feb 06, 2003 2:24 pm
by kendall
Hey people,

'^[a-z0-9\-]+\.(com)|(net)|(org)|(biz)|(info)$'

The expression above validates the format of a domain name, so 'anydomain.com' (or .biz, etc.) should be accepted.

Now the thing is, when I checked http://www.apex-solu.com and kendall.co.tt they passed, but http://www.kendall.biz did not.

The theory behind the expression is:

^[a-z0-9\-]+ \\ at the beginning of the domain name, the characters a-z, 0-9 and '-', repeated one or more times
\. \\ then a dot
(com)|(net)|(org)|(biz)|(info)$ \\ any one of the extensions at the end

Is my theory correct?

If not, can you advise me accordingly?

Kendall
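Stoker's later point about the parentheses explains the odd results here: `|` has the lowest precedence in a regular expression, so `^`, `\.` and `$` each end up attached to only one alternative. A minimal sketch in PHP (the test strings are made up for illustration) comparing the pattern as posted with a grouped version:

```php
<?php
// The pattern exactly as posted, wrapped in / delimiters for preg_match().
// Because | has the lowest precedence, PCRE reads it as five alternatives:
//   ^[a-z0-9\-]+\.(com)  OR  net  OR  org  OR  biz  OR  info$
$original = '/^[a-z0-9\-]+\.(com)|(net)|(org)|(biz)|(info)$/';

// Grouping the alternation makes ^, the dot and $ apply to every TLD.
$grouped = '/^[a-z0-9\-]+\.(com|net|org|biz|info)$/';

// Made-up inputs: "planet" and "georgia" are not domains at all.
foreach (['example.com', 'example.net', 'planet', 'georgia'] as $s) {
    printf("%-12s original=%d grouped=%d\n", $s, preg_match($original, $s), preg_match($grouped, $s));
}
```

With the posted pattern, "planet" and "georgia" match only because they contain "net" and "org" somewhere; the grouped version rejects both.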

ok

Posted: Thu Feb 06, 2003 3:33 pm
by AVATAr
that's ok! :wink:

Posted: Thu Feb 06, 2003 4:33 pm
by lazy_yogi
Even these don't work:
http://www.apex-solu.com, kendall.co.tt
I have no idea how they worked for you.

What your regexp does is check for one or more of these chars: [a-z0-9\-]
then a dot, then any of these: (com)|(net)|(org)|(biz)|(info)
and then the end of the line.

This would be OK for
somedomain.com
but not for
http://www.somedomain.com
and not for
http://www.forums.somedomain.com
because they have extra full stops.


You'd need to remove the beginning-of-line character, because subdomains can have any number of full stops, e.g.:
http://www.forums.domainname.co.uk
You'll also need to remove the com/net/org/biz/info list,
since the domain could come from any of the hundred-plus countries, e.g.
Australia: http://www.stuff.com.au
UK: http://www.stuff.co.uk
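One way to get the looser check described above while still anchoring the pattern is to repeat the label group instead of dropping `^`. This is only a sketch: `looks_like_host()` is a hypothetical helper, the `[a-z]{2,}` rule for the last label is an assumption, and `parse_url()` is used to strip a leading `http://` when one is present.

```php
<?php
// Hypothetical helper (not from the thread): loosely accepts any host made of
// dot-separated labels (letters, digits, dashes) ending in an alphabetic label,
// so multi-part country suffixes such as .co.uk or .com.au pass.
function looks_like_host(string $input): bool
{
    // If a scheme is present (http://...), keep only the host part.
    if (strpos($input, '://') !== false) {
        $host = parse_url($input, PHP_URL_HOST);
        if (!is_string($host)) {
            return false; // no host component, or a badly malformed URL
        }
        $input = $host;
    }
    // One or more "label." groups, then a final alphabetic label of 2+ chars.
    return preg_match('/^([a-z0-9\-]+\.)+[a-z]{2,}$/i', $input) === 1;
}

var_dump(looks_like_host('http://www.forums.domainname.co.uk')); // bool(true)
var_dump(looks_like_host('www.stuff.com.au'));                   // bool(true)
var_dump(looks_like_host('not a domain'));                       // bool(false)
```

As lazy_yogi says, a check this loose accepts almost any dotted name, so it mostly filters out obvious garbage.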

This would work, but the checking is very loose (pointless, I think),
so it doesn't really check effectively anyway:

if (preg_match('/[a-z0-9\-_]+\.(com|net|org|biz|info)$/', $dom)) { // TLD alternatives grouped
    print "yes";
} else {
    print "no";
}

theory

Posted: Thu Feb 06, 2003 6:04 pm
by AVATAr
Haha. I was answering your question:
The theory behind the expression is:

^[a-z0-9\-]+ \\ at the beginning of the domain name, the characters a-z, 0-9 and '-', repeated one or more times
\. \\ then a dot
(com)|(net)|(org)|(biz)|(info)$ \\ any one of the extensions at the end

Is my theory correct?
Think about how to add www. to it... the ^ means "it starts with"...
:idea:

Posted: Thu Feb 06, 2003 10:05 pm
by Stoker
I think you want something like
^[a-z0-9][a-z0-9\-]+\.(com|net|org|biz|info)$
instead, perhaps? And use strtolower() on your string before comparing (that is more efficient than asking the Perl-compatible regex engine to do a case-insensitive search).
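A sketch of the approach suggested above: lower-case the input with `strtolower()` and keep the pattern itself case-sensitive. The function name `is_listed_domain()` is hypothetical, the test inputs are illustrative, and no timing comparison with the `/i` modifier is made here.

```php
<?php
// Stoker's suggested pattern, wrapped in / delimiters for preg_match().
const DOMAIN_PATTERN = '/^[a-z0-9][a-z0-9\-]+\.(com|net|org|biz|info)$/';

// Hypothetical helper: lower-case first so the pattern needs no /i modifier.
function is_listed_domain(string $domain): bool
{
    return preg_match(DOMAIN_PATTERN, strtolower($domain)) === 1;
}

var_dump(is_listed_domain('Apex-Solutions.COM'));   // bool(true)
var_dump(is_listed_domain('apex-solutions.co.tt')); // bool(false)
```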

well

Posted: Fri Feb 07, 2003 6:34 am
by AVATAr
Here is the solution for http://www.something.com:

'^www\.[a-z0-9]+\.(com|net|org|biz|info)$'

Let's explain this a bit:

^www\. -> the string starts with www. (you have to escape the "."), then
[a-z0-9]+ -> any lowercase letter or digit, one or more times, then
(com|net|org|biz|info)$ -> com or net or org or biz or info at the end.
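A quick check of AVATAr's pattern with `preg_match()` (the test strings are illustrative). Because of the `^` anchor, the full URL with `http://` does not match, and since `[a-z0-9]+` has no dash, a hyphenated name is rejected as well:

```php
<?php
// AVATAr's pattern: "www." prefix, one run of letters/digits, then a listed TLD.
$www = '/^www\.[a-z0-9]+\.(com|net|org|biz|info)$/';

foreach (['www.something.com', 'http://www.something.com', 'www.apex-solutions.com'] as $s) {
    echo $s, ' => ', preg_match($www, $s), "\n";
}
```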

:P

regular expression

Posted: Fri Feb 07, 2003 7:12 am
by kendall
Uh,

you guys,

I think there's a misconception here.

Firstly, I only want to match the TLD extensions that I listed.
Secondly, I don't want them to put the www. in front.
So
http://www.apex-solutions.com is supposed to be wrong,
but apex-solutions.com would be right.

I think I'll try putting ^[^www.] in front, which means (correct me if I'm wrong here) not starting with www.

OK?

Kendall

yep

Posted: Fri Feb 07, 2003 7:38 am
by AVATAr
You're right... use the ^[^www.]

but be careful with the ".", because you may have to escape it as "\."

good luck

Posted: Fri Feb 07, 2003 7:56 am
by Stoker
as I posted earlier,
^[a-z0-9][a-z0-9\-]+\.(com|net|org|biz|info)$
should work just fine. It does this:

-start of string
-1 character a-z or 0-9
-1 or more characters a-z, 0-9 or dash
-literal dot
-com or net or org or biz or info
-end of string
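Stoker's step list maps one-to-one onto the pattern, so it can be assembled piece by piece. The sample inputs below are made up; they show that because the name part cannot contain a dot, anything with a `www.` prefix, a scheme, or a multi-part suffix such as `.co.tt` is already rejected:

```php
<?php
// Stoker's pattern assembled from his step list, one piece per step.
$pattern = '/^'                   // start of string
    . '[a-z0-9]'                  // 1 character a-z or 0-9 (no leading dash)
    . '[a-z0-9\-]+'               // 1 or more characters a-z, 0-9 or dash
    . '\.'                        // literal dot
    . '(com|net|org|biz|info)'    // one of the listed TLDs
    . '$/';                       // end of string

$cases = [
    'apex-solutions.com',            // accepted
    'www.apex-solutions.com',        // rejected: no dot allowed before the TLD
    'http://www.apex-solutions.com', // rejected: ':' and '/' are not in the class
    'kendall.co.tt',                 // rejected: .co.tt is not in the list
];
foreach ($cases as $s) {
    echo $s, ' => ', preg_match($pattern, $s), "\n";
}
```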


And [ ] is a character class, and ^ inside it means "not", so by using [^www.] you are telling it not to accept a string that starts with w, nor w, nor w, nor a literal dot...
Your problem to begin with was the wrong use of parentheses (a|b|c). I also added the requirement that there must be at least 2 characters before the dot, and that the first character may not be a dash.
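To see what `[^www.]` actually does, here is a small PHP check (the inputs are illustrative). The class stands for a single character that is neither `w` nor a dot, so it rejects every name starting with w, not just `www.`:

```php
<?php
// [^www.] is ONE character: anything except "w" or "." (repeating w changes nothing).
$notW = '/^[^www.]/';

foreach (['apex.com', 'wwf.com', 'web.com', '.com'] as $s) {
    echo $s, ' => ', preg_match($notW, $s), "\n";
}
```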

yep

Posted: Fri Feb 07, 2003 8:04 am
by AVATAr
You're right. If you want to use www you have to use [w]{3}

oops
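On the `[w]{3}` suggestion: it is a one-character class repeated exactly three times, so it matches the same text as the plain literal `www`. A sketch comparing the two (the inputs are illustrative):

```php
<?php
// [w]{3} is the class "w" repeated exactly three times, i.e. the literal "www".
$classForm   = '/^[w]{3}\./';
$literalForm = '/^www\./';

foreach (['www.web.com', 'wwf.com', 'ww.web.com'] as $s) {
    printf("%-12s class=%d literal=%d\n", $s, preg_match($classForm, $s), preg_match($literalForm, $s));
}
```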

Regular Expressions

Posted: Fri Feb 07, 2003 8:56 am
by kendall
Ah,

Guys

Well, I got this to work

^[^(www)\.][a-z0-9\-]+\.(com|net|org|biz|info)$

in not accepting www,

but now it's not even accepting wwf.com lol :lol:

Even [w]{3} didn't work. If I wanted a strict www on it, what would I use?

Kendall


P.S. I think yours is wrong there, as that doesn't relate to what I'm trying to do :wink:
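For the two questions above, a sketch using PCRE's negative lookahead `(?!www\.)`, a feature nobody in the thread mentions. Unlike a character class, the lookahead checks the four characters as a sequence, so a name that merely starts with w is still accepted. The subdomain-friendly `$noWww` pattern is a hypothetical illustration, and `$mustWww` adds a strict `www.` prefix in front of Stoker's pattern; the inputs are made up.

```php
<?php
// Hypothetical pattern (not from the thread): allows subdomains, but the
// negative lookahead rejects input that begins with the literal text "www.".
$noWww = '/^(?!www\.)([a-z0-9\-]+\.)+(com|net|org|biz|info)$/';

// Strict variant: the input must begin with "www.", then Stoker's pattern.
$mustWww = '/^www\.[a-z0-9][a-z0-9\-]+\.(com|net|org|biz|info)$/';

foreach (['wwf.com', 'forums.wwf.org', 'www.wwf.com'] as $s) {
    printf("%-15s noWww=%d mustWww=%d\n", $s, preg_match($noWww, $s), preg_match($mustWww, $s));
}
```

With Stoker's dot-free pattern on its own, the lookahead is unnecessary, since a `www.` prefix already adds a dot the pattern cannot match.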

Posted: Fri Feb 07, 2003 9:02 am
by AVATAr
Use Stoker's solution!

ereg('^[a-z0-9][a-z0-9\-]+\.(com|net|org|biz|info)$','web.com' );

Posted: Fri Feb 07, 2003 9:07 am
by Stoker
Did you try the one I have now posted twice?

As I tried to say, [] is what is called a character class, so what you made there does not make much sense.

[^(www)\.]
NOT ( nor w nor w nor w nor ) nor dot
And by the way, inside character classes the only two that need escapes are ] and -
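A small check of escaping inside character classes (the inputs are illustrative). In PCRE a dot inside `[ ]` is already literal, and a dash placed last (or first) in the class is also taken literally, so these two patterns behave the same:

```php
<?php
// Inside [ ], "." is literal; a dash at the end of the class is literal too.
$unescaped = '/^[a-z0-9.-]+$/';
$escaped   = '/^[a-z0-9\.\-]+$/'; // same meaning, with redundant escapes

foreach (['apex-solu.com', 'apex_solu.com'] as $s) {
    printf("%s => %d %d\n", $s, preg_match($unescaped, $s), preg_match($escaped, $s));
}
```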

Edit/add: I hadn't seen AVATAr's post before I posted. I just want to add that you should never ever use ereg unless there is a very specific reason for it; use preg instead, it is a lot more efficient.
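For reference, the `ereg()` call above rewritten with `preg_match()`, as suggested. PCRE patterns need delimiters (here `/`). Note that `ereg()` was deprecated in PHP 5.3 and removed in PHP 7.0, so on current PHP only the `preg_*` form runs:

```php
<?php
// Same check as the ereg() call above; preg_match() returns 1 on a match, 0 otherwise.
$ok = preg_match('/^[a-z0-9][a-z0-9\-]+\.(com|net|org|biz|info)$/', 'web.com');
var_dump($ok); // int(1)
```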

Regular Expression

Posted: Fri Feb 07, 2003 9:18 am
by kendall
OHHHH :oops:

Forgive me, Stoker. I was blind to what you were really trying to say, as I thought you were trying to give me the expression that includes the www., but then I even blinded myself,

as ^[a-z0-9\-]+\.(com|net|org|biz|info)$ works as well as

^[a-z0-9][a-z0-9\-]+\.(com|net|org|biz|info)$

which is what I originally had.

Stoker, I really do apologise :lol:

quote

Posted: Fri Feb 07, 2003 9:31 am
by AVATAr
^[a-z0-9\-]+\.(com|net|org|biz|info)$

will accept "-web.com"; with Stoker's solution the first character must be a letter or a number...
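A quick side-by-side of the two patterns (the inputs are illustrative): they agree on ordinary names but differ on a leading dash and on one-character names, since Stoker's version requires a letter or digit first and at least two characters before the dot.

```php
<?php
$kendall = '/^[a-z0-9\-]+\.(com|net|org|biz|info)$/';           // any mix of a-z, 0-9, dash
$stoker  = '/^[a-z0-9][a-z0-9\-]+\.(com|net|org|biz|info)$/';   // letter/digit first, 2+ chars

foreach (['web.com', '-web.com', 'a.com'] as $s) {
    printf("%-9s kendall=%d stoker=%d\n", $s, preg_match($kendall, $s), preg_match($stoker, $s));
}
```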