reg expressions
Posted: Thu Sep 25, 2003 5:16 pm
by Derfel Cadarn
Hello!
I have to admit I'm a preg_match newbie and I just don't seem to understand it. I want to identify the IP numbers in a text that is returned from a whois request and present them as hyperlinks.
I know every IP number is built up like (1-3 digits).(1-3 digits).(1-3 digits).(1-3 digits), so I want to check every word returned from the whois request: if it is an IP number, I want to present it as a hyperlink.
Can anyone tell me what I should use to identify it in PHP?
I've tried
Code: Select all
preg_match("/[0-9]{1,3}\.[0-9]{1,3}\.\.[0-9]{1,3}\.\.[0-9]{1,3}/")
but that doesn't seem to work....must have made a mistake...
EDIT: sorry, there were some typos in that code. I meant to say I've tried:
Code: Select all
preg_match("/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/")
Could any1 help me out there? PLS?
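For readers arriving at this question: one way to turn every IP number in a text into a hyperlink is to let the regex find the addresses inside the whole text, instead of testing word by word. The following is only a minimal sketch of that idea. The function name linkify_ips and the page name whois.php are made up for illustration, and the closure syntax needs PHP 5.3 or later, which is newer than the versions used in this thread.

```php
<?php
// Hypothetical helper (not from the thread): wrap every dotted quad in a link.
function linkify_ips($text, $thispage)
{
    // Escape the raw whois text first; digits and dots are not changed by this.
    $safe = htmlspecialchars($text, ENT_QUOTES);
    $page = htmlspecialchars($thispage, ENT_QUOTES);

    // \b keeps a match from starting or ending in the middle of a longer number.
    $pattern = '/\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/';

    return preg_replace_callback($pattern, function ($m) use ($page) {
        return "<a class='text' href='" . $page . "?domain=" . $m[0] . "'>" . $m[0] . "</a>";
    }, $safe);
}

echo linkify_ips("inetnum: 211.21.0.0 - 211.21.255.255", "whois.php"), "\n";
```

The pattern only checks the shape of the address (four groups of one to three digits), not whether each group is in the range 0-255.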
Ad
Posted: Fri Sep 26, 2003 6:56 am
by Derfel Cadarn
Hi,
Just to make it a bit clearer (I hope) some additional info.
I just edited the code, it looks completely logical, but it doesn't work in my script! I just don't understand it.
I use a function to present the whois-output word-by-word:
- it splits the whois-text at the spaces and trims the spaces from the words.
- then it checks for each word whether it is an IP number, an e-mail address or an internet address: these are all presented as links;
- the other words are 'normal' words and are presented as such.
The function works for all but the IPs!
The code for the function is as follows:
Code: Select all
<?php
function present($rawoutput,$thispage) {
    // This function checks whether a word is an IP-number or not
    // if so, it's presented as link to the whois-form
    $rawwords = preg_split("/(?=\s)/",$rawoutput);
    $wrong_ip = false;
    foreach ($rawwords as $word) {
        $word = trim($word);
        echo "<br>$wrong_ip [".$word."]";
        if(!eregi("^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$", $word)) {
            $wrong_ip = true;
        }
        if (!$wrong_ip) {
            echo "<a class='text' href='$thispage?domain=$word'>".$word."</a> ";
        } elseif (preg_match("/@/",$word)) {
            echo "<a class='text' href='mailto:$word'>".$word."</a> ";
        } elseif (preg_match("/^http:/",$word)) {
            echo "<a class='text' target='new' href='$word'>".$word."</a> ";
        } else {
            echo $word." ";
        }
    }
}
?>
I've added the text
echo "<br>$wrong_ip [".$word."]"; just for debugging.
The result I get is:
Code: Select all
[%]%
1 [[whois.apnic.net][whois.apnic.net
1 [node-1]]node-1]
1 []
[%]%
1 [Whois]Whois
1 [data]data
1 [copyright]copyright
1 [terms]terms
1 []
1 []
1 []
1 [http://www.apnic.net/db/dbcopyright.html]http://www.apnic.net/db/dbcopyright.html
1 []
[]
1 []
[inetnum:]inetnum:
1 []
1 []
1 []
1 []
1 []
1 [211.21.0.0]211.21.0.0
1 [-]-
1 [211.21.255.255]211.21.255.255
1 []
etc.
It seems to me that the IP has the correct format, but it still doesn't get detected!!
Can anybody find what I'm doing wrong?? All ideas are welcome!!
Thanx
Ad
Posted: Fri Sep 26, 2003 7:06 am
by twigletmac
I ran your regex in the following code (preg_match() instead of eregi() because it's a better, faster function):
Code: Select all
<?php
$ip = $_SERVER['REMOTE_ADDR'];
if(!preg_match('/^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$/', $ip)) {
echo 'no';
} else {
echo 'yes';
}
?>
and got the expected results - i.e. it validated my IP address (and the two in your second post) so it's not the regex which is going screwy. Perhaps there are spaces or other characters causing a problem? I know you're trimming the data but could something be slipping through the net?
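One way to check whether something is slipping through is to print the length and the raw bytes of each word, because invisible characters do not show up in a normal echo. This is just a sketch: dump_word is a made-up helper name, and the non-breaking space (byte 0xA0) is only an example of a character that trim() does not strip by default, not something known to be in this whois output.

```php
<?php
// Hypothetical debugging helper: report a word's byte length and hex dump.
function dump_word($word)
{
    return sprintf("len=%d hex=%s", strlen($word), bin2hex($word));
}

$clean = "211.21.0.0";
$dirty = "\xA0" . "211.21.0.0"; // leading non-breaking space, invisible in a browser

echo dump_word(trim($clean)), "\n";
echo dump_word(trim($dirty)), "\n"; // trim() leaves the 0xA0 byte in place
```

If the two lines differ for words that look identical on screen, the extra bytes in the hex dump show what got through.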
Mac
Posted: Fri Sep 26, 2003 8:20 am
by Derfel Cadarn
Hi Twigletmac,
Thanx for testing! I tried your script and it said "no" when I test it off-line and "yes" when I test it online...I guess my Apache-configuration isn't OK.
But when I include your regex in my function it still doesn't work (neither offline nor online). So yes, something must be slipping through. The stupid thing is that the alien doesn't show up in those debugging echoes (between the [] brackets, as I showed in my second posting).
Could it have something to do with my keyboard? In the place where a US keyboard has the decimal point, mine has a comma, and when I type it, it gives some ASCII code I cannot show here (looks like a | ).
That is why I've used the 'point' symbol. But I think that has nothing to do with it at all. That would be nonsense, imho.
I'll do some more experimenting..
Please do not stop testing!!
Ad
Posted: Fri Sep 26, 2003 9:04 am
by twigletmac
Could you post a copy of the text from the whois request that you are parsing so I can see what it does locally.
Out of interest, which version of PHP do you have locally and externally?
Mac
Posted: Fri Sep 26, 2003 9:44 am
by Derfel Cadarn
Hi,
Offline I use php-4.3.1 and online php-4.2.2
I tested on the IP-number 211.21.123.15, because it has a lot of phone numbers; that is where I discovered my script was buggy. The whois info returned was as follows (or did you want it as e-mail? If so, I apologise for posting it here!)
Code: Select all
IP-number: (i.e. 65.125.25.9)
. . .
The results we received from whois.apnic.net are:
% [whois.apnic.net node-1]
% Whois data copyright terms http://www.apnic.net/db/dbcopyright.html
inetnum: 211.21.0.0 - 211.21.255.255
netname: HINET-TW
descr: CHTD, Chunghwa Telecom Co.,Ltd.
descr: Data-Bldg.6F, No.21, Sec.21, Hsin-Yi Rd.
descr: Taipei Taiwan 100
country: TW
admin-c: HN27-AP
tech-c: HN28-AP
remarks: This information has been partially mirrored by APNIC from
remarks: TWNIC. To obtain more specific information, please use the
remarks: TWNIC whois server at whois.twnic.net.
mnt-by: MAINT-TW-TWNIC
changed: hostmaster@twnic.net 20000707
status: ALLOCATED PORTABLE
source: APNIC
person: HINET Network-Adm
address: CHTD, Chunghwa Telecom Co., Ltd.
address: Data-Bldg. 6F, No. 21, Sec. 21, Hsin-Yi Rd.,
address: Taipei Taiwan 100
country: TW
phone: +886 2 2322 3495
phone: +886 2 2322 3442
phone: +886 2 2344 3007
fax-no: +886 2 2344 2513
fax-no: +886 2 2395 5671
e-mail: network-adm@hinet.net
nic-hdl: HN27-AP
remarks: same as TWNIC nic-handle HN184-TW
mnt-by: MAINT-TW-TWNIC
changed: hostmaster@twnic.net 20000721
source: APNIC
person: HINET Network-Center
address: CHTD, Chunghwa Telecom Co., Ltd.
address: Data-Bldg. 6F, No. 21, Sec. 21, Hsin-Yi Rd.,
address: Taipei Taiwan 100
country: TW
phone: +886 2 2322 3495
phone: +886 2 2322 3442
phone: +886 2 2344 3007
fax-no: +886 2 2344 2513
fax-no: +886 2 2395 5671
e-mail: network-center@hinet.net
nic-hdl: HN28-AP
remarks: same as TWNIC nic-handle HN185-TW
mnt-by: MAINT-TW-TWNIC
changed: hostmaster@twnic.net 20000721
source: APNIC
inetnum: 211.21.123.0 - 211.21.123.63
netname: SHI-DAI-TSAI-JIN-TP-TW
descr: Shi Dai Tsai Jin Informational Ltd.
descr: 12F, No. 29, Sec. 3, Luo Sh Fu Rd.
descr: Taipei Taiwan
country: TW
admin-c: JT64-TW
tech-c: JT64-TW
mnt-by: MAINT-TW-TWNIC
remarks: This information has been partially mirrored by APNIC from
remarks: TWNIC. To obtain more specific information, please use the
remarks: TWNIC whois server at whois.twnic.net.
changed: network-adm@hinet.net 20030618
status: ASSIGNED NON-PORTABLE
source: TWNIC
person: Chiao Chi Deng
address: Shi Dai Tsai Jin Informational Ltd.
address: 12F, No. 29, Sec. 3, Luo Sh Fu Rd.
address: Taipei Taiwan
country: TW
phone: +886-2-2363-0568
fax-no: +886-2-2792-2190
e-mail: jessie@moderntimes.com.tw
nic-hdl: JT64-TW
remarks: This information has been partially mirrored by APNIC from
remarks: TWNIC. To obtain more specific information, please use the
remarks: TWNIC whois server at whois.twnic.net.
changed: hostmaster@twnic.net 20030618
source: TWNIC
I hope you find something, I'm close to giving up on it...sob sob,
Ad
Posted: Fri Sep 26, 2003 12:14 pm
by Derfel Cadarn
I've got it!! I feel kinda dumb, though!!
With all your help I got convinced the regex HAD to be OK and the bug had to be in the function. And it was. I really feel stupid to have to admit it, but I had forgotten something quite important:
Code: Select all
function present($rawoutput,$thispage) {
    // This function checks whether a word is an IP-number or not
    // if so, it's presented as link to the whois-form
    $rawwords = preg_split("/(?=\s)/",$rawoutput);
    $wrong_ip = false;
    foreach ($rawwords as $word) {
        $word = trim($word);
        //echo "<br>[$word=".gettype($word)."]";
        echo "<br>$wrong_ip [".$word."]";
        if(!preg_match('/^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/', $word)) { //"^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$"
            $wrong_ip = true;
        }
        if (!$wrong_ip) {
            echo "<a class='text' href='$thispage?domain=$word'>".$word."</a> ";
        } elseif (preg_match("/@/",$word)) {
            echo "<a class='text' href='mailto:$word'>".$word."</a> ";
        } elseif (preg_match("/^http:/",$word)) {
            echo "<a class='text' target='new' href='$word'>".$word."</a> ";
        } else {
            echo $word." ";
        }
        $wrong_ip = false;
    }
}
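All I had overlooked was to RESET $wrong_ip to "false" after each pass through the loop (the last statement in the loop). Without the reset, the first word that is not an IP sets the flag to true and it stays true for every word after it. With the reset it works like a charm!!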
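A variation that avoids this kind of bug altogether is to decide the type of each word freshly, so no flag survives from one word to the next. The sketch below is only one possible way to do it: classify_word is a made-up name, and splitting on '/\s+/' with PREG_SPLIT_NO_EMPTY is a swapped-in alternative to the lookahead split above, chosen because it does not produce the empty words seen in the debug output.

```php
<?php
// Hypothetical helper: classify a single word, with no state carried between calls.
function classify_word($word)
{
    if (preg_match('/^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$/', $word)) {
        return 'ip';
    }
    if (strpos($word, '@') !== false) {
        return 'email';
    }
    if (preg_match('/^http:/', $word)) {
        return 'url';
    }
    return 'text';
}

// Split on runs of whitespace and drop the empty pieces.
$words = preg_split('/\s+/', "inetnum:   211.21.0.0 - 211.21.255.255", -1, PREG_SPLIT_NO_EMPTY);

foreach ($words as $word) {
    echo classify_word($word), ": ", $word, "\n";
}
```

The calling code can then switch on the returned type to decide which kind of link to print.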
YIHAA, gotcha, you silly BUG!!
Thanx, Twigletmac, for all your help!
Ad
Posted: Fri Sep 26, 2003 12:21 pm
by m3rajk
btw: it'll make the writing of regexps faster... well, when using preg.
[0-9] is actually \d and [^0-9] is \D (^ inside the brackets = not)
Posted: Fri Sep 26, 2003 12:27 pm
by Derfel Cadarn
Thanx, m3rajk!
I knew that [0-9] is the same as \d, but I thought I'd learn a bit and try to get it to work before I start optimizing...
I've shortened it to:
Code: Select all
if(!preg_match('/^\d+\.\d+\.\d+\.\d+$/', $word)) {
I didn't know that ^ meant 'not' though! My PHP_Bible tells me it means 'start of the text'. That could (and would) have caused me some hours searching !!!
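A side note on the shortened pattern: \d+ accepts any number of digits per group, so it is looser than the {1,3} version, and neither of them checks the 0-255 range. If that matters, PHP's filter extension (built in since PHP 5.2, so not available on the versions in this thread) can do the range check. A rough sketch, where is_ipv4 is a made-up wrapper name:

```php
<?php
// Hypothetical wrapper: true only for a dotted quad whose four parts are all 0-255.
function is_ipv4($word)
{
    return filter_var($word, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false;
}

// The shortened pattern from the post accepts a four-digit group...
var_dump(preg_match('/^\d+\.\d+\.\d+\.\d+$/', '1234.5.6.7'));

// ...while the range check rejects it and accepts a real address.
var_dump(is_ipv4('1234.5.6.7'));
var_dump(is_ipv4('211.21.123.15'));
```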
Ad
Posted: Fri Sep 26, 2003 12:39 pm
by m3rajk
^ means the beginning of a line OR not. Its meaning is determined by where you put it in the pattern: at the start of the pattern it anchors the match to the beginning, and directly after an opening [ it negates the character class. And if you use preg, forget PHP bibles. The best regexp prep I've found is out of a Perl book that will teach you the Perl regexp shorthands: Learning Perl by Randal L. Schwartz and Tom Phoenix, published by O'Reilly. The edition I have is ISBN 0596001320.
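To make the two meanings of ^ concrete, here is a small sketch with made-up test strings; preg_match returns 1 on a match and 0 on no match.

```php
<?php
// ^ at the start of the pattern: anchor to the beginning of the subject.
$starts_with_digit     = preg_match('/^[0-9]/', '42abc'); // subject begins with a digit
$not_starting_on_digit = preg_match('/^[0-9]/', 'abc42'); // digits are there, but not at the start

// ^ directly after [: negate the class, so [^0-9] means the same as \D.
$non_digit_in_digits = preg_match('/[^0-9]/', '4242');  // no non-digit anywhere
$non_digit_in_mixed  = preg_match('/\D/', '42a42');     // the "a" is a non-digit

var_dump($starts_with_digit, $not_starting_on_digit, $non_digit_in_digits, $non_digit_in_mixed);
```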
Posted: Fri Sep 26, 2003 12:57 pm
by Derfel Cadarn
I think you're quite right about that: my PHP-bible says nothing about that ^ ! But I actually had hoped I wouldn't need these regex's too often.... I'll have a look in a bookshop for that book, I'm quite happy with most of the OReilly's. It might be difficult to get it in english here (Berlin), but those translations are awful and unreadable..
Posted: Sat Sep 27, 2003 4:26 pm
by m3rajk
Derfel Cadarn wrote:I think you're quite right about that: my PHP-bible says nothing about that ^ ! But I actually had hoped I wouldn't need these regex's too often.... I'll have a look in a bookshop for that book, I'm quite happy with most of the OReilly's. It might be difficult to get it in english here (Berlin), but those translations are awful and unreadable..
http://www.bookpool.com
I've found them to be the cheapest for me most of the time, and I live in MA, so I have to include sales tax. And if you search by ISBN: I believe the ISBN actually changes for a different language, so that ISBN is that edition in English.