reg expressions

Derfel Cadarn
Forum Contributor
Posts: 193
Joined: Thu Jul 17, 2003 12:02 pm
Location: Berlin, Germany

Post by Derfel Cadarn »

Hello!

I have to admit I'm a preg_match newbie and I just don't seem to understand it. I want to find the IP addresses in the text returned from a whois request and present each of them as a hyperlink.

I know an IP address is built up like this: (1-3 digits) dot (1-3 digits) dot (1-3 digits) dot (1-3 digits). I want to check every word returned from the whois request and, if it is an IP address, present it as a hyperlink.

Can anyone tell me what I should use to identify one in PHP?
I've tried

Code: Select all

preg_match("/[0-9]{1,3}\.[0-9]{1,3}\.\.[0-9]{1,3}\.\.[0-9]{1,3}/")
but that doesn't seem to work... I must have made a mistake.

EDIT: sorry, there were some typos in that code. I meant to say I've tried:

Code: Select all

preg_match("/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/")

Could anyone help me out here? Please?

Ad
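
For reference, preg_match() needs the string to test as its second argument; the calls quoted above only pass the pattern. A minimal sketch of the word-by-word check described in the post, assuming a few made-up sample words (the $words array and $found are hypothetical, not from the original script):

```php
<?php
// Pattern from the post, anchored so the whole word has to match:
// four groups of 1-3 digits separated by literal dots.
$pattern = '/^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$/';

// Hypothetical sample words, as they might come out of a whois reply.
$words = array('inetnum:', '211.21.0.0', '-', '211.21.255.255');

$found = array();
foreach ($words as $word) {
    // preg_match() takes the pattern AND the subject string; it returns 1 on a match.
    if (preg_match($pattern, $word) === 1) {
        $found[] = $word;
    }
}

print_r($found);
```

Only the two dotted-quad words end up in $found; the label and the dash are skipped.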

Post by Derfel Cadarn »

Hi,

Just to make it a bit clearer (I hope), here is some additional info.

I just edited the code. It looks completely logical, but it doesn't work in my script, and I just don't understand why.
I use a function to present the whois output word by word:
- it splits the whois text at the spaces and trims the whitespace from the words;
- then it checks whether each word is an IP address, an e-mail address or a web address: these are all presented as links;
- all other words are 'normal' words and are presented as such.

The function works for everything but the IPs!

The code for the function is as follows:

Code: Select all

<?php
function present($rawoutput,$thispage) {
	// This function checks wether a word is an IP-number or not
	// if so, it's presented as link to the whois-form
	$rawwords = preg_split("/(?=\s)/",$rawoutput);
	$wrong_ip = false;

	foreach ($rawwords as $word) {
		$word = trim($word);

		echo "<br>$wrong_ip [".$word."]";

		if(!eregi("^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$", $word)) {
			$wrong_ip = true;
		}

		if (!$wrong_ip) {
			echo "<a class='text' href='$thispage?domain=$word'>".$word."</a> ";
		} elseif (preg_match("/@/",$word)) {
			echo "<a class='text' href='mailto:$word'>".$word."</a> ";
		} elseif (preg_match("/^http:/",$word)) {
			echo "<a class='text' target='new' href='$word'>".$word."</a> ";
		} else {
			echo $word." ";
		} 
	}
}
?>
I've added the text echo "<br>$wrong_ip [".$word."]"; just for debugging.

The result I get is:

Code: Select all

[%]%
1 [[whois.apnic.net][whois.apnic.net
1 [node-1]]node-1]
1 []

[%]%
1 [Whois]Whois
1 [data]data
1 [copyright]copyright
1 [terms]terms
1 []
1 []
1 []
1 [http://www.apnic.net/db/dbcopyright.html]http://www.apnic.net/db/dbcopyright.html
1 []

[]
1 []

[inetnum:]inetnum:
1 []
1 []
1 []
1 []
1 []
1 [211.21.0.0]211.21.0.0
1 [-]-
1 [211.21.255.255]211.21.255.255
1 []
etc.
It seems to me that the IP has the correct format, but it still doesn't get detected!

Can anybody see what I'm doing wrong? All ideas are welcome!
Thanks

Ad
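
A side note on the many empty [] entries in the debug output above: the pattern /(?=\s)/ splits in front of every single whitespace character, so a run of spaces yields pieces that trim() reduces to empty strings. A minimal sketch of one alternative, assuming a made-up sample line ($raw is hypothetical); it splits on whole runs of whitespace and drops empty pieces:

```php
<?php
// Hypothetical whois line with a run of spaces between label and value.
$raw = "inetnum:      211.21.0.0 - 211.21.255.255\n";

// Split on runs of whitespace; PREG_SPLIT_NO_EMPTY discards empty pieces,
// so no separate trim() pass is needed for the individual words.
$words = preg_split('/\s+/', $raw, -1, PREG_SPLIT_NO_EMPTY);

print_r($words);
```

This does not fix the IP detection itself; it only removes the empty words from the loop.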
twigletmac
Her Royal Site Adminness
Posts: 5371
Joined: Tue Apr 23, 2002 2:21 am
Location: Essex, UK

Post by twigletmac »

I ran your regex in the following code (with preg_match() instead of eregi(), because the PCRE functions are generally the faster choice):

Code: Select all

<?php

$ip = $_SERVER['REMOTE_ADDR'];

if(!preg_match('/^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$/', $ip)) {
	echo 'no';
} else {
	echo 'yes';
}

?>
and got the expected results, i.e. it validated my IP address (and the two in your second post), so it's not the regex that is going wrong. Perhaps there are spaces or other characters causing a problem? I know you're trimming the data, but could something be slipping through the net?

Mac
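
One way to check for characters slipping through is to look at the raw bytes of a word. A small sketch, assuming a made-up word with an invisible trailing non-breaking space (the sample value is hypothetical, not taken from the whois data in this thread):

```php
<?php
// Hypothetical word followed by a UTF-8 non-breaking space (bytes C2 A0).
$word = "211.21.0.0\xC2\xA0";

// trim() without a second argument only strips ASCII whitespace and NUL,
// so the two extra bytes survive.
$trimmed = trim($word);

// bin2hex() shows every byte, including ones that do not print visibly.
echo strlen($trimmed), ' bytes: ', bin2hex($trimmed), "\n";

$pattern = '/^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$/';

// The stray bytes sit between the last digit and the end of the string,
// so the anchored pattern does not match.
var_dump(preg_match($pattern, $trimmed));
```

If the hex dump of a word shows only digits (30-39) and dots (2e), hidden characters can be ruled out.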

Post by Derfel Cadarn »

Hi Twigletmac,

Thanks for testing! I tried your script and it said "no" when I tested it offline and "yes" when I tested it online... I guess my Apache configuration isn't OK.

But when I put your regex into my function it still doesn't work (neither offline nor online). So yes, something must be slipping through. The stupid thing is that the stray character doesn't show up in the debugging echoes (between the [] brackets, as shown in my second post).

Could it have to do with my keyboard? Where a US keyboard has the decimal point, mine has a comma, and when I type it, it produces some ASCII code I cannot show here (it looks like a | ).
Therefore I've used the 'point' symbol instead. But I don't think that has anything to do with it. That would be nonsense, imho.

I'll do some more experimenting.
Please do not stop testing!!

Ad

Post by twigletmac »

Could you post a copy of the text from the whois request that you are parsing, so I can see what it does locally?

Out of interest, which version of PHP do you have locally and externally?

Mac

Post by Derfel Cadarn »

Hi,

Offline I use PHP 4.3.1 and online PHP 4.2.2.

I tested with the IP address 211.21.123.15, because its whois record contains a lot of phone numbers; that is where I discovered my script was buggy. The whois info returned was as follows (or did you want it by e-mail? If so, I apologise for posting it here!):

Code: Select all

IP-number: (i.e. 65.125.25.9)

. . .
The results we received from whois.apnic.net are:

% [whois.apnic.net node-1]
% Whois data copyright terms http://www.apnic.net/db/dbcopyright.html

inetnum: 211.21.0.0 - 211.21.255.255
netname: HINET-TW
descr: CHTD, Chunghwa Telecom Co.,Ltd.
descr: Data-Bldg.6F, No.21, Sec.21, Hsin-Yi Rd.
descr: Taipei Taiwan 100
country: TW
admin-c: HN27-AP
tech-c: HN28-AP
remarks: This information has been partially mirrored by APNIC from
remarks: TWNIC. To obtain more specific information, please use the
remarks: TWNIC whois server at whois.twnic.net.
mnt-by: MAINT-TW-TWNIC
changed: hostmaster@twnic.net 20000707
status: ALLOCATED PORTABLE
source: APNIC

person: HINET Network-Adm
address: CHTD, Chunghwa Telecom Co., Ltd.
address: Data-Bldg. 6F, No. 21, Sec. 21, Hsin-Yi Rd.,
address: Taipei Taiwan 100
country: TW
phone: +886 2 2322 3495
phone: +886 2 2322 3442
phone: +886 2 2344 3007
fax-no: +886 2 2344 2513
fax-no: +886 2 2395 5671
e-mail: network-adm@hinet.net
nic-hdl: HN27-AP
remarks: same as TWNIC nic-handle HN184-TW
mnt-by: MAINT-TW-TWNIC
changed: hostmaster@twnic.net 20000721
source: APNIC

person: HINET Network-Center
address: CHTD, Chunghwa Telecom Co., Ltd.
address: Data-Bldg. 6F, No. 21, Sec. 21, Hsin-Yi Rd.,
address: Taipei Taiwan 100
country: TW
phone: +886 2 2322 3495
phone: +886 2 2322 3442
phone: +886 2 2344 3007
fax-no: +886 2 2344 2513
fax-no: +886 2 2395 5671
e-mail: network-center@hinet.net
nic-hdl: HN28-AP
remarks: same as TWNIC nic-handle HN185-TW
mnt-by: MAINT-TW-TWNIC
changed: hostmaster@twnic.net 20000721
source: APNIC

inetnum: 211.21.123.0 - 211.21.123.63
netname: SHI-DAI-TSAI-JIN-TP-TW
descr: Shi Dai Tsai Jin Informational Ltd.
descr: 12F, No. 29, Sec. 3, Luo Sh Fu Rd.
descr: Taipei Taiwan
country: TW
admin-c: JT64-TW
tech-c: JT64-TW
mnt-by: MAINT-TW-TWNIC
remarks: This information has been partially mirrored by APNIC from
remarks: TWNIC. To obtain more specific information, please use the
remarks: TWNIC whois server at whois.twnic.net.
changed: network-adm@hinet.net 20030618
status: ASSIGNED NON-PORTABLE
source: TWNIC

person: Chiao Chi Deng
address: Shi Dai Tsai Jin Informational Ltd.
address: 12F, No. 29, Sec. 3, Luo Sh Fu Rd.
address: Taipei Taiwan
country: TW
phone: +886-2-2363-0568
fax-no: +886-2-2792-2190
e-mail: jessie@moderntimes.com.tw
nic-hdl: JT64-TW
remarks: This information has been partially mirrored by APNIC from
remarks: TWNIC. To obtain more specific information, please use the
remarks: TWNIC whois server at whois.twnic.net.
changed: hostmaster@twnic.net 20030618
source: TWNIC
I hope you find something. I'm close to giving up on it... sob sob,

Ad

Post by Derfel Cadarn »

I've got it!! I feel kinda dumb, though!!

With all your help I became convinced that the regex HAD to be OK and the bug had to be in the function. And it was. I feel really stupid having to admit it, but I had forgotten something quite important:

Code: Select all

function present($rawoutput,$thispage) {
    // This function checks whether a word is an IP number or not;
    // if so, it's presented as a link to the whois form
    $rawwords = preg_split("/(?=\s)/",$rawoutput);
    $wrong_ip = false;

    foreach ($rawwords as $word) {
        $word = trim($word);
//echo "<br>[$word=".gettype($word)."]";
        echo "<br>$wrong_ip [".$word."]";

        // anchored at both ends, like the earlier eregi() pattern
        if(!preg_match('/^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$/', $word)) {
            $wrong_ip = true;
        }

        if (!$wrong_ip) {
            echo "<a class='text' href='$thispage?domain=$word'>".$word."</a> ";
        } elseif (preg_match("/@/",$word)) {
            echo "<a class='text' href='mailto:$word'>".$word."</a> ";
        } elseif (preg_match("/^http:/",$word)) {
            echo "<a class='text' target='new' href='$word'>".$word."</a> ";
        } else {
            echo $word." ";
        }
        $wrong_ip = false; // reset the flag for the next word
    }
}
All I had overlooked was to RESET $wrong_ip to "false" at the end of each loop iteration (the last statement in the loop). With that it works like a charm!!
YIHAA, gotcha, you silly BUG!!

Thanks, Twigletmac, for all your help!

Ad
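
A variation on the same fix, as a rough sketch rather than the code from this thread: if each word is classified by a function that returns its result directly, there is no flag that can carry over from one word to the next. The function name classify_word() and the sample words are made up for illustration.

```php
<?php
// Hypothetical classifier: the decision is made per word and returned,
// so no state survives between loop iterations.
function classify_word($word) {
    if (preg_match('/^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$/', $word)) {
        return 'ip';
    }
    if (strpos($word, '@') !== false) {
        return 'email';
    }
    if (preg_match('/^http:/', $word)) {
        return 'url';
    }
    return 'text';
}

// Sample words in the style of the whois output quoted earlier.
$words = array(
    'inetnum:',
    '211.21.0.0',
    'hostmaster@twnic.net',
    'http://www.apnic.net/db/dbcopyright.html',
);

foreach ($words as $word) {
    echo classify_word($word), ': ', $word, "\n";
}
```

The loop in present() could then switch on the returned label instead of testing $wrong_ip.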
m3rajk
DevNet Resident
Posts: 1191
Joined: Mon Jun 02, 2003 3:37 pm

Post by m3rajk »

btw: this will make writing regexps faster, at least when using the preg functions:
[0-9] can be written as \d, and [^0-9] as \D (inside the brackets, ^ = not)

Post by Derfel Cadarn »

Thanks, m3rajk!
I knew that [0-9] is the same as \d, but I thought I'd learn a bit and get it to work before I start optimizing...

I've shortened it to:

Code: Select all

if(!preg_match('/^\d+\.\d+\.\d+\.\d+$/', $word)) {
I didn't know that ^ could mean 'not', though! My PHP Bible tells me it means 'start of the text'. That could (and would) have cost me some hours of searching! :)


Ad
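
One caveat about the shortened pattern: \d+ accepts any number of digits per group, so it is looser than {1,3}, and neither version checks that each group is at most 255. A rough sketch with made-up inputs; note that filter_var() does not appear in this thread and needs PHP 5.2 or later, so it would not run on the PHP 4 versions mentioned earlier:

```php
<?php
$loose  = '/^\d+\.\d+\.\d+\.\d+$/';                  // shortened pattern from the post
$strict = '/^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/';  // original digit counts, written with \d

// Hypothetical input whose first group has four digits.
$candidate = '1234.5.6.7';

var_dump(preg_match($loose, $candidate));  // accepted by \d+
var_dump(preg_match($strict, $candidate)); // rejected by {1,3}

// Range check: filter_var() returns the address on success and false otherwise.
$bad  = filter_var('211.21.999.1', FILTER_VALIDATE_IP, FILTER_FLAG_IPV4);
$good = filter_var('211.21.123.15', FILTER_VALIDATE_IP, FILTER_FLAG_IPV4);

var_dump($bad);
var_dump($good);
```

For turning whois output into links, the looser pattern is probably harmless; the difference matters mainly when the value is used for validation.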

Post by m3rajk »

^ means either the beginning of a line OR 'not'; which one depends on where it appears in the pattern. Outside brackets it anchors the match to the start; as the first character inside [ ] it negates the character class. And if you use preg, forget the PHP bibles. The best regexp primer I've found is a Perl book that will teach you the Perl regexp shorthands: Learning Perl by Randal L. Schwartz and Tom Phoenix, published by O'Reilly. The edition I have is ISBN 0596001320.
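
The two meanings of ^ can be seen side by side in a small sketch (the sample strings are made up):

```php
<?php
// Outside brackets, ^ anchors the match to the start of the string.
$starts_with_digit   = preg_match('/^[0-9]/', '42abc'); // string begins with a digit
$starts_with_letters = preg_match('/^[0-9]/', 'abc42'); // digits are there, but not at the start

// As the first character inside brackets, ^ negates the class:
// [^0-9] matches any character that is NOT a digit, just like \D.
$only_digits   = preg_match('/[^0-9]/', '12345'); // no non-digit anywhere
$has_non_digit = preg_match('/\D/', '123a5');     // the 'a' is a non-digit

var_dump($starts_with_digit, $starts_with_letters, $only_digits, $has_non_digit);
```

So /^[0-9]/ means "a digit at the start", while /[^0-9]/ means "a non-digit anywhere".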

Post by Derfel Cadarn »

I think you're quite right about that: my PHP Bible says nothing about that ^! But I had actually hoped I wouldn't need these regexes too often... I'll look for that book in a bookshop; I'm quite happy with most of the O'Reilly books. It might be difficult to get it in English here (Berlin), but the translations are awful and unreadable.

Post by m3rajk »

Derfel Cadarn wrote: I think you're quite right about that: my PHP Bible says nothing about that ^! But I had actually hoped I wouldn't need these regexes too often... I'll look for that book in a bookshop; I'm quite happy with most of the O'Reilly books. It might be difficult to get it in English here (Berlin), but the translations are awful and unreadable.
http://www.bookpool.com

I've found them to be the cheapest for me most of the time, and I live in MA, so I have to include sales tax. And if you search by ISBN: I believe the ISBN actually changes for a different language, so that ISBN is that edition in English.