i'm automatically relinking image tags in HTML pages to make the src addresses of these images absolute. to cover the various possibilities (absolute URI with server name, absolute path on server, relative path, and last but not least unquoted src), i'm using an array of source patterns and an array of replacement strings. here's what the call looks like:
Code:
$body = preg_replace(
    array(
        '!<img\s*([^>]*?)\s+src=(["\'])(/[^"\']+)\2!i',        // 1: absolute path on server
        '!<img\s*([^>]*?)\s+src=(["\'])(http://[^"\']+)\2!i',  // 2: absolute URI with server name
        '!<img\s*([^>]*?)src=(["\'])([^"\']+)\2!i',            // 3: relative path
        '!<img\s*([^>]*?)src=([^\'" ]+)!i'                     // 4: unquoted src
    ),
    array(
        "<!-- 1 --><img $1 src=\"$protocol://$server$3\"",
        "<!-- 2 --><img $1 src=\"$3\"",
        "<!-- 3 --><img $1 src=\"$baseuri$3\"",
        "<!-- 4 --><img $1 src='$baseuri$2'"
    ),
    $body2
);
so now i get results like the following:
Code:
<!-- 1 -->
<!-- 2 -->
<!-- 3 -->
<img src="http://www.google.com/http://www.google.com/images/logo_sm.gif" width=150 height=55 alt=Google border=0 vspace=12>
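boiled down, the effect can be reproduced with a much smaller sketch. the two patterns and the sample string below are simplified stand-ins, not my real ones; they are only meant to show one match being rewritten twice:

```php
<?php
// Simplified stand-in for the real call: two patterns that can both match
// the same text once the first replacement has been made.
$in  = 'src="/a.gif"';
$out = preg_replace(
    array('!src="(/[^"]+)"!', '!src="([^"]+)"!'),
    array('src="http://host$1"', 'src="BASE/$1"'),
    $in
);
echo $out, "\n";   // prints: src="BASE/http://host/a.gif"
```

the second pattern matches the text the first replacement has just produced, which looks like what happens to my img tags.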
so the first question is of course: is preg_replace really meant to behave this way, with each later pattern being applied to the output of the earlier ones? and my second question: is there an easy way to prevent this? of course, i could simply write a loop that goes through the HTML code and applies only one replacement per image, but i was wondering how more experienced programmers would do it...
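for reference, the one-replacement-per-image idea could probably be sketched with preg_replace_callback, roughly like this. the class and method names (ImgSrcAbsolutizer, rewrite, fixOne) are made up for illustration. unlike my patterns, it treats any scheme (not only http://) and // URLs as already absolute, and it assumes $baseuri ends with a slash. it's a rough sketch, not something tested against real-world HTML:

```php
<?php
// Hypothetical helper (names invented for this sketch): every <img> src is
// matched once and rewritten once, so a rewritten tag is never re-matched.
class ImgSrcAbsolutizer
{
    private $origin;   // e.g. "http://www.google.com"
    private $baseuri;  // e.g. "http://www.google.com/", assumed to end in "/"

    public function __construct($protocol, $server, $baseuri)
    {
        $this->origin  = $protocol . '://' . $server;
        $this->baseuri = $baseuri;
    }

    public function rewrite($html)
    {
        // Group 1: everything up to and including "src=".
        // Groups 2-4: the value, double-quoted, single-quoted or unquoted.
        $pattern = '!(<img\b[^>]*?\ssrc\s*=\s*)(?:"([^"]*)"|\'([^\']*)\'|([^\s>"\']+))!i';
        return preg_replace_callback($pattern, array($this, 'fixOne'), $html);
    }

    public function fixOne($m)
    {
        // Pick whichever of groups 2-4 actually captured the value.
        $src = '';
        for ($i = 2; $i <= 4; $i++) {
            if (isset($m[$i]) && $m[$i] !== '') {
                $src = $m[$i];
            }
        }
        if ($src === '') {
            return $m[0];                      // empty src: leave the tag alone
        }
        if (preg_match('!^(?:[a-z][a-z0-9+.-]*:|//)!i', $src)) {
            $abs = $src;                       // already absolute
        } elseif ($src[0] === '/') {
            $abs = $this->origin . $src;       // absolute path on the server
        } else {
            $abs = $this->baseuri . $src;      // relative path
        }
        // Always emit a double-quoted attribute.
        return $m[1] . '"' . str_replace('"', '&quot;', $abs) . '"';
    }
}

$fixer = new ImgSrcAbsolutizer('http', 'www.google.com', 'http://www.google.com/');
echo $fixer->rewrite('<img src="/images/logo_sm.gif" width=150 height=55>'), "\n";
```

since the callback decides per match which kind of src it is looking at, the order of the cases no longer matters and nothing is rewritten twice.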
thx in advance for your help!
pagod
PS: in case it's relevant, i'm running PHP 5.2.6 (Zend Engine v2.2.0) on Mac OS X 10.5.6 Leopard (Intel) with Apache 2.2.9 installed.