preg_replace with pattern array
Posted: Fri Mar 20, 2009 6:02 am
Hi,
I'm doing some automatic relinking of image tags in HTML pages to make the src addresses of these images absolute. To cover the various possibilities (absolute URI with server name, absolute path on the server, relative path, and last but not least unquoted src), I'm using an array of source patterns and an array of replacement strings. Here's what the call looks like:
Code:
$body = preg_replace(
    array(
        '!<img\s*([^>]*?)\s+src=(["\'])(/[^"\']+)\2!i',        // 1: quoted absolute path on the server
        '!<img\s*([^>]*?)\s+src=(["\'])(http://[^"\']+)\2!i',  // 2: quoted absolute URI with server name
        '!<img\s*([^>]*?)src=(["\'])([^"\']+)\2!i',            // 3: any other quoted src (relative path)
        '!<img\s*([^>]*?)src=([^\'" ]+)!i'                     // 4: unquoted src
    ),
    array(
        "<!-- 1 --><img $1 src=\"$protocol://$server$3\"",
        "<!-- 2 --><img $1 src=\"$3\"",
        "<!-- 3 --><img $1 src=\"$baseuri$3\"",
        "<!-- 4 --><img $1 src='$baseuri$2'"
    ),
    $body2
);
The problem is that the results of the replacements also match the patterns in the array, since all I'm doing is relinking. Apparently PHP tries to match every element of the array in turn, starting at the same position in the source string and regardless of whether an earlier pattern has already matched! I must say I wasn't expecting that: I thought it would stop replacing at a given position as soon as any of the source patterns matched, and then carry on after the matched part.
So now I get results like the following:
Code:
<!-- 1 -->
<!-- 2 -->
<!-- 3 -->
<img src="http://www.google.com/http://www.google.com/images/logo_sm.gif" width=150 height=55 alt=Google border=0 vspace=12>
As the numbered comments show, all the patterns in the array but one have been matched in turn at the same, already processed position in the source string.
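Boiled down, the behaviour can be reproduced with a much smaller case. The snippet below is only a minimal sketch: the two patterns, the example.com host and the input tag are made up for illustration and are not my real code. The second pattern matches what the first replacement has just produced, so the prefix gets added twice.

```php
<?php
// Hypothetical input and patterns, chosen only to show the chaining effect.
$html = '<img src="/images/logo_sm.gif">';

$out = preg_replace(
    array(
        '!src="(/[^"]+)"!',   // absolute path on the server
        '!src="([^"]+)"!'     // any quoted src, including the output of pattern 1
    ),
    array(
        'src="http://www.example.com$1"',
        'src="http://www.example.com/$1"'
    ),
    $html
);

echo $out, "\n";
// prints: <img src="http://www.example.com/http://www.example.com/images/logo_sm.gif">
```

So each pattern seems to be run over the whole result of the previous one, rather than all patterns competing at each position of the original string.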
So the first question is of course: is this really how it's meant to work? And my second question: is there an easy way to prevent it? Of course, I can simply write a loop that goes through the HTML code and applies only one replacement per image, but I was wondering how more experienced programmers would do it...
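For reference, the kind of single-pass fallback I have in mind would look roughly like the sketch below. It uses preg_replace_callback with one combined pattern and decides per tag which prefix to add, so the rewritten text is never scanned again. Everything named here is an assumption for illustration: the ImgRelinker class, its methods and the example.com values are hypothetical, and the combined pattern is a simplification of my four patterns (it ignores src values containing spaces, for instance).

```php
<?php
// Hypothetical helper: rewrites each <img> src exactly once, in a single pass.
class ImgRelinker
{
    private $origin;   // protocol + server, e.g. "http://www.example.com"
    private $baseuri;  // base for relative paths, e.g. "http://www.example.com/dir/"

    public function __construct($origin, $baseuri)
    {
        $this->origin  = $origin;
        $this->baseuri = $baseuri;
    }

    // Callback: $m[1] = tag text up to and including "src=",
    // $m[2] = opening quote (empty if unquoted), $m[3] = original src value.
    public function rewrite($m)
    {
        $src = $m[3];
        if (preg_match('!^https?://!i', $src)) {
            $abs = $src;                     // already absolute: keep as is
        } elseif ($src[0] === '/') {
            $abs = $this->origin . $src;     // absolute path on the server
        } else {
            $abs = $this->baseuri . $src;    // relative path
        }
        return $m[1] . '"' . $abs . '"';     // always emit double quotes
    }

    public function relink($html)
    {
        return preg_replace_callback(
            '!(<img\b[^>]*?\bsrc=)(["\']?)([^"\'\s>]+)\2!i',
            array($this, 'rewrite'),
            $html
        );
    }
}

$relinker = new ImgRelinker('http://www.example.com', 'http://www.example.com/dir/');
echo $relinker->relink('<img src="/images/logo_sm.gif"> <img src=pic.gif width=150>'), "\n";
// prints: <img src="http://www.example.com/images/logo_sm.gif"> <img src="http://www.example.com/dir/pic.gif" width=150>
```

The callback is given as array($object, 'method') rather than an anonymous function so that the sketch does not depend on closures being available.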
thx in advance for your help!
pagod
PS: In case it's relevant, I'm running PHP 5.2.6 (Zend Engine v2.2.0) on Mac OS X Leopard 10.5.6 (Intel) with Apache 2.2.9 installed.