A “replace if not between” function

LAttilaD
Forum Newbie
Posts: 7
Joined: Thu Oct 11, 2012 1:22 pm

A “replace if not between” function

Post by LAttilaD »

Dears, I’m coming here with a problem I’ve been fighting with for days now. I need a function to search for instances of a string (needle) in a longer string (haystack) and replace them with another string (replace), but only where needle is not between a pair of tags (starttag, endtag) in haystack. For example,

Code:

$haystack='This is <a href=http://lattilad.org>an html formatted link</a> in an html text.';
$needle='html';
$replace='<a href=en.wikipedia.org/wiki/HTML>HTML</a>';
$starttag='<a ';
$endtag='>';
Now, I'm expecting the function to convert “html” in haystack to the link shown as replace, but only at the second instance, where it isn’t in a link already.
In the real application, I have to take three pairs of starttag–endtag into account: angle brackets, to avoid replacing HTML code; A tags, to avoid replacing text that is already a link (our purpose is to create links in a text automatically); and a special BBCode pair, still to be created, that suppresses link creation.
What makes things harder is that, as you can see in the example, starttag and endtag may precede and follow needle at a distance, so searching for something like $starttag.$needle.$endtag won’t work.
I’ll appreciate any help. Thank you.
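To make the failure mode concrete, here is a minimal sketch of what a plain str_replace does with the example above. The variable $naive is only an illustrative name; everything else comes from the post.

```php
<?php
$haystack = 'This is <a href=http://lattilad.org>an html formatted link</a> in an html text.';
$needle   = 'html';
$replace  = '<a href=en.wikipedia.org/wiki/HTML>HTML</a>';

// str_replace knows nothing about tags, so it replaces both occurrences,
// including the one that already sits inside the link text.
$naive = str_replace($needle, $replace, $haystack);

// The first occurrence now produces an <a> nested inside another <a>.
echo $naive, PHP_EOL;
```

The first replacement lands inside the existing link text, which is exactly the broken markup the function has to avoid.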
requinix
Spammer :|
Posts: 6617
Joined: Wed Oct 15, 2008 2:35 am
Location: WA, USA

Re: A “replace if not between” function

Post by requinix »

Is your problem more specific than what you've said? Is it that you need to find all <A> tags and do a find/replace in their text? Or is this really as generic as it sounds?
LAttilaD
Forum Newbie
Posts: 7
Joined: Thu Oct 11, 2012 1:22 pm

Re: A “replace if not between” function

Post by LAttilaD »

Thank you for your interest. What I’m trying to create is an autolinking feature for a blog engine plugin that creates hypertext blog entries. At the point where I got stuck, I have a list of subpage titles, for example, “Title Page”, “Table of Contents”, “Introduction” or anything the user (a blogger) wants. And I have the text of the subpage to be displayed. It may contain any of those subpage titles, any number of times. We have to turn these titles into links that point to the appropriate subpages.
But of course, the user may give a subpage a title like “img”, which is the same as the name of an HTML tag (e.g. for a page written about that very tag). So we must exclude the inside of HTML and BBCode tags from replacing. Additionally, the user may give subpage titles that appear in link texts. For example, a text may look like “<a href=devnetwork.net>This is a link pointing to Devnetwork.net</a>” and there may exist a subpage called “pointing”; replacing the word at that place would result in broken code. And finally, I want to add a [skipautolink] BBCode tag to mark parts of the text that must not be subjected to the replacement.
I’ve been working on text processing tasks for ages, but I’m still an amateur programmer, and in this case the number of cases to be handled is beyond my limited abilities…
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Re: A “replace if not between” function

Post by Christopher »

You may want to preg_match for starttag.*needle.*endtag first and if found then replace needle.
(#10850)
LAttilaD
Forum Newbie
Posts: 7
Joined: Thu Oct 11, 2012 1:22 pm

Re: A “replace if not between” function

Post by LAttilaD »

Look at the starting example, Christopher. If I search for “html” enclosed in an <A> tag and replace it, the result is broken code. What I’m doing is replacing a piece of normal text with a link, so if that piece of normal text is already part of a link, it will find itself inside two <A> tags…
LAttilaD
Forum Newbie
Posts: 7
Joined: Thu Oct 11, 2012 1:22 pm

Re: A “replace if not between” function

Post by LAttilaD »

I’ve created another example, stepping up a level. (Text is from Wikipedia.)

Code:

$text='Bears are mammals of the <a href="http://en.wikipedia.org/wiki/Family_(biology)">family</a> Ursidae. Bears are classified as caniforms, or doglike carnivorans, with the pinnipeds being their closest living relatives. Although there are only eight living species of bear, they are widespread, appearing in a wide variety of habitats throughout the Northern Hemisphere and partially in the Southern Hemisphere. Bears are found in the continents of North America, South America, Europe, and Asia.';

$linkify=array('mammal', 'family', 'caniform', 'species', 'habitat', 'northern hemisphere', 'southern hemisphere', 'hemisphere', 'continent', 'america', 'north america', 'south america', 'europe', 'asia');
It’s a simplified version of what I have, but it shows the problem well. I have to go through $linkify and look for each element in $text. If one is found, I have to replace it with a link pointing to the appropriate page. But one of the words, “family”, is a link already. It appears twice in $text, once in an HTML tag (…wiki/Family_…), and once as link text (…">family</a…). Both must be skipped, or the resulting HTML code will be broken.
requinix
Spammer :|
Posts: 6617
Joined: Wed Oct 15, 2008 2:35 am
Location: WA, USA

Re: A “replace if not between” function

Post by requinix »

With this type of problem I generally resort to breaking the content into parts: some parts you want to do replacements on, other parts you do not.

Code:

$content = "Go to example.com or click this link: <a href='http://www.example.com'>example.com</a>";

// break the content into alternating parts
$parts = preg_split('#<a href=[^>]+>.*?</a>#i', $content, PREG_SPLIT_DELIM_CAPTURE);
$replace = true;
foreach ($parts as $key => $part) {
	// replace?
	if ($replace) {
		$part = str_replace("example.com", "<a href='http://www.example.com'>example.com</a>", $part);
	} else {
		// don't replace
	}

	$parts[$key] = $part;
	$replace = !$replace; // alternate
}

// piece them back together
$editedcontent = implode("", $parts);
For HTML specifically there are alternatives. For instance you can load the HTML into DOMDocument and recursively do a find/replace on text - skipping elements you don't want to look at (like <A> tags) as you go.
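The DOMDocument route mentioned above could look roughly like the sketch below. It assumes the dom extension is available; the function name dom_autolink and the wrapper div are illustrative choices, not anything from the thread, and the XML-declaration prefix is a commonly used workaround to make loadHTML treat the input as UTF-8. An XPath query stands in for the recursive walk: it selects only text nodes that have no <a> ancestor.

```php
<?php
// Hypothetical helper: wrap each occurrence of $needle in a link to $url,
// skipping any text that is already inside an <a> element.
function dom_autolink(string $html, string $needle, string $url): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings about sloppy markup
    // Wrap the fragment in a container div so it can be extracted again afterwards.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root  = $doc->getElementsByTagName('div')->item(0);
    $xpath = new DOMXPath($doc);

    // Collect the text nodes first, because the tree is modified while replacing.
    $textNodes = iterator_to_array($xpath->query('.//text()[not(ancestor::a)]', $root));
    foreach ($textNodes as $node) {
        $pieces = explode($needle, $node->nodeValue);
        if (count($pieces) < 2) {
            continue; // needle does not occur in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($pieces as $i => $piece) {
            if ($i > 0) {
                // Between two pieces there was one occurrence of the needle.
                $link = $doc->createElement('a');
                $link->setAttribute('href', $url);
                $link->appendChild($doc->createTextNode($needle));
                $fragment->appendChild($link);
            }
            if ($piece !== '') {
                $fragment->appendChild($doc->createTextNode($piece));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize only the children of the wrapper, not the wrapper itself.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$out = dom_autolink(
    'Go to example.com or click <a href="http://www.example.com">example.com</a>',
    'example.com',
    'http://www.example.com'
);
```

Because the parser rebuilds the markup, the output may differ cosmetically from the input (attribute quoting, entity encoding), which is a trade-off compared with the split-and-join approach.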
LAttilaD
Forum Newbie
Posts: 7
Joined: Thu Oct 11, 2012 1:22 pm

Re: A “replace if not between” function

Post by LAttilaD »

Oh. It took some rereads for me to get the point. :) Well, Requinix, this is a totally different approach from any I tried. I’ll try it, but not today, and I’ll report what I get. Thank you!
LAttilaD
Forum Newbie
Posts: 7
Joined: Thu Oct 11, 2012 1:22 pm

Re: A “replace if not between” function

Post by LAttilaD »

I tried it today, anyway. :) Very nice. Taking the first steps in applying it, I got results somewhat different from what I expected, so I went to the PHP documentation and discovered that
$parts = preg_split('#(<a [^>]+>.*?</a>)#i', $content, -1, PREG_SPLIT_DELIM_CAPTURE);
is the correct way to split the text; note the parentheses around the pattern and the -1 for the limit parameter, which comes before the flags.
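A small side-by-side sketch of the two calls shows why the results differed. The variable names $wrong and $right are illustrative; the content string is the one from requinix's example.

```php
<?php
$content = "Go to example.com or click this link: <a href='http://www.example.com'>example.com</a>";

// The third argument of preg_split is $limit, so the flag constant (integer 2)
// is read as "return at most 2 pieces". Without a capture group the matched
// link is also dropped from the result.
$wrong = preg_split('#<a [^>]+>.*?</a>#i', $content, PREG_SPLIT_DELIM_CAPTURE);

// Capture group, limit -1 (no limit), and the flag in the fourth position:
// the link is kept as its own element between the surrounding text pieces.
$right = preg_split('#(<a [^>]+>.*?</a>)#i', $content, -1, PREG_SPLIT_DELIM_CAPTURE);

var_dump($wrong, $right);
```

With the corrected call, text and links alternate in the array, so joining the pieces with implode reproduces the original string.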
LAttilaD
Forum Newbie
Posts: 7
Joined: Thu Oct 11, 2012 1:22 pm

Re: A “replace if not between” function

Post by LAttilaD »

Finally, I tied up all loose ends, and the results can be seen here.
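For readers who want to see how the pieces of this thread might fit together, here is a hedged sketch. It is not the code the thread's author ended up with. The function name autolink, the shape of $terms (lowercase title mapped to URL), the closing [/skipautolink] tag, and the leading word boundary in the term pattern are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: link every known title that occurs outside protected regions.
// $terms maps a lowercase subpage title to its URL (an assumed data shape).
function autolink(string $text, array $terms): string
{
    if (!$terms) {
        return $text;
    }
    // Longest titles first, so "northern hemisphere" wins over "hemisphere".
    uksort($terms, fn($a, $b) => strlen($b) <=> strlen($a));
    $quoted = array_map(fn($t) => preg_quote($t, '#'), array_keys($terms));
    $termPattern = '#\b(?:' . implode('|', $quoted) . ')#i';

    // Protected regions: [skipautolink] blocks, whole <a> elements, any other tag.
    $protected = '#(\[skipautolink\].*?\[/skipautolink\]|<a\b[^>]*>.*?</a>|<[^>]+>)#is';
    $parts = preg_split($protected, $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            continue; // odd indexes hold the protected regions, left untouched
        }
        // All titles are handled in one pass, so freshly inserted links
        // are never scanned again for shorter titles.
        $parts[$i] = preg_replace_callback(
            $termPattern,
            fn($m) => '<a href="' . htmlspecialchars($terms[strtolower($m[0])]) . '">' . $m[0] . '</a>',
            $part
        );
    }
    return implode('', $parts);
}

$terms = [
    'hemisphere'          => '/hemisphere',
    'northern hemisphere' => '/northern-hemisphere',
    'family'              => '/family',
];
$text = 'The <a href="http://en.wikipedia.org/wiki/Family_(biology)">family</a> lives in the '
      . 'Northern Hemisphere, [skipautolink]not this family[/skipautolink].';
$linked = autolink($text, $terms);
```

The sketch leaves the [skipautolink] markers in the output; stripping them would be a separate step. Matching only a leading word boundary means “mammal” would also link the first six letters of “mammals”, which mirrors the singular forms in the $linkify list but may not suit every title.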