
Link removal (JS Regex)

Posted: Thu Jun 01, 2006 4:41 am
by JayBird
I was just messing around with a simple Greasemonkey script to remove some links.

The script looks like this:

Code: Select all

var pglinks = document.body.innerHTML;

// Replace each font-wrapped link with just its text ($2)
var result = pglinks.replace(/<a href="(.*?)" target="_blank"><font color="#000000">(.*?)<\/font><\/a>/ig, "$2");

document.body.innerHTML = result;
This seems to work, unless another link appears earlier in the same text, such as this:

Code: Select all

<a href="https://wwwa.applyonlinenow.com/" target="_blank">https://wwwa.applyonlinenow.com/</a><br />_________________<br />"Any purchases made by me are on the grounds that I own the original, if not the <a href="http://www.someurl.co.uk" target="_blank"><font color="#000000">backup</font></a> will be destroyed within 24 hours"
The output is

Code: Select all

will be destroyed within 24 hours"
when what I want is just the link on the word "backup" removed. Note: the word 'backup' could be any word and the URL could be any URL.

Code: Select all

<a href="https://wwwa.applyonlinenow.com/" target="_blank">https://wwwa.applyonlinenow.com/</a><br />_________________<br />"Any purchases made by me are on the grounds that I own the original, if not the backup will be destroyed within 24 hours"
Any ideas?

Thanks
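
One likely reason the regex swallows too much: `(.*?)` is lazy, but it is still allowed to run past the closing quote of the first href, so the match starts at the first link and only ends at the font-wrapped one. A minimal sketch of the difference, assuming the markup always has exactly the shape shown above; the sample string and the variable names are made up for illustration:

```javascript
// Shortened stand-in for the page HTML: one plain link, then one font-wrapped link.
var sample = '<a href="https://example.com/" target="_blank">plain link</a> and the ' +
    '<a href="http://www.someurl.co.uk" target="_blank"><font color="#000000">backup</font></a> copy';

// Original pattern: (.*?) may cross quotes, so the match begins at the first <a>.
var loose = /<a href="(.*?)" target="_blank"><font color="#000000">(.*?)<\/font><\/a>/ig;

// [^"]* cannot leave the href value, so only the font-wrapped link matches.
var strict = /<a href="[^"]*" target="_blank"><font color="#000000">(.*?)<\/font><\/a>/ig;

var looseResult = sample.replace(loose, "$2");   // first link is lost
var strictResult = sample.replace(strict, "$1"); // first link is kept

console.log(looseResult);
console.log(strictResult);
```

Restricting the href group to non-quote characters keeps the regex approach usable for this one fixed shape of markup, though it stays fragile if attribute order or quoting ever changes.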

Posted: Thu Jun 01, 2006 6:04 am
by Weirdan
wouldn't it be easier to search for A nodes through the DOM tree?

Posted: Thu Jun 01, 2006 6:24 am
by JayBird
Weirdan wrote:wouldn't it be easier to search for A nodes through the DOM tree?
I dunno, i suck at JS :roll:

Any pointers!?

Posted: Thu Jun 01, 2006 7:26 am
by Weirdan
if that's for greasemonkey, the easiest way would be to use an XPath query, something along the lines of:

Code: Select all

// Snapshot of every A element in the document
var links = document.evaluate("//a", document, null, XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, null);

for(var i = 0, l = links.snapshotLength; i < l; i++) {
   var link = links.snapshotItem(i);
   // do something to link...
}
or you could use getElementsByTagName:

Code: Select all

var links = document.getElementsByTagName("a");

for(var i = 0, l = links.length; i < l; i++) {
  var link = links[i];
  // do something to link
}

Posted: Thu Jun 01, 2006 7:33 am
by Chris Corbyn
Pimptastic wrote:
Weirdan wrote:wouldn't it be easier to search for A nodes through the DOM tree?
I dunno, i suck at JS :roll:

Any pointers!?
You can get an array-like collection of 'A' nodes by doing:

Code: Select all

var aNodes = document.getElementsByTagName('a'); // a live, array-like collection of nodes, not a true Array
EDIT | Terribly sorry - DevNet seems to be going up and down and this has just submitted now!

Posted: Thu Jun 01, 2006 7:41 am
by Weirdan
d11 wrote:EDIT | Terribly sorry - DevNet seems to be going up and down and this has just submitted now!
Yeah, looks like something is going wrong... Pimptastic? :)

Posted: Thu Jun 01, 2006 8:06 am
by JayBird
Nice one, gives me a starting point to work something up.

As for the server, same issues here, no idea what is going on. Just one of those things i guess/hope

Posted: Thu Jun 01, 2006 8:19 am
by Chris Corbyn
I tend to do my loops over entire arrays like this cos it's more like a foreach.

Code: Select all

for (var i in theArray)
{
    //do stuff with theArray[i]
}

Posted: Thu Jun 01, 2006 8:46 am
by Weirdan
but if your Array prototype is extended with some user-defined methods (like forEach), a for...in loop would list them as if they were array elements, in IE and in other browsers. Not applicable here, but nevertheless good to know.
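
A small sketch of that pitfall, for anyone who wants to see it happen. The method name demoEach is hypothetical and exists only for this demonstration; the hasOwnProperty check is one common guard, not the only one:

```javascript
// demoEach is a hypothetical user-defined method, added only for this demonstration.
Array.prototype.demoEach = function (fn) {
    for (var i = 0; i < this.length; i++) {
        fn(this[i]);
    }
};

var theArray = ["a", "b"];

// Plain for...in: walks the indexes and then the enumerable prototype method.
var allKeys = [];
for (var key in theArray) {
    allKeys.push(key);
}

// Guarded for...in: keeps only properties that belong to the array itself.
var ownKeys = [];
for (var k in theArray) {
    if (Object.prototype.hasOwnProperty.call(theArray, k)) {
        ownKeys.push(k);
    }
}

delete Array.prototype.demoEach; // undo the demonstration extension

console.log(allKeys); // includes "demoEach" after the indexes
console.log(ownKeys); // indexes only
```

A plain counting loop over length, as in the earlier posts, sidesteps the issue entirely.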

Posted: Thu Jun 01, 2006 9:39 am
by JayBird
Thinking about it, how is going through all the links going to help me?

I am looking to replace:

Code: Select all

<a href="http://www.someurl.co.uk" target="_blank"><font color="#000000">backup</font></a>
with

Code: Select all

backup
i.e. completely removing the 'a' and the 'font' tags from these words. http://www.someurl.co.uk could be any url and the word 'backup' could be any word

Posted: Thu Jun 01, 2006 9:51 am
by Weirdan
let me clarify:
you need to find every link with a target attribute equal to "_blank" that contains a font element, and remove the link together with the font subelement, preserving the content of the font subelement.

Is it right?

Posted: Thu Jun 01, 2006 9:55 am
by JayBird
Weirdan wrote:let me clarify:
you need to find every link with a target attribute equal to "_blank" that contains a font element, and remove the link together with the font subelement, preserving the content of the font subelement.

Is it right?
Yes, nearly correct.

The target attribute will always be there, but it isn't a unique factor. Other links could have target="_blank" too.

Not forgetting that http://www.someurl.co.uk could be any URL

Posted: Thu Jun 01, 2006 10:03 am
by Weirdan
but it isn't a unique factor
But what are unique factors? The fact the link does contain font element?
Not forgetting that http://www.someurl.co.uk could be any URL
that means "every link" in my book... we need to restrict the filter further by some criteria, don't we?

Posted: Thu Jun 01, 2006 10:07 am
by JayBird
Weirdan wrote:But what are unique factors? The fact the link does contain font element?
Yes, i guess the unique factor is the font tag
Weirdan wrote:that means "every link" in my book... we need to restrict the filter further by some criteria, don't we?
True.


So, basically, we are looking for a word that is wrapped inside a font tag and an anchor...then remove both tags

Posted: Thu Jun 01, 2006 10:39 am
by Weirdan
well, it could be something like this

Code: Select all

var links = document.evaluate(
   "//a[count(font)=1 and @target]", // select every A element that has exactly one FONT child and a target attribute
   document, 
   null, 
   XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, 
   null
);

for(var i = 0, l = links.snapshotLength; i < l; i++) {
   var link = links.snapshotItem(i);
   var text = link.textContent; // text inside the link, i.e. the content of the font subelement (innerHTML would leave entities like &amp; as literal text)
   link.parentNode.replaceChild(document.createTextNode(text), link);
}
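
For comparison, roughly the same idea can be written with getElementsByTagName instead of XPath. This is only a sketch under a few assumptions: the function name unwrapFontLinks is made up, a link counts as a match only when the font element is its single child, and the live collection is copied first because links get removed while looping:

```javascript
// Hypothetical helper: turns <a target="..."><font>text</font></a> into plain text.
// Takes the document as a parameter so it can be tried outside a browser too.
function unwrapFontLinks(doc) {
    // getElementsByTagName returns a live collection, so copy it into a
    // real array before removing any links from the page.
    var live = doc.getElementsByTagName("a");
    var links = [];
    for (var i = 0; i < live.length; i++) {
        links.push(live[i]);
    }

    var replaced = 0;
    for (var j = 0; j < links.length; j++) {
        var link = links[j];
        // Assumption: the font element is the link's only child node.
        var only = link.childNodes.length === 1 ? link.firstChild : null;
        var isFont = only && only.nodeType === 1 &&
            only.tagName.toLowerCase() === "font";
        if (!isFont || !link.hasAttribute("target")) {
            continue; // leave every other link alone
        }
        var text = doc.createTextNode(only.textContent);
        link.parentNode.replaceChild(text, link);
        replaced++;
    }
    return replaced; // number of links that were unwrapped
}

// In a Greasemonkey script the page's document would be passed in.
if (typeof document !== "undefined") {
    unwrapFontLinks(document);
}
```

The XPath version above is shorter; this one mainly shows what "going through all the links" looks like when the filtering is done by hand.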