Absolute Links on Same Line Regex Needed
Posted: Sat Apr 07, 2007 6:33 pm
Hello,
I need a regex that stops at the end of each a href tag and, when two links are on the same line, picks up both.
I tried this pattern from the regex library:
(<(?:.*?)\s)href\s*=([\s"'])*/?([^\2:#]+?)\2((?:.*?)>)
against the following lines of HTML (the lines are broken here only to fit the forum):
<b>Joints</b>: <b><a href="/Rosemary_100_Pure_Essential_Oil_p_53.html">Rosemary</a></b>, <A href="/Peppermint_100_Pure_
Essential_Oil_p_47.html"><b>Peppermint</b></a>, <b><a href="/Cinnamon_Leaf_100_Pure_Essential_Oil_p_14.html">Cinnamon
This is what it caught:
</a></b>, <A href="/Peppermint_100_Pure_Essential_Oil_p_47.html">
**********************
Marjoram</a></b> and <b><a href="/Rosemary_100_Pure_Essential_Oil_p_53.html">Rosemary</a></b></p>
This is what it caught:
</a></b> and <b><a href="/Rosemary_100_Pure_Essential_Oil_p_53.html">
***********************
<A href="/Peppermint_100_Pure_Essential_Oil_p_47.html"><b>Peppermint</b></a>, <b><a href="/Cinnamon_Leaf_100_Pure_
Essential_Oil_p_14.html">Cinnamon
This is what it caught:
<A href="/Peppermint_100_Pure_Essential_Oil_p_47.html">
As you can see, it caught the first link and missed the second.
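A minimal sketch of one way to get every link on a line, assuming Python's re module (the post doesn't name a language) and quoted href values as in the sample. The key change is limiting the text before href to [^>]*? ("anything except >"), so a match can never run past the end of a single <a ...> tag; finditer then returns each link on the line, not just the first. HREF_RE and find_hrefs are hypothetical names.

```python
import re

# Matches one <a ...> start tag and captures its href value.
# [^>]*? cannot step past '>', so a match stays inside a single tag.
# The opening quote is captured as "q" and reused via (?P=q) for the closing quote.
# Assumption: href values are quoted, as in the sample HTML.
HREF_RE = re.compile(
    r"""<a\s[^>]*?\bhref\s*=\s*(?P<q>["'])(?P<url>.*?)(?P=q)""",
    re.IGNORECASE,  # so <A href=...> is matched as well as <a href=...>
)


def find_hrefs(line):
    """Return every href value found on the line, in order."""
    return [m.group("url") for m in HREF_RE.finditer(line)]


line = ('<A href="/Peppermint_100_Pure_Essential_Oil_p_47.html"><b>Peppermint</b></a>, '
        '<b><a href="/Cinnamon_Leaf_100_Pure_Essential_Oil_p_14.html">Cinnamon')

for url in find_hrefs(line):
    print(url)
# /Peppermint_100_Pure_Essential_Oil_p_47.html
# /Cinnamon_Leaf_100_Pure_Essential_Oil_p_14.html
```

Unquoted values such as href=/page.html are not handled by this sketch; the library pattern tried to allow them, but the sample HTML doesn't need it.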
Many of my HTML pages have multiple links on the same line.
All I am trying to do is take the contents of each a href and prepend either http://www.domain.com/ or http://www.domain.com, depending on whether the contents already start with a slash. If it is easier, I can make two passes: one for links with the slash and one for links without.
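Both cases can probably be handled in a single pass by using a replacement function instead of two separate passes. A sketch, again in Python and again assuming quoted hrefs; BASE, HREF_RE, _absolutize, and make_links_absolute are hypothetical names, and http://www.domain.com is the placeholder domain from the post. In the spirit of the library pattern's exclusion of : and #, it leaves alone hrefs that contain a colon (already absolute, e.g. http: or mailto:) or that start with #.

```python
import re

# Placeholder site root from the post (no trailing slash).
BASE = "http://www.domain.com"

# Same idea as before, but the text up to and including "href=" is captured
# as "head" so it can be written back unchanged (including <A vs <a).
HREF_RE = re.compile(
    r"""(?P<head><a\s[^>]*?\bhref\s*=\s*)(?P<q>["'])(?P<url>.*?)(?P=q)""",
    re.IGNORECASE,
)


def _absolutize(m):
    url = m.group("url")
    # Skip URLs that already have a scheme (http:, mailto:, ...) and in-page anchors.
    if ":" in url or url.startswith("#"):
        return m.group(0)
    # Insert the slash only when the relative path does not already start with one.
    sep = "" if url.startswith("/") else "/"
    q = m.group("q")
    return f"{m.group('head')}{q}{BASE}{sep}{url}{q}"


def make_links_absolute(html):
    """Prefix every relative href in the HTML text with BASE."""
    return HREF_RE.sub(_absolutize, html)


print(make_links_absolute('<a href="/Rosemary.html">R</a>, <A href="Peppermint.html">P</a>'))
# <a href="http://www.domain.com/Rosemary.html">R</a>, <A href="http://www.domain.com/Peppermint.html">P</a>
```

One known gap in this sketch: a protocol-relative link such as //host/page starts with a slash and would be mangled, so handle it separately if your pages use that form.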
I am new to regex, and I would appreciate help understanding why the pattern catches more content than I need and why it sometimes leaves out an entire link that is on the same line.
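Two things in the library pattern seem to explain the over-matching, illustrated here with Python's re (other engines may differ in details, so check yours). First, the leading <(?:.*?)\s can start at any <, and because . also matches >, the lazy .*? keeps stretching across tags like </a></b> and <b> until it reaches the whitespace before href; that is why the match begins at the closing </a> rather than at <a. Second, inside a character class \2 is not a backreference: Python reads it as the control character chr(2), so [^\2:#] does not exclude the quote character as was probably intended. LIBRARY_RE is a hypothetical name for the posted pattern.

```python
import re

# The pattern exactly as posted from the regex library.
LIBRARY_RE = re.compile(r"""(<(?:.*?)\s)href\s*=([\s"'])*/?([^\2:#]+?)\2((?:.*?)>)""")

line = ('Marjoram</a></b> and <b><a href="/Rosemary_100_Pure_Essential_Oil_p_53.html">'
        'Rosemary</a></b></p>')

m = LIBRARY_RE.search(line)
# The match starts at the first '<' on the line (the closing </a>), because
# (?:.*?) is allowed to run across '>' until it finds " href".
print(m.group(0))
# </a></b> and <b><a href="/Rosemary_100_Pure_Essential_Oil_p_53.html">

# Inside [...] Python treats \2 as chr(2), not "whatever group 2 matched",
# so a quoted string passes straight through [^\2:#]+.
print(re.fullmatch(r'[^\2:#]+', '"quoted"') is not None)
# True
```

Replacing (?:.*?) with [^>]*? keeps each match inside one tag, which is what the tighter pattern used for finding every link on a line does.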
Thanks,
Randal