Need Search & Replace Help (Regex Newbie)

Posted: Tue May 12, 2009 4:35 pm
by GeoBear
I'm using Dreamweaver's search-and-replace function to convert a website that includes hundreds of pages with links to pages focusing on the world's nations.

Consider the following links:

Code:

 
<td><a href="http://www.geoworld.org/sp" title="">Spain</a></td>
<td><a href="http://www.geoworld.org/zi" title="">Zimbabwe</a></td>
 
Notice that they're exactly the same except for the two characters following the domain name (sp and zi) and the names visitors see (Spain and Zimbabwe). I want to replace the two characters with the country's name, and I also want to insert the country's name in the title attribute, so the finished URLs will look like this:

Code:

 
<td><a href="http://www.geoworld.org/Spain" title="Spain">Spain</a></td>
<td><a href="http://www.geoworld.org/Zimbabwe" title="Zimbabwe">Zimbabwe</a></td>
 
Does anyone know how to make a regex script like that?

One more detail...can you adjust it so that spaces in place names that consist of more than one word are replaced by underscores in the link, as follows? (If not, don't worry about it. I can probably fix that with a second regex that simply replaces spaces with underscores in links.)

Code:

 
<td><a href="http://www.geoworld.org/United_Kingdom" title="United Kingdom">United Kingdom</a></td>
 
And if my original request is too difficult, I'd settle for a regex that converts this...

Code:

 
<td><a href="http://www.geoworld.org/sp" title="">Spain</a></td>
 
to this...

Code:

 
<td><a href="http://www.geoworld.org/sp" title="">Spain</a>Spain2Spain3</td>
 
...or something similar. If I could merely replicate the place name, with different characters after each occurrence, then I could fill in the blanks with a series of search and replace operations.

I'm playing with a software program called RegExhibit and have learned that I can match everything between the tags with this regular expression: title="".*>

However, I don't have a clue about manipulating the data I've matched.

Thanks!
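
For anyone in the same position: manipulating matched text usually comes down to capture groups (parentheses in the pattern) and backreferences in the replacement. Below is a minimal Python sketch of the fallback conversion described above. The names `LINK_TEXT` and `replicate_name` are made up for illustration, and Dreamweaver's own replacement syntax may differ from Python's.

```python
import re

# Capture the visible link text (everything between ">" and "</a>")
# so the replacement string can reuse it.
LINK_TEXT = re.compile(r'>([^<]+)</a>')

def replicate_name(html):
    # \g<1> is used instead of \1 because "\12" would be read as a
    # reference to group 12 rather than group 1 followed by "2".
    return LINK_TEXT.sub(r'>\g<1></a>\g<1>2\g<1>3', html)

if __name__ == "__main__":
    print(replicate_name(
        '<td><a href="http://www.geoworld.org/sp" title="">Spain</a></td>'
    ))
```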

Re: Need Search & Replace Help (Regex Newbie)

Posted: Wed May 13, 2009 4:13 am
by prometheuzz
Regex pattern:

Code:

(<a\s+href="[^"]*/)[^/]*"[^>]*>([^<]+)
Replacement string (depending on what works in Dreamweaver):

Code:

$1$2 title="$2">$2
or

Code:

\1\2 title="\2">\2
Replacing the spaces with underscores can't be done in the same replacement. You need a second replacement pattern for that. I leave that as an exercise for you. Feel free to post back with a specific question if you run into problems. Also post your attempt here and explain in which cases your regex fails.

Good luck.
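
One possible shape for that second pass, sketched in Python rather than Dreamweaver (so the replacement syntax is an assumption): the pattern touches one space per href value, so it is repeated until nothing changes. `SPACE_IN_HREF` and `underscore_hrefs` are hypothetical names, and the sketch assumes the href value is already properly quoted.

```python
import re

# Matches a single space inside a double-quoted href value.
# [^"] cannot cross the closing quote, so the title attribute is left alone.
SPACE_IN_HREF = re.compile(r'(href="[^"]*?) ([^"]*")')

def underscore_hrefs(html):
    # Each pass replaces at most one space per href, so repeat
    # until a pass makes no further change.
    while True:
        updated = SPACE_IN_HREF.sub(r'\1_\2', html)
        if updated == html:
            return updated
        html = updated

if __name__ == "__main__":
    print(underscore_hrefs(
        '<a href="http://www.geoworld.org/United Kingdom" '
        'title="United Kingdom">United Kingdom</a>'
    ))
```

In a plain search-and-replace dialog the equivalent would be running the same replacement repeatedly until it reports no more matches.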

Re: Need Search & Replace Help (Regex Newbie)

Posted: Wed May 13, 2009 7:02 am
by GeoBear
Wow, that's awesome! I used the first replacement code, and it worked perfectly except that the finished URLs lack the second quote...

<a href="http://www.geoworld.org/United Kingdom title="United Kingdom">

I can easily fix that with a simple search and replace, replacing (space)title= with "(space)title=

Nevertheless, if you happen to know how to modify the regex so that the second quote is included, that would be cool.

Thanks so much for your help. This will save me literally hours of work.

Re: Need Search & Replace Help (Regex Newbie)

Posted: Wed May 13, 2009 7:29 am
by prometheuzz
(removed previous post)

Edit: I now see what you mean. Change

Code:

'$1$2 title="$2">$2'
into:

Code:

'$1$2" title="$2">$2'
and the quote is fixed.
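
For reference, the pattern and the corrected replacement can be checked outside Dreamweaver. This is a small Python sketch, not Dreamweaver syntax: Python's `re.sub` uses the `\1` backreference style, and the names `PATTERN`, `REPLACEMENT` and `rewrite_links` are made up for the example.

```python
import re

# Group 1: the opening tag up to and including the last "/" of the URL.
# Group 2: the visible link text (the country name).
PATTERN = re.compile(r'(<a\s+href="[^"]*/)[^/]*"[^>]*>([^<]+)')

# The corrected replacement: note the closing quote right after the first \2.
REPLACEMENT = r'\1\2" title="\2">\2'

def rewrite_links(html):
    return PATTERN.sub(REPLACEMENT, html)

if __name__ == "__main__":
    sample = (
        '<td><a href="http://www.geoworld.org/sp" title="">Spain</a></td>\n'
        '<td><a href="http://www.geoworld.org/zi" title="">Zimbabwe</a></td>'
    )
    print(rewrite_links(sample))
```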

Re: Need Search & Replace Help (Regex Newbie)

Posted: Wed May 13, 2009 9:21 am
by GeoBear
Perfect. Thanks again.

Re: Need Search & Replace Help (Regex Newbie)

Posted: Wed May 13, 2009 9:33 am
by prometheuzz
GeoBear wrote:Perfect. Thanks again.
You're welcome.

Re: Need Search & Replace Help (Regex Newbie)

Posted: Wed May 13, 2009 9:47 am
by onion2k
Why are you using the first two letters of the country name? What happens, for example, with Sweden and Switzerland, or China and Chad, or Australia and Austria? You should use the proper ISO 3166 country codes. (http://www.iso.org/iso/english_country_ ... e_elements). Though, obviously, that'd be a lot more work to hack in now I suppose.
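
To illustrate the concern, here is a rough Python sketch that groups names by their first two letters and reports any prefix shared by more than one name. The function name `prefix_collisions` and the sample list are made up for the example. It only shows why name prefixes make poor identifiers and says nothing about which coding scheme the site actually uses.

```python
from collections import defaultdict

def prefix_collisions(names):
    # Group names by their lowercased first two letters.
    groups = defaultdict(list)
    for name in names:
        groups[name[:2].lower()].append(name)
    # Keep only prefixes claimed by more than one name.
    return {prefix: group for prefix, group in groups.items() if len(group) > 1}

if __name__ == "__main__":
    sample = ["Sweden", "Switzerland", "China", "Chad",
              "Australia", "Austria", "Spain"]
    for prefix, group in prefix_collisions(sample).items():
        print(prefix, group)
```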

Re: Need Search & Replace Help (Regex Newbie)

Posted: Wed May 13, 2009 5:35 pm
by GeoBear
onion2k wrote:Why are you using the first two letters of the country name? What happens, for example, with Sweden and Switzerland, or China and Chad, or Australia and Austria? You should use the proper ISO 3166 country codes. (http://www.iso.org/iso/english_country_ ... e_elements). Though, obviously, that'd be a lot more work to hack in now I suppose.
Actually, I'm already using ISO codes as IDs for each of the world's nations. I'm converting the CIA's World Factbook for my website, and they appear to use a different system - I think it's called FIPS, or something like that. I'm going to add it to my database, so I can link to their maps, which are similarly named.