Need Search & Replace Help (Regex Newbie)

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."

GeoBear
Forum Newbie
Posts: 12
Joined: Tue May 12, 2009 4:31 pm

Need Search & Replace Help (Regex Newbie)

Post by GeoBear »

I'm using Dreamweaver's search-and-replace function to convert a website that includes hundreds of pages with links to pages focusing on the world's nations.

Consider the following links:

 
<td><a href="http://www.geoworld.org/sp" title="">Spain</a></td>
<td><a href="http://www.geoworld.org/zi" title="">Zimbabwe</a></td>
 
Notice that they're identical except for the two characters following the domain name (sp and zi) and the names visitors see (Spain and Zimbabwe). I want to replace the two characters with the country's name and also insert the country's name in the title attribute, so the finished links look like this:

 
<td><a href="http://www.geoworld.org/Spain" title="Spain">Spain</a></td>
<td><a href="http://www.geoworld.org/Zimbabwe" title="Zimbabwe">Zimbabwe</a></td>
 
Does anyone know how to make a regex script like that?

One more detail...can you adjust it so that spaces in place names that consist of more than one word are replaced by underscores in the link, as follows? (If not, don't worry about it. I can probably fix that with a second regex that simply replaces spaces with underscores in links.)

 
<td><a href="http://www.geoworld.org/United_Kingdom" title="United Kingdom">United Kingdom</a></td>
 
And if my original request is too difficult, I'd settle for a regex that converts this...

 
<td><a href="http://www.geoworld.org/sp" title="">Spain</a></td>
 
to this...

 
<td><a href="http://www.geoworld.org/sp" title="">Spain</a>Spain2Spain3</td>
 
...or something similar. If I could merely replicate the place name, with different characters after each occurrence, then I could fill in the blanks with a series of search and replace operations.
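For readers who want to try the fallback outside Dreamweaver, here is a minimal Python sketch of the idea: capture the place name in a group and emit it again after the link. The sample line comes from the post above; the pattern and the variable names are illustrative assumptions, not something Dreamweaver provides.

```python
import re

# Sample line copied from the post above.
html = '<td><a href="http://www.geoworld.org/sp" title="">Spain</a></td>'

# Group 1 captures the whole link; group 2 captures only the visible name.
pattern = re.compile(r'(<a\s[^>]*>([^<]+)</a>)')

# \g<1> re-emits the link unchanged, \g<2> re-emits the captured name,
# and the literal 2 and 3 are the markers asked for in the post.
replicated = pattern.sub(r'\g<1>\g<2>2\g<2>3', html)
print(replicated)
```

The parentheses are what make the matched text reusable: anything inside a group can be referenced in the replacement string.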

I'm playing with a software program called RegExhibit and have learned that I can match everything between the tags with this regular expression: title="".*>

However, I don't have a clue about manipulating the data I've matched.

Thanks!
Last edited by Benjamin on Tue May 12, 2009 5:35 pm, edited 1 time in total.
Reason: Fixed code tags.
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: Need Search & Replace Help (Regex Newbie)

Post by prometheuzz »

Regex pattern:

(<a\s+href="[^"]*/)[^/]*"[^>]*>([^<]+)
Replacement string (depending on what works in Dreamweaver):

$1$2 title="$2">$2
or

\1\2 title="\2">\2
Replacing the spaces with underscores can't be done in the same replacement; you need a second search-and-replace pattern for that. I'll leave that as an exercise for you. Feel free to post back with a specific question if you run into problems. Also post your attempt here and explain in which cases your regex fails.
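As a rough illustration of that second pass, the Python sketch below rewrites spaces only inside the href value, leaving the title attribute and the link text alone. It assumes the first replacement has already run, and it relies on a replacement function, which a plain find-and-replace dialog may not offer; the function name is hypothetical.

```python
import re

# Assumed intermediate result after the first replacement has run.
html = ('<td><a href="http://www.geoworld.org/United Kingdom" '
        'title="United Kingdom">United Kingdom</a></td>')

def underscore_href(match):
    # Only the URL (group 2) is changed; groups 1 and 3 are the
    # surrounding href=" and closing quote, emitted untouched.
    return match.group(1) + match.group(2).replace(" ", "_") + match.group(3)

fixed = re.sub(r'(href=")([^"]*)(")', underscore_href, html)
print(fixed)
```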

Good luck.
GeoBear
Forum Newbie
Posts: 12
Joined: Tue May 12, 2009 4:31 pm

Re: Need Search & Replace Help (Regex Newbie)

Post by GeoBear »

Wow, that's awesome! I used the first replacement code, and it worked perfectly except that the finished URLs lack the closing quote...

<a href="http://www.geoworld.org/United Kingdom title="United Kingdom">

I can easily fix that with a simple search and replace, replacing (space)title= with "(space)title=

Nevertheless, if you happen to know how to modify the regex so that the second quote is included, that would be cool.

Thanks so much for your help. This will save me literally hours of work.
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: Need Search & Replace Help (Regex Newbie)

Post by prometheuzz »

(removed previous post)

Edit: I now see what you mean. Change

'$1$2 title="$2">$2'
into:

'$1$2" title="$2">$2'
and the quote is fixed.
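To check the corrected replacement end to end, here is a small Python sketch that applies the pattern from this thread to the two sample rows from the first post. Python uses the \1 and \2 form of backreference; the variable names are only for illustration.

```python
import re

# Sample rows copied from the first post.
rows = [
    '<td><a href="http://www.geoworld.org/sp" title="">Spain</a></td>',
    '<td><a href="http://www.geoworld.org/zi" title="">Zimbabwe</a></td>',
]

# Group 1: the opening of the tag up to the last slash of the URL.
# Group 2: the visible country name between <a ...> and </a>.
pattern = re.compile(r'(<a\s+href="[^"]*/)[^/]*"[^>]*>([^<]+)')

# The quote after \2 closes the href attribute, which was the missing piece.
replacement = r'\1\2" title="\2">\2'

converted = [pattern.sub(replacement, row) for row in rows]
for row in converted:
    print(row)
```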
GeoBear
Forum Newbie
Posts: 12
Joined: Tue May 12, 2009 4:31 pm

Re: Need Search & Replace Help (Regex Newbie)

Post by GeoBear »

Perfect. Thanks again.
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: Need Search & Replace Help (Regex Newbie)

Post by prometheuzz »

GeoBear wrote: Perfect. Thanks again.
You're welcome.
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Re: Need Search & Replace Help (Regex Newbie)

Post by onion2k »

Why are you using the first two letters of the country name? What happens, for example, with Sweden and Switzerland, or China and Chad, or Australia and Austria? You should use the proper ISO 3166 country codes. (http://www.iso.org/iso/english_country_ ... e_elements). Though, obviously, that'd be a lot more work to hack in now I suppose.
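The collision concern is easy to demonstrate. The Python sketch below groups a handful of country names by their first two letters and reports any prefix shared by more than one name; the list is just the examples from this post plus Spain, not a full dataset.

```python
from collections import defaultdict

# Example names taken from this thread; not a complete country list.
countries = ["Sweden", "Switzerland", "China", "Chad",
             "Australia", "Austria", "Spain"]

# Bucket each name under its lowercased two-letter prefix.
by_prefix = defaultdict(list)
for name in countries:
    by_prefix[name[:2].lower()].append(name)

# Keep only the prefixes that more than one country maps to.
collisions = {prefix: names for prefix, names in by_prefix.items()
              if len(names) > 1}
print(collisions)
```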
GeoBear
Forum Newbie
Posts: 12
Joined: Tue May 12, 2009 4:31 pm

Re: Need Search & Replace Help (Regex Newbie)

Post by GeoBear »

onion2k wrote: Why are you using the first two letters of the country name? What happens, for example, with Sweden and Switzerland, or China and Chad, or Australia and Austria? You should use the proper ISO 3166 country codes. (http://www.iso.org/iso/english_country_ ... e_elements). Though, obviously, that'd be a lot more work to hack in now I suppose.
Actually, I'm already using ISO codes as IDs for each of the world's nations. I'm converting the CIA's World Factbook for my website, and it appears to use a different system - I think it's called FIPS, or something like that. I'm going to add it to my database, so I can link to their maps, which are similarly named.
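A rough Python sketch of what storing both codes side by side might look like. The table layout and the function name are hypothetical, and only the three countries mentioned in this thread are included; a real table would cover every country.

```python
# Illustrative rows only: ISO 3166-1 alpha-2 code next to the FIPS code.
COUNTRY_CODES = {
    "Spain": {"iso": "ES", "fips": "SP"},
    "Zimbabwe": {"iso": "ZW", "fips": "ZI"},
    "United Kingdom": {"iso": "GB", "fips": "UK"},
}

def fips_slug(name):
    # Lowercased FIPS code, the form seen in the original links (sp, zi).
    return COUNTRY_CODES[name]["fips"].lower()

print(fips_slug("Spain"), fips_slug("Zimbabwe"))
```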