Cleaning Up Repeated Information in an XML file.

terra
Forum Newbie
Posts: 9
Joined: Sat Feb 07, 2009 1:06 pm

Post by terra »

This is my first post to this thriving forum. In advance, I'd like to say thank you for your patience and help.

I have an XML file with contact information exported from my phone. Within the <body> container of each contact, there is some duplicate information, which shows up 2, 3 or even 4 times in some cases. For example:

Code: Select all

     <body>Met at LCB 08 on monday
------------------------------------------------------------------
Met at LCB 08 on monday
------------------------------------------------------------------
Met at LCB 08 on monday</body>
The repeated information is always separated by the following delimiter string (66 dashes): "------------------------------------------------------------------"

Which Perl-style regex would I use to delete everything (including newlines, and except for multiples of 6 dashes) between the first delimiter string and the first occurrence of </body>? (Of course, I could also delete everything between <body> and the last occurrence of the delimiter string.)

I tried this, but it didn't work:

Code: Select all

<body>([\w\s\d ,@\.\(\)\:\$#]*)------------------------------------------------------------------.</body>
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: Cleaning Up Repeated Information in an XML file.

Post by prometheuzz »

terra wrote:... (of course i could also delete everything between <body> and the last occurrence of the delimiter string).
...
Yes, I'd do that as well.
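
Both variants from the question can be sketched as follows. This is only a minimal illustration in Python, whose `re` module is Perl-style but not identical to Perl; the function names `keep_last_copy` and `keep_first_copy` are made up for this example. It assumes the delimiter is exactly 66 dashes, as stated in the question, and that it never appears outside a <body> element. The `(?:(?!</body>).)*` part matches any character, newlines included, but refuses to run past a closing tag, so a match cannot spill from one contact into the next.

```python
import re

# Assumption: the delimiter is exactly 66 dashes, as stated in the question.
DELIM = "-" * 66

# Any run of characters (newlines included, via DOTALL) that does not
# cross a closing </body> tag.
INSIDE_BODY = r"(?:(?!</body>).)*"

# Variant 1: from <body> up to and including the LAST delimiter.
# The greedy INSIDE_BODY backtracks to the last delimiter in the element.
LEADING_COPIES = re.compile(
    r"(<body>)" + INSIDE_BODY + re.escape(DELIM) + r"\s*",
    re.DOTALL,
)

# Variant 2: from the FIRST delimiter up to (but not including) </body>.
TRAILING_COPIES = re.compile(
    r"\s*" + re.escape(DELIM) + INSIDE_BODY + r"(?=</body>)",
    re.DOTALL,
)


def keep_last_copy(xml: str) -> str:
    """Delete everything between <body> and the last delimiter."""
    return LEADING_COPIES.sub(r"\1", xml)


def keep_first_copy(xml: str) -> str:
    """Delete everything between the first delimiter and </body>."""
    return TRAILING_COPIES.sub("", xml)


if __name__ == "__main__":
    sample = (
        "     <body>Met at LCB 08 on monday\n"
        + DELIM + "\nMet at LCB 08 on monday\n"
        + DELIM + "\nMet at LCB 08 on monday</body>"
    )
    print(keep_last_copy(sample))
    print(keep_first_copy(sample))
```

Elements without a delimiter are left untouched by both functions. The same patterns should carry over to a PCRE-style engine if the "dot matches newline" modifier is enabled, though that has not been tested here.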