Cleaning Up Repeated Information in an XML file.

Posted: Sat Feb 07, 2009 1:19 pm
by terra
This is my first post to this thriving forum. Thank you in advance for your patience and help.

I have an XML file with contact information exported from my phone. Within the <body> element of each contact, some information is duplicated, showing up 2, 3 or even 4 times in some cases. For example:

Code: Select all

     <body>Met at LCB 08 on monday
------------------------------------------------------------------
Met at LCB 08 on monday
------------------------------------------------------------------
Met at LCB 08 on monday</body>
The repeated information is always separated by the following delimiter string (66 dashes): "------------------------------------------------------------------"

Which Perl-style regex would I use to delete everything (including newlines, and except multiples of 6 dashes) between the first delimiter string and the first occurrence of </body>? (Of course, I could also delete everything between <body> and the last occurrence of the delimiter string.)

I tried this, but it didn't work:

Code: Select all

<body>([\w\s\d ,@\.\(\)\:\$#]*)------------------------------------------------------------------.</body>

Re: Cleaning Up Repeated Information in an XML file.

Posted: Sun Feb 08, 2009 2:39 pm
by prometheuzz
terra wrote:... (of course i could also delete everything between <body> and the last occurrence of the delimiter string).
...
Yes, I'd do that as well.
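
For what it's worth, here is a rough sketch of that second approach in Python, whose re module uses Perl-like syntax. It assumes the delimiter is exactly 66 dashes, as described in the question, and that <body> elements are never nested. The names PATTERN and dedupe_bodies are made up for this example. One likely reason the original attempt fails is that the single "." after the dashes matches only one non-newline character, so nothing longer can sit between the delimiter and </body>.

Code: Select all

```python
import re

# (?s) lets "." match newlines as well.
# (?:(?!</body>).)* is a greedy run that refuses to step over "</body>",
# so the match stays inside one contact and ends at the LAST delimiter.
# -{66} is the delimiter (assumed: exactly 66 dashes); \r?\n? eats the
# line break that follows it, if there is one.
PATTERN = re.compile(r"(?s)(<body>)(?:(?!</body>).)*-{66}\r?\n?")


def dedupe_bodies(xml_text):
    """Keep only the text after the last delimiter in each <body>."""
    return PATTERN.sub(r"\1", xml_text)


if __name__ == "__main__":
    delim = "-" * 66
    sample = (
        "     <body>Met at LCB 08 on monday\n"
        + delim + "\n"
        + "Met at LCB 08 on monday\n"
        + delim + "\n"
        + "Met at LCB 08 on monday</body>"
    )
    print(dedupe_bodies(sample))
```

A <body> without any delimiter is left untouched, because the pattern cannot match without finding the 66 dashes before the closing tag. In an editor with Perl-style search and replace, the same pattern with a replacement of \1 (or $1) should behave similarly, though I have not checked every editor's flavor.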