Background:
I work at a multilingual communication company, where we use a fairly good CMS. Since its last update, however, every file exported or downloaded from the system is ‘polluted’ with metadata. And I don’t want to see the metadata. Boo.
Situation:
To clean and to preprocess the files for further processing and translation, we use a couple of search & replace regexes. One of the preprocessing steps we apply on our files goes as follows:
Code:
(?<!=)"\b(.+?)\b"(?! \[)
Code:
“1”
Problem:
The metadata. All of a sudden, all our files are cluttered with, among others, “concept.dtd” and “map.dtd”. Because these strings belong to the file's metadata, I don’t want their quotation marks replaced, so that nothing crucial changes. With the existing regex, they do get replaced.
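To make the problem concrete, here is a minimal sketch in Python's `re` module (our actual tool may use a different regex flavor). It assumes the replacement is a backreference to group 1 wrapped in curly quotes; the function name `curl_quotes` and both sample lines are hypothetical and not taken from our real files.

```python
import re

# The existing search pattern: a straight-quoted phrase whose opening quote
# is not preceded by "=" and whose closing quote is not followed by " [".
PATTERN = re.compile(r'(?<!=)"\b(.+?)\b"(?! \[)')

def curl_quotes(text: str) -> str:
    # Assumption: the replacement re-inserts group 1 between curly quotes.
    return PATTERN.sub(r'“\1”', text)

# Ordinary prose: the quotes should be replaced, and they are.
print(curl_quotes('He said "hello world" to us.'))

# Hypothetical metadata line: the quotes should stay, but they get replaced too.
print(curl_quotes('<!DOCTYPE concept SYSTEM "concept.dtd">'))
```

In this sketch both lines come back with curly quotes, which is exactly the unwanted behavior on the metadata line.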
So I tried rewriting it, and after a lot of trial and error, this is what I came up with:
Code:
(?<!=)”\b(.+?[\.d])\b”(?! \[)
Help?
What am I missing or doing wrong? I've also tried:
Code:
(?<!=)”\b(.+?[\.dtd])\b”(?! \[)
Thank you very very much for any advice.
edit: corrected wrong closing bracket
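For reference, a minimal Python sketch of what the second attempt does, written with straight quotes in the search pattern (an assumption; the patterns as posted show curly ones). In a character class, `[\.dtd]` matches one single character out of `.`, `d` and `t`, so it cannot express "ends in .dtd". The last pattern, `EXCLUDE_DTD`, is only one possible direction using a negative lookahead, untested on our real files; all names and sample lines here are hypothetical.

```python
import re

# [\.dtd] is a set of single characters: ".", "d" or "t".
assert re.fullmatch(r'[\.dtd]', 't') is not None
assert re.fullmatch(r'[\.dtd]', '.dtd') is None

# Second attempt, with straight quotes: the quoted text must END in ".", "d" or "t".
ATTEMPT = re.compile(r'(?<!=)"\b(.+?[\.dtd])\b"(?! \[)')

# Still matches the metadata, because "concept.dtd" ends in "d" ...
still_hits_metadata = ATTEMPT.search('SYSTEM "concept.dtd">') is not None
# ... and now skips prose that ends in any other letter.
skips_prose = ATTEMPT.search('He said "hello there" to us.') is None

# Possible direction: right after the opening quote, refuse to match when
# the quoted text (no further quote inside) ends in ".dtd".
EXCLUDE_DTD = re.compile(r'(?<!=)"(?![^"]*\.dtd")\b(.+?)\b"(?! \[)')

def curl_quotes_skip_dtd(text: str) -> str:
    return EXCLUDE_DTD.sub(r'“\1”', text)

print(still_hits_metadata, skips_prose)
print(curl_quotes_skip_dtd('See "intro" and "map.dtd" here.'))
```

In this sketch the lookahead variant curls the quotes around the ordinary phrase and leaves the quoted `.dtd` name alone, but it only covers names ending in `.dtd`.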