Conditional and Lookbehind
Posted: Sun Jun 08, 2008 5:11 am
Hello Forum,
I am writing a regex that concatenates several words into one character string. I am using conditional and lookbehind constructions but I keep getting unexpected results. Could anyone help me understand what is going on?
The input for the regex is a few words:
yom sase rare imas ita
The regex should concatenate them with the following rules:
1. The first consonant of a word should be deleted if the word immediately before ends with a consonant. E.g., yom sase -> yomase.
2. The first vowel of a word should be deleted if the word immediately before ends with a vowel. E.g., tabe imasu -> tabemasu.
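To make the two rules concrete, here is a minimal sketch of the result I am after. It is written in Python purely as an assumption for illustration, and it is not my conditional regex: it uses two plain lookbehinds joined by alternation. The names CONSONANT, PATTERN, mark and concatenate are made up for this example, and y is treated as a consonant.

```python
import re

# Assumption: a "consonant" is any lowercase ASCII letter other than a, e, i, o, u.
CONSONANT = "[b-df-hj-np-tv-z]"

# Rule 1: a consonant right after "consonant + space".
# Rule 2: a vowel right after "vowel + space".
PATTERN = re.compile(
    rf"(?<={CONSONANT} ){CONSONANT}"
    r"|(?<=[aeiou] )[aeiou]"
)

def mark(text):
    """Replace each letter that should be deleted with the [!MATCH!] marker."""
    return PATTERN.sub("[!MATCH!]", text)

def concatenate(text):
    """Delete the matched letters, then join the words into one string."""
    return PATTERN.sub("", text).replace(" ", "")

print(mark("yom sase rare imas ita"))
print(concatenate("yom sase"))
print(concatenate("tabe imasu"))
```

With the sample input, mark() should leave yom, rare and ita untouched and flag only the s of sase and the i of imas.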
Since vowels and consonants are mutually exclusive, I thought I could define them as [aeiou] and [^aeiou], and use a regex conditional to express the two rules by one regex. Unfortunately, my regex below matches more than necessary.
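As a sanity check on the two classes, the following small sketch (again in Python, which is only an assumption for illustration) lists what each class picks up in the sample input. A negated class such as [^aeiou] matches any character that is not one of those five letters, so it may cover more than consonants.

```python
import re

text = "yom sase rare imas ita"

# Every character matched by each class, in order of appearance.
vowel_hits = re.findall(r"[aeiou]", text)
non_vowel_hits = re.findall(r"[^aeiou]", text)

# The negated class is "anything except a, e, i, o, u",
# so the spaces between the words are matched as well.
space_hits = [ch for ch in non_vowel_hits if ch == " "]

print(vowel_hits)
print(non_vowel_hits)
print(len(space_hits))
```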
For yom sase rare imas ita,
/(?(?<=\B[^aeiou]\b ))\b[^aeiou]|\b[aeiou]/
matches
[!MATCH!]om[!MATCH!][!MATCH!]ase[!MATCH!][!MATCH!]are[!MATCH!][!MATCH!]mas[!MATCH!][!MATCH!]ta
where the matched portions are replaced by [!MATCH!].
What I want to match in the same convention would be:
yom [!MATCH!]ase rare [!MATCH!]mas ita
THANK YOU in advance for comments!