How to except something?
Posted: Sat Jun 27, 2009 11:41 am
by czech_d3v3l0p3r
Hi.
I have the following text:
Code:
Foo.<ref>Bar</ref> Baz. Foobar<ref>Barbaz</ref>. Text<ref>Something</ref>. And<ref>more</ref>.
and I'd like to move each period in front of the ref tag wherever it comes after the tag. I have tried the following:
Code:
preg_replace( "@\.?\s*(<ref.*?/ref>)\s*\.@", ".$1", $text )
It worked in a few cases, but in others it removed the dots completely, so it simply didn't work correctly. The result was this:
Code:
Foo.<ref>Bar</ref> Baz. Foobar<ref>Barbaz</ref> Text.<ref>Something</ref> And.<ref>more</ref>
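A minimal PHP sketch that reproduces this, assuming PHP 7 or later and the sample text above (single-quoted strings are used so that PHP passes the backslashes to PCRE unchanged):

```php
<?php
// Sample text from the post above.
$text = 'Foo.<ref>Bar</ref> Baz. Foobar<ref>Barbaz</ref>. Text<ref>Something</ref>. And<ref>more</ref>.';

// The pattern exactly as posted, replacement '.$1' puts a period before group 1.
$result = preg_replace('@\.?\s*(<ref.*?/ref>)\s*\.@', '.$1', $text);

echo $result, "\n";
// Foo.<ref>Bar</ref> Baz. Foobar<ref>Barbaz</ref> Text.<ref>Something</ref> And.<ref>more</ref>
```

The printed line is the result quoted above, with the period after "Foobar" gone.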
So my idea was that all I have to do is exclude "</ref>" from inside the tags, but I really don't know how. Maybe there are other ways I haven't thought of; if you happen to know of any, don't hesitate to write here.
Thanks in advance for any suggestions.
Re: How to except something?
Posted: Sun Jun 28, 2009 4:07 pm
by prometheuzz
czech_d3v3l0p3r wrote:...
The result was this:
Code:
Foo.<ref>Bar</ref> Baz. Foobar<ref>Barbaz</ref> Text.<ref>Something</ref> And.<ref>more</ref>
...
And how is that wrong? In other words, can you post the output that you were hoping to produce?
Re: How to except something?
Posted: Mon Jun 29, 2009 2:21 am
by czech_d3v3l0p3r
prometheuzz wrote:
And how is that wrong? In other words, can you post the output that you were hoping to produce?
I expected it to produce
Code:
Foo.<ref>Bar</ref> Baz. Foobar.<ref>Barbaz</ref> Text.<ref>Something</ref> And.<ref>more</ref>
Notice the period after "Foobar"; with the regex, it's somehow lost.
Re: How to except something?
Posted: Mon Jun 29, 2009 5:38 am
by prometheuzz
czech_d3v3l0p3r wrote:...
Notice the period after "Foobar"; with the regex, it's somehow lost.
Try making that last period (dot) also optional:
Code:
preg_replace("@\.?\s*(<ref.*?/ref>)\s*\.?@", ".$1", $text)
And to preserve some of the white spaces, you could group them with that last period and then make that group optional:
Code:
preg_replace("@\.?\s*(<ref.*?/ref>)(\s*\.)?@", ".$1", $text)
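Both suggestions can be compared on the sample text from the first post with a short PHP sketch (the variable names are only illustrative). Since the trailing period is now optional, every ref tag that matches gets a period in front of it, and the first form also eats the space after "</ref>" when no period follows:

```php
<?php
$text = 'Foo.<ref>Bar</ref> Baz. Foobar<ref>Barbaz</ref>. Text<ref>Something</ref>. And<ref>more</ref>.';

// Suggestion 1: trailing period optional; the \s* before it is consumed either way.
$v1 = preg_replace('@\.?\s*(<ref.*?/ref>)\s*\.?@', '.$1', $text);

// Suggestion 2: whitespace and period form one optional group, so lone whitespace is kept.
$v2 = preg_replace('@\.?\s*(<ref.*?/ref>)(\s*\.)?@', '.$1', $text);

echo $v1, "\n"; // Foo.<ref>Bar</ref>Baz. Foobar.<ref>Barbaz</ref> Text.<ref>Something</ref> And.<ref>more</ref>
echo $v2, "\n"; // Foo.<ref>Bar</ref> Baz. Foobar.<ref>Barbaz</ref> Text.<ref>Something</ref> And.<ref>more</ref>

// Side effect: a period is inserted even where the tag was followed by a comma.
echo preg_replace('@\.?\s*(<ref.*?/ref>)(\s*\.)?@', '.$1', 'Text<ref>Deer</ref>, Math'), "\n";
// Text.<ref>Deer</ref>, Math
```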
Re: How to except something?
Posted: Tue Jun 30, 2009 3:04 am
by czech_d3v3l0p3r
prometheuzz wrote:...
Code:
preg_replace("@\.?\s*(<ref.*?/ref>)(\s*\.)?@", ".$1", $text)
Well, that would work in most cases, but what about something more complex, like
Code:
December.<ref>Kachna</ref> America. August<ref>Bear</ref>. Guinea: "May."<ref>Foo</ref> June<ref>Boo</ref>. Feb<ref>Bar</ref>.
March.<ref>Cat</ref> Sep<ref>Dog</ref>. Jamaica, Something.<ref>Chemong</ref>. Text<ref>Deer</ref>, Math<ref>Reference</ref>, third<ref>Mouse</ref> and next<ref>Cow</ref>, July<ref>Err</ref>, April<ref>Donkey</ref> and Oct<ref>Bird</ref>.
which produces
Code:
December.<ref>Kachna</ref> America. August.<ref>Bear</ref> Guinea: "May.".<ref>Foo</ref> June.<ref>Boo</ref> Feb.<ref>Bar</ref>
March.<ref>Cat</ref> Sep.<ref>Dog</ref> Jamaica, Something.<ref>Chemong</ref> Text.<ref>Deer</ref>, Math.<ref>Reference</ref>, third.<ref>Mouse</ref> and next.<ref>Cow</ref>, July.<ref>Err</ref>, April.<ref>Donkey</ref> and Oct.<ref>Bird</ref>
but I'd expect something more like
Code:
December.<ref>Kachna</ref> America. August.<ref>Bear</ref> Guinea: "May."<ref>Foo</ref> June.<ref>Boo</ref> Feb.<ref>Bar</ref>
March.<ref>Cat</ref> Sep.<ref>Dog</ref> Jamaica, Something.<ref>Chemong</ref> Text,<ref>Deer</ref> Math,<ref>Reference</ref> third<ref>Mouse</ref> and next,<ref>Cow</ref> July,<ref>Err</ref> April<ref>Donkey</ref> and Oct.<ref>Bird</ref>
I know it'll probably require some more code to support the commas. Any thoughts?
Re: How to except something?
Posted: Tue Jun 30, 2009 3:25 am
by prometheuzz
czech_d3v3l0p3r wrote:...
Well, that would work in most cases, but what about something more complex, like
...
Well, of course it doesn't work. I mean, you didn't mention anything about other kind of characters in your original post.
Before answering, could you please indicate how the following example strings should be transformed?
Code:
'aaa <ref>AAA</ref>. aaa' // period
'bbb .<ref>BBB</ref> bbb' // period
'ccc .<ref>CCC</ref>. ccc' // period + period
'ddd , <ref>DDD</ref> ddd' // comma
'eee <ref>EEE</ref> , eee' // comma
'fff , <ref>FFF</ref> , fff' // comma + comma
'ggg .<ref>GGG</ref>, ggg' // period + comma!
'hhh <ref>HHH</ref>? hhh' // question mark
'iii <ref>III</ref> iii' // no punctuation marks
I presume you're only interested in punctuation marks, correct? If not, what are the characters you're interested in?
Re: How to except something?
Posted: Tue Jun 30, 2009 12:03 pm
by czech_d3v3l0p3r
prometheuzz wrote:...
I presume you're only interested in punctuation marks, correct? If not, what are the characters you're interested in?
Well, originally I was only interested in the period, but recently I've discovered that it sometimes also needs fixing for other punctuation characters. However, support for the period alone would be absolutely OK.
Now to the string:
Code:
'aaa <ref>AAA</ref>. aaa' => 'aaa.<ref>AAA</ref> aaa'
'bbb .<ref>BBB</ref> bbb' => 'bbb .<ref>BBB</ref> bbb' // it doesn't really matter whether the space before the period will be there
'ccc .<ref>CCC</ref>. ccc' => 'ccc .<ref>CCC</ref> ccc' // same as above
'ddd , <ref>DDD</ref> ddd' => 'ddd ,<ref>DDD</ref> ddd' // same
'eee <ref>EEE</ref> , eee' => 'eee,<ref>EEE</ref> eee'
'fff , <ref>FFF</ref> , fff' => 'fff,<ref>FFF</ref> fff'
'ggg .<ref>GGG</ref>, ggg' => 'ggg,<ref>GGG</ref> ggg' // this is really tricky, I'm not sure about this one; it would probably depend on whether the first character of the last "ggg" group is a capital letter or not
'hhh <ref>HHH</ref>? hhh' => 'hhh?<ref>HHH</ref> hhh'
'iii <ref>III</ref> iii' // this is ok, maybe just remove the space before the ref tag, but that doesn't really matter
Re: How to except something?
Posted: Tue Jun 30, 2009 1:37 pm
by prometheuzz
czech_d3v3l0p3r wrote:Well, originally I was only interested in the period, but recently I've discovered that it sometimes also needs fixing for other punctuation characters.
You said "that would work in most cases" about my proposed solution, which suggests that some of the cases in your original post went wrong, which was not the case.
czech_d3v3l0p3r wrote:Code:
'ggg .<ref>GGG</ref>, ggg' => 'ggg,<ref>GGG</ref> ggg' // this is really tricky, I'm not sure about this one; it would probably depend on whether the first character of the last "ggg" group is
If you're not sure about what your requirements are, I cannot answer you. Feel free to post back when you are sure about it.
Re: How to except something?
Posted: Wed Jul 01, 2009 3:14 am
by czech_d3v3l0p3r
prometheuzz wrote:
If you're not sure about what your requirements are, I cannot answer you. Feel free to post back when you are sure about it.
Ok, so all I want it to do is to detect those cases:
and move the punctuation character before the *whole* ref tag. Is it understandable now?
Re: How to except something?
Posted: Wed Jul 01, 2009 3:35 am
by prometheuzz
czech_d3v3l0p3r wrote:Ok, so all I want it to do is to detect those cases:
and move the punctuation character before the *whole* ref tag. Is it understandable now?
Err, \s{,3} is not valid. Not sure what you mean by that post... Could you just explain it in English instead?
But also please answer the question from my previous reply:
What should this be replaced with:
?
Re: How to except something?
Posted: Wed Jul 01, 2009 5:59 am
by czech_d3v3l0p3r
prometheuzz wrote:
Err, \s{,3} is not valid. Not sure what you mean by that post... Could you just explain it in English instead?
But also please answer the question from my previous reply:
What should this be replaced with:
?
I mean it should match
Code:
</ref>.
</ref> .
</ref>  .
</ref>   .
</ref>!
</ref> !
</ref>  !
</ref>   !
</ref>?
</ref> ?
</ref>  ?
</ref>   ?
and move the punctuation in front of the ref tag. Also
shouldn't be replaced with anything, just leave it as it is, because the period is correct. The regex doesn't have to care about any characters other than the ones noted above. If needed, ignore everything I've said previously. I just need to move the dot/exclamation/question mark before the tag.
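One possible reading of the list above, as a PHP sketch (the function name is hypothetical). `\s{0,3}` is the usual PCRE way to write "zero to three whitespace characters"; the atomic group `(?>.*?</ref>)` is borrowed from ridgerunner's reply further down and keeps each match inside a single ref element. Following the aaa/hhh examples, whitespace directly in front of the tag is dropped. It assumes every `<ref>` has a matching `</ref>` on the same line, and it does not try to merge a mark that already sits in front of the tag:

```php
<?php
/**
 * Hypothetical helper: when a closing </ref> is followed by at most three
 * whitespace characters and then '.', '!' or '?', move that mark in front
 * of the opening <ref ...> tag (whitespace directly before the tag is dropped).
 */
function move_mark_before_ref(string $text): string
{
    // Group 1: one whole ref element; the atomic group stops at the first </ref>.
    // Group 2: the punctuation mark to move.
    return preg_replace('@\s*(<ref(?>.*?</ref>))\s{0,3}([.!?])@', '$2$1', $text);
}

echo move_mark_before_ref('aaa <ref>AAA</ref>. aaa'), "\n"; // aaa.<ref>AAA</ref> aaa
echo move_mark_before_ref('hhh <ref>HHH</ref>? hhh'), "\n"; // hhh?<ref>HHH</ref> hhh
echo move_mark_before_ref('x<ref>A</ref>   !'), "\n";       // x!<ref>A</ref>
echo move_mark_before_ref('x<ref>A</ref>    !'), "\n";      // unchanged: four spaces
```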
Re: How to except something?
Posted: Wed Jul 01, 2009 6:39 am
by prometheuzz
czech_d3v3l0p3r wrote:...
... Also
shouldn't be replaced with anything, just leave it as it is, because the period is correct.
Rather vague statements (because the period is correct). What does that last part even mean? If it wasn't a period, but some other punctuation mark, would it also have been correct? Is a period treated differently than other punctuation marks? What is correct?
czech_d3v3l0p3r wrote:The regex doesn't have to care about any characters other than the ones noted above. If needed, ignore everything I've said previously. I just need to move the dot/exclamation/question mark before the tag.
Err, in case of two different punctuation marks:
you don't want anything to change, but with two identical punctuation marks:
Code:
'ggg .<ref>GGG</ref>. ggg' // earlier you said you wanted to change it to: ccc .<ref>CCC</ref> ccc
the latter dot is removed.
As you can see, it is very important to describe your requirements precisely. If you don't, all I can do is speculate, which I will not do.
There's no need to answer my questions in this reply. I only asked them to underline how open to interpretation your statements are. Perhaps someone else is able to give you a hand with it.
Best of luck.
Re: How to except something?
Posted: Wed Jul 01, 2009 10:43 am
by czech_d3v3l0p3r
prometheuzz wrote:...
Best of luck.
K, thanks for your time.
Re: How to except something?
Posted: Thu Jul 09, 2009 3:40 pm
by ridgerunner
On the case of handling periods only...
Your original regex is almost, but not quite, right. The reason the dot mysteriously disappears from the second component ("Foobar<ref>Barbaz</ref>.") is that your regex erroneously matches the first component by swallowing up both the first and the second component. The first closing </ref> cannot end the match because it is not followed by a period; however, the lazy .*? does not give up looking for a closing tag for the first <ref>, and keeps consuming text into the second component, where it finally finds a closing tag followed by a period. Thus the period from the end of the second component is moved ahead of the first component (replacing the one that was already there).
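The over-long match is easy to see by capturing the first match of the original pattern in PHP (a small diagnostic sketch, using the sample text from the first post):

```php
<?php
$text = 'Foo.<ref>Bar</ref> Baz. Foobar<ref>Barbaz</ref>. Text<ref>Something</ref>. And<ref>more</ref>.';

// First match of the original, non-atomic pattern.
preg_match('@\.?\s*(<ref.*?/ref>)\s*\.@', $text, $m);

echo $m[0], "\n"; // .<ref>Bar</ref> Baz. Foobar<ref>Barbaz</ref>.
echo $m[1], "\n"; // <ref>Bar</ref> Baz. Foobar<ref>Barbaz</ref>
```

Group 1 spans both ref elements, so the replacement ".$1" leaves a single period for the two of them.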
To fix the regex, you need to make the '.*?/ref>' subexpression atomic (i.e. '(?>.*?/ref>)'), so that it can only consume up to the first closing </ref> and then gives up if there is no period after it. Here is a regex that works a little bit better for your "period only" case...
Code:
preg_replace( "@\.?\s*(<ref(?>.*?/ref>))\s*\.@", ".$1", $text )
This regex correctly fails to match the first component, then correctly fixes the second and following components. Similar regexes can be run to handle the other punctuation cases.
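A PHP sketch of running the same atomic pattern once per punctuation mark, building each pattern with preg_quote(). The function name and the choice of '.', '!' and '?' (taken from the earlier posts) are assumptions, not part of the reply above:

```php
<?php
/**
 * Hypothetical helper: applies the atomic-group pattern above once for each
 * mark, so "word<ref>..</ref>." becomes "word.<ref>..</ref>".
 */
function fix_ref_punctuation(string $text, array $marks = ['.', '!', '?']): string
{
    foreach ($marks as $mark) {
        $q = preg_quote($mark, '@'); // e.g. '.' becomes '\.'
        // An optional copy of the same mark before the tag is consumed too, so it is not doubled.
        $pattern = '@' . $q . '?\s*(<ref(?>.*?/ref>))\s*' . $q . '@';
        $text = preg_replace($pattern, $mark . '$1', $text);
    }
    return $text;
}

$sample = 'Foo.<ref>Bar</ref> Baz. Foobar<ref>Barbaz</ref>. Text<ref>Something</ref>. And<ref>more</ref>.';
echo fix_ref_punctuation($sample), "\n";
// Foo.<ref>Bar</ref> Baz. Foobar.<ref>Barbaz</ref> Text.<ref>Something</ref> And.<ref>more</ref>
echo fix_ref_punctuation('hhh <ref>HHH</ref>? hhh'), "\n"; // hhh?<ref>HHH</ref> hhh
```

With these marks, the 'aaa', 'ccc' and 'hhh' strings from the earlier list come out as requested there; commas and mixed marks such as the 'ggg' case are left alone.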
Hope this helps!
