NOT Group Extraction

Posted: Thu May 17, 2007 11:40 pm
by nwp
Hi!
I want to extract these two words from the following string:

Code: Select all

String anotherstring
provided String and anotherstring are NOT equal.
I've tried this regex:

Code: Select all

(.*?) ([^\1]+?)
But it doesn't work.
E.g. from "hello hello" it should extract only the first hello, but from "hello hi" it should extract both hello and hi.

Posted: Fri May 18, 2007 8:16 am
by feyd
Can you explain more?

Posted: Sat May 19, 2007 11:15 am
by aaronhall
Are you trying to extract unique words from a string? Can you give a full example of the input you expect to receive? If the words are always separated in the same way, and there isn't any punctuation to worry about, you can just use explode() to split the words into an array, and run it through array_unique().
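
A minimal sketch of that suggestion, assuming the words are separated by single spaces and there is no punctuation. The helper name uniqueWords is made up for illustration:

```php
<?php
// Hypothetical helper: split on single spaces and drop repeated words.
// array_unique() keeps the first occurrence; array_values() reindexes from 0.
function uniqueWords(string $input): array
{
    return array_values(array_unique(explode(' ', $input)));
}

var_dump(uniqueWords('hello hello')); // only one 'hello' survives
var_dump(uniqueWords('hello hi'));    // both words survive
```

Note that this drops any repeated word anywhere in the input, which is broader than comparing only the first two words.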

Posted: Sat May 19, 2007 11:33 am
by nwp
What I want is:
Text #1: string string
Text #2: hello hello
Here the regex should extract both hellos and both strings.
Text #3: hello everybody
Text #4: string differentstring
Here the regex should extract only hello, not everybody, and only string, not differentstring.
To do that I've tried the following:

Code: Select all

(.*?) ([\1]+?)
Here (.*?) will extract the first string,
and [\1] is meant to check whether the second string is the same as the first extracted string.
But it's not working.
I also need a regex that does just the opposite of what I've described above,
and for that I was trying to use:

Code: Select all

(.*?) ([^\1]+?)
But it's also not working.
------------------
EDIT
And I want to do it using regex
I know how to do it without regex.

Posted: Sat May 19, 2007 12:08 pm
by Ollie Saunders

Code: Select all

<?php
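// Non-regex version: split on a single space and keep the second word
// only when it equals the first.
function conditionalExplode($input)
{
    $explo = explode(' ', $input);
    if ($explo[0] != $explo[1]) {
        unset($explo[1]);
    }
    return $explo;
}

// Regex version: (\1)? optionally captures a repeat of the first word.
function conditionalMatch($input)
{
    $matches = array();
    preg_match('~(\w+)\s+(\1)?~', $input, $matches);
    unset($matches[0]); // drop the full match, keep only the captured groups
    return array_values($matches);
}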

assert_options(ASSERT_ACTIVE, true);
assert_options(ASSERT_WARNING, true);

assert(($out = conditionalMatch('string string')) == array('string', 'string')); var_dump($out);
assert(($out = conditionalMatch('hello hello')) == array('hello', 'hello')); var_dump($out);

assert(($out = conditionalMatch('hello everybody')) == array('hello')); var_dump($out);
assert(($out = conditionalMatch('string differentstring')) == array('string')); var_dump($out);

Posted: Sat May 19, 2007 12:15 pm
by nwp
I said before that I want to do it using regex only.
nwp wrote:EDIT
And I want to do it using regex
I know how to do it without regex.

Posted: Sat May 19, 2007 1:03 pm
by Ollie Saunders
I wrote: ~(\w+)\s+(\1)?~
Hmmm, looks suspiciously like a regex.

Posted: Sun May 20, 2007 12:20 am
by nwp
Thanks, that regex worked.
It extracts both 'test's from "test test",
but extracts only the first 'test' from "test text".
Now I want a regex that does just the opposite of this one,
e.g. it should extract both 'test' and 'text' from "test text"
and only the first 'test' from "test test".
I've tried this regex but it's not working at all:

Code: Select all

(\w+)\s+([^\1])?

Posted: Sun May 20, 2007 5:35 am
by Ollie Saunders
I've given this some thought and it can't be done without PHP code or the e modifier.
So why does this have to be regex only? and why do you need to match in such a precise and bizarre way?

Posted: Sun May 20, 2007 5:51 am
by nwp
I don't need preg_replace(); I just need to extract the strings with preg_match(), so the e modifier doesn't apply here.
I believe there must be a way to do it with regex (AS THE LOGIC IS VERY SIMPLE), and I want to know it.
I wrote: It should extract both 'test' and 'text' from "test text"
and only the first 'test' from "test test".
I've tried this regex but it's not working at all:

Code: Select all

(\w+)\s+([^\1])?

Posted: Sun May 20, 2007 6:06 am
by Ollie Saunders
The closest thing that would do it is a negative lookahead but that is non capturing and has strange effects on the first sub-pattern.

You can say ~(\w+)\s+(?!\1)~ but then for "foo foo" you'll get array(1 => 'oo'). I think what is happening here is that the 1st sub-pattern is being matched twice in order to fulfil the requirements of the lookahead, namely it attempts to make them different, and so it drops the first char and all of a sudden they are different and a match is found.
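
The behaviour described above can be reproduced with a short sketch. The variable names are made up, and the anchored variant is an added assumption for comparison, not something proposed in the thread:

```php
<?php
// Unanchored: when the attempt starting at offset 0 fails the lookahead,
// the engine retries from offset 1, where group 1 becomes 'oo' and the
// lookahead (?!oo) succeeds against the following 'foo'.
preg_match('~(\w+)\s+(?!\1)~', 'foo foo', $unanchored);
var_dump($unanchored[1]);

// Anchored with ^: the match may only start at offset 0, so group 1 has to
// be the whole first word and the identical-words input does not match.
$anchoredResult = preg_match('~^(\w+)\s+(?!\1)~', 'foo foo', $anchored);
var_dump($anchoredResult);
```

So the "dropped first character" comes from the match starting one position later, not from the sub-pattern being matched twice at the same position.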

[^\1] is never going to work because [] is for matching sets and not whole strings. [abc] matches 'a' or 'b' or 'c', not 'abc'. So unless someone can figure out some really hacky way of doing it I can't see how it is possible.
I believe there must be a way to do it with regex (AS THE LOGIC IS VERY SIMPLE), and I want to know it.
Yes, it is very simple logic, which is why, when you use the logic functionality of PHP, it is very easy to do. Regular expressions are not really a logical language. You can use a couple of prebaked conditions but that's all; mostly the kind of conditions that are difficult to perform using conventional logic. So combined with conventional logic, regular expressions are very powerful indeed. On their own, they are not.
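
A small sketch of the point about sets, with made-up variable names. It relies on the PCRE rule, as far as I understand it, that \1 inside a character class is read as an octal escape (the character with code 1) rather than as a backreference:

```php
<?php
// [abc] matches exactly one character from the set, never the string 'abc'.
preg_match('~[abc]~', 'abc', $set);
var_dump($set[0]); // a single character

// Inside [...], \1 is not a backreference, so [^\1] means
// "any one character except \x01".
preg_match('~(\w+)\s+([^\1])?~', 'test test', $attempt);
var_dump($attempt); // group 2 holds one character, not a whole word

// The only character the negated class rejects is \x01 itself.
$rejectsControlChar = preg_match('~[^\1]~', "\x01");
```
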

Could you answer my questions now. Seeing as I have spent quite a lot of time on this I feel I am entitled to an answer. I think you will benefit from my response as well.
I wrote:So why does this have to be regex only? and why do you need to match in such a precise and bizarre way?

Posted: Sun May 20, 2007 6:20 am
by nwp
You wrote:Could you answer my questions now. Seeing as I have spent quite a lot of time on this I feel I am entitled to an answer. I think you will benefit from my response as well.
I wrote:So why does this have to be regex only? and why do you need to match in such a precise and bizarre way?
I want to learn how to do it with regex. Also, if I can do it with regex alone, it will save lots of lines of code, as a single preg_match() line will do the job.
You wrote: [^\1] is never going to work because [] is for matching sets and not whole strings. [abc] matches 'a' or 'b' or 'c', not 'abc'. So unless someone can figure out some really hacky way of doing it I can't see how it is possible.
I have tried using
I wrote: (\w+)\s+((?:^\1))?
But it doesn't work.
The ^ there asserts the position at the start of the string.
It doesn't mean NOT here,
but it does mean NOT inside [].
Is there anything that means NOT inside ()?
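
The two meanings of ^ mentioned above can be checked with a short sketch; the variable names are made up for illustration:

```php
<?php
// Outside a character class, ^ asserts the start of the subject.
$atStart  = preg_match('~^foo~', 'foo bar'); // 'foo' is at the start
$notStart = preg_match('~^foo~', 'a foo');   // 'foo' is not at the start

// Inside a character class, a leading ^ negates the set.
preg_match('~[^f]oo~', 'foo zoo', $negated); // skips 'foo', finds 'zoo'

// In the middle of a pattern, ^ can never succeed after \s+ (without the
// m modifier), so the optional second group is always skipped.
preg_match('~(\w+)\s+((?:^\1))?~', 'test test', $mid);
var_dump($mid); // full match and group 1 only
```
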

Posted: Sun May 20, 2007 8:55 am
by Ollie Saunders
I want to learn how to do it with regex. Also, if I can do it with regex alone, it will save lots of lines of code, as a single preg_match() line will do the job.
No offence, but that's a really crap reason to try to do something that is really difficult. What exactly is wrong with lines of code?
Is there anything that means NOT inside ()?
No, there isn't.

Posted: Sun May 20, 2007 9:02 am
by feyd
Negative assertions are the closest you can get.
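
As a rough sketch of what a negative assertion could look like for this case: the function name extractIfDifferent, the ^ anchor, the \b word boundary and the optional non-capturing wrapper are all assumptions added here, not something posted in the thread, and the pattern only considers the first two words of the input.

```php
<?php
// Hypothetical helper: capture the first word, and capture the second word
// only when it is not an exact repeat of the first.
function extractIfDifferent(string $input): array
{
    // ^         : the match must start at the first word
    // (?!\1\b)  : fail if the next word is exactly the first word
    // (?:...)?  : make the second capture optional, so "test test" still matches
    preg_match('~^(\w+)\s+(?:(?!\1\b)(\w+))?~', $input, $m);
    return array_slice($m, 1); // drop the full match, keep the groups
}

var_dump(extractIfDifferent('test text'));
var_dump(extractIfDifferent('test test'));
var_dump(extractIfDifferent('test testing'));
```

The \b is there so that a second word which merely starts with the first one (as in "test testing") is still treated as different.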

Posted: Sun May 20, 2007 12:10 pm
by nwp
Yeah, I've tried ?<= and ?<! but they didn't match any record.