NOT Group Extraction

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."

nwp
Forum Contributor
Posts: 105
Joined: Sun Feb 04, 2007 12:25 pm

Post by nwp »

Hi!
I want to extract these two words from the following string:

Code: Select all

String anotherstring
provided that String and anotherstring are NOT equal.
I've tried this regex:

Code: Select all

(.*?) ([^\1]+?)
But it doesn't work.
E.g. from "hello hello" extract only the first hello, but from "hello hi" extract both hello and hi.
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Can you explain more?
aaronhall
DevNet Resident
Posts: 1040
Joined: Tue Aug 13, 2002 5:10 pm
Location: Back in Phoenix, missing the microbrews

Post by aaronhall »

Are you trying to extract unique words from a string? Can you give a full example of the input you expect to receive? If the words are always separated in the same way, and there isn't any punctuation to worry about, you can just use explode() to split the words into an array, and run it through array_unique().
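A rough sketch of that explode() / array_unique() idea, assuming the words are separated by single spaces and there is no punctuation. The function name uniqueWords() is made up for illustration and is not from the thread.

```php
<?php
// Hypothetical helper (name not from the thread): split on single spaces
// and drop repeated words.
function uniqueWords(string $input): array
{
    $words = explode(' ', $input);
    // array_unique() keeps the original keys, so reindex the result.
    return array_values(array_unique($words));
}
```

With this, uniqueWords('hello hello') gives a one-element array and uniqueWords('hello hi') gives both words.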
nwp
Forum Contributor
Posts: 105
Joined: Sun Feb 04, 2007 12:25 pm

Post by nwp »

What I want is:
Text #1: string string
Text #2: hello hello
Here the regex will extract both hellos and both strings.
Text #3: hello everybody
Text #4: string differentstring
Here the regex will extract only hello, not everybody, and it will extract string, not differentstring.
To do that I've tried the following:

Code: Select all

(.*?) ([\1]+?)
Here (.*?) will extract the first string
and [\1] is meant to check whether the next one is the same as the first extracted string.
But it's not working.
I also need a regex that does just the opposite of what I've described above,
and for that I was trying to use

Code: Select all

(.*?) ([^\1]+?)
But it's also not working.
------------------
EDIT
I want to do it using regex.
I know how to do it without regex.
Ollie Saunders
DevNet Master
Posts: 3179
Joined: Tue May 24, 2005 6:01 pm
Location: UK

Post by Ollie Saunders »

Code: Select all

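<?php
// Non-regex version: keep the second word only when it equals the first.
function conditionalExplode($input)
{
    $explo = explode(' ', $input);
    // isset() guards against input that contains only one word.
    if (isset($explo[1]) && $explo[0] != $explo[1]) {
        unset($explo[1]);
    }
    return $explo;
}
// Regex version: group 2 is an optional backreference to group 1,
// so it is captured only when the second word starts with a repeat of the first.
function conditionalMatch($input)
{
    $matches = array();
    preg_match('~(\w+)\s+(\1)?~', $input, $matches);
    unset($matches[0]); // drop the full match, keep only the groups
    return array_values($matches);
}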

assert_options(ASSERT_ACTIVE, true);
assert_options(ASSERT_WARNING, true);

assert(($out = conditionalMatch('string string')) == array('string', 'string')); var_dump($out);
assert(($out = conditionalMatch('hello hello')) == array('hello', 'hello')); var_dump($out);

assert(($out = conditionalMatch('hello everybody')) == array('hello')); var_dump($out);
assert(($out = conditionalMatch('string differentstring')) == array('string')); var_dump($out);
nwp

Post by nwp »

I said before that I want to do it using only regex.
nwp wrote:EDIT
And I want to do it using regex
I know how to do it without regex.
Ollie Saunders

Post by Ollie Saunders »

I wrote: ~(\w+)\s+(\1)?~
Hmmm, looks suspiciously like a regex.
nwp

Post by nwp »

Thanks, that regex worked.
It extracts both 'test's from "test test"
but only the first 'test' from "test text".
Now I want a regex that does just the opposite of this one,
e.g. it will extract both 'test' and 'text' from "test text"
and only the first 'test' from "test test".
I've tried this regex but it's not working at all:

Code: Select all

(\w+)\s+([^\1])?
Ollie Saunders

Post by Ollie Saunders »

I've given this some thought and I don't see how it can be done without PHP code or the e modifier.
So why does this have to be regex only? And why do you need to match in such a precise and bizarre way?
nwp

Post by nwp »

I don't need preg_replace(); I just need to extract the strings with preg_match(), so the e modifier doesn't apply here.
I believe there must be a way to do it with regex (AS THE LOGIC IS VERY SIMPLE), and I want to know it.
I wrote: It will extract both 'test' and 'text' from "test text"
and only the first 'test' from "test test".
I've tried this regex but it's not working at all:

Code: Select all

(\w+)\s+([^\1])?
Ollie Saunders

Post by Ollie Saunders »

The closest thing that would do it is a negative lookahead but that is non capturing and has strange effects on the first sub-pattern.

You can say ~(\w+)\s+(?!\1)~ but then for "foo foo" you'll get array(1 => 'oo'). I think what is happening here is that the 1st sub-pattern is being matched twice in order to fulfil the requirements of the lookahead, namely it attempts to make them different, and so it drops the first char and all of a sudden they are different and a match is found.
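To make that "oo" result concrete, here is a small sketch. groupsFor() is a hypothetical helper invented for this example. It also shows what happens if the pattern is anchored with ^, which is an assumption added here and not something tried in the thread.

```php
<?php
// Sketch only: groupsFor() is a hypothetical helper, not from the thread.
// It returns the captured groups (without the full match), or an empty
// array when the pattern does not match.
function groupsFor(string $pattern, string $input): array
{
    if (preg_match($pattern, $input, $m) !== 1) {
        return [];
    }
    return array_slice($m, 1);
}

// Unanchored: the attempt at offset 0 fails, so the engine retries from
// offset 1, where "oo" differs from what follows the space ("foo") and
// the negative lookahead is satisfied.
$unanchored = groupsFor('~(\w+)\s+(?!\1)~', 'foo foo');  // ['oo']

// Anchored at the start of the string: group 1 can only be "foo",
// the lookahead fails, and there is no match at all.
$anchored = groupsFor('~^(\w+)\s+(?!\1)~', 'foo foo');   // []
```

So the lookahead itself behaves as expected; the odd capture comes from the match being free to start later in the string.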

[^\1] is never going to work because [] is for matching sets of characters and not whole strings. [abc] matches 'a' or 'b' or 'c', not 'abc'. So unless someone can figure out some really hacky way of doing it I can't see how it is possible.
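A quick sketch of the character-class point. As far as I understand PCRE, a backslash followed by a digit inside [] is read as an octal character code, so [^\1] ends up meaning "any single character except chr(1)". The variable names are only for illustration.

```php
<?php
// Inside [...] the sequence \1 is an octal character escape in PCRE,
// so [^\1] means "one character that is not chr(1)", not "not group 1".
$matchesLetter  = preg_match('~^[^\1]$~', 'a');     // 1
$matchesControl = preg_match('~^[^\1]$~', "\x01");  // 0

// A class always consumes exactly one character, never a whole string.
$classIsOneChar = preg_match('~^[abc]$~', 'abc');   // 0
```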
nwp wrote: I believe there must be a way to do it with regex (AS THE LOGIC IS VERY SIMPLE), and I want to know it.
Yes, it is very simple logic, which is why it is very easy to do when you use the logic functionality of PHP. Regular expressions are not really a logical language. You can use a couple of prebaked conditions but that's all; mostly the kind of conditions that are difficult to perform using conventional logic. So combined with conventional logic, regular expressions are very powerful indeed. On their own, they are not.

Could you answer my questions now? Seeing as I have spent quite a lot of time on this I feel I am entitled to an answer. I think you will benefit from my response as well.
I wrote: So why does this have to be regex only? And why do you need to match in such a precise and bizarre way?
nwp

Post by nwp »

You wrote: Could you answer my questions now? Seeing as I have spent quite a lot of time on this I feel I am entitled to an answer. I think you will benefit from my response as well.
I wrote: So why does this have to be regex only? And why do you need to match in such a precise and bizarre way?
I want to learn how to do it with regex. Also, if I can do it with regex alone it will save many lines of code, as only one line of preg_match() will do the work.
You wrote: [^\1] is never going to work because [] is for matching sets and not whole strings. [abc] matches 'a' or 'b' or 'c', not 'abc'. So unless someone can figure out some really hacky way of doing it I can't see how it is possible.
I have tried using
I wrote: (\w+)\s+((?:^\1))?
But it doesn't work.
There the ^ asserts the position at the start of the string.
It doesn't mean NOT here.
But it does mean NOT inside [].
Is there anything that means NOT inside ()?
Ollie Saunders

Post by Ollie Saunders »

nwp wrote: I want to learn how to do it with regex. Also, if I can do it with regex alone it will save many lines of code, as only one line of preg_match() will do the work.
No offence, but that's a really crap reason to try and do something that is really difficult. What exactly is wrong with lines of code?
nwp wrote: Is there anything that means NOT inside ()?
No, there isn't.
feyd

Post by feyd »

Negative assertions are the closest you can get.
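For what it's worth, here is one way a negative assertion might be put to work for the "test text" / "test test" case. This is only a sketch under my own assumptions: the function name extractIfDifferent() is hypothetical, the second word is made optional, and word boundaries (\b) are added so the first group has to be a whole word.

```php
<?php
// Hypothetical helper: returns the first word, plus the second word
// only when it differs from the first.
function extractIfDifferent(string $input): array
{
    // \b(\w+)\b    first word, matched as a whole word
    // (?: ... )?   the second word is optional
    // (?!\1\b)     negative lookahead: the next word must not be exactly group 1
    $pattern = '~\b(\w+)\b(?:\s+(?!\1\b)(\w+))?~';
    if (preg_match($pattern, $input, $m) !== 1) {
        return [];
    }
    // Drop the full match and keep only the captured groups.
    return array_slice($m, 1);
}
```

With this pattern "test text" yields both words, while "test test" yields only the first, because the lookahead rejects the repeated word and the optional group is skipped.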
nwp

Post by nwp »

Yeah, I've tried ?<= and ?<! but they didn't match any record.
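One thing that may be worth checking: ?<= and ?<! are lookbehind assertions, which test the text before the current position, whereas the negative lookahead is written (?!...). A small sketch of the difference, using made-up sample strings:

```php
<?php
// (?!...) looks ahead: match "foo" only when it is not followed by "bar".
$ahead1 = preg_match('~foo(?!bar)~', 'foobaz');   // 1
$ahead2 = preg_match('~foo(?!bar)~', 'foobar');   // 0

// (?<!...) looks behind: match "bar" only when it is not preceded by "foo".
$behind1 = preg_match('~(?<!foo)bar~', 'bazbar'); // 1
$behind2 = preg_match('~(?<!foo)bar~', 'foobar'); // 0
```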