Find this, but not this...

someberry
Forum Contributor
Posts: 172
Joined: Mon Apr 11, 2005 5:16 am

Post by someberry »

Hi, I'm having a bit of trouble matching some things in a given position while excluding others. For instance:

Code:

<a href="http://www.domain.com">

Let's take the part of my regex that runs from the opening quote to the closing quote:

Code:

(\"|\'|)([\w\W])*?\1

This matches anything between the two quotes. However, I don't want whatever character surrounds the URL to appear inside the URL itself, for obvious reasons.

Now, I can hear you all shouting that a URL can only contain certain characters, so why not just test for those. However, the scenario this is going to be used in is slightly different in that the URL could be replaced by anything, possibly even single/double quotes.

So, essentially, the only thing the URL must not contain is the character it is surrounded by. I am sure there is an easy way to do this, but it escapes me for the time being.

Thanks for your help,
someberry

Post by someberry »

Anyone have any ideas?
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Post by Weirdan »

Could you please clarify what you have as input and what you want as output? A few examples would help as well.

Post by someberry »

Code:

1) "test text"
2) "test " text"
3) "test_^&*()'#text"

Numbers 1 and 3 are valid. Number 2 is not valid because it contains a double quote while being surrounded by double quotes. Number 3 is valid because anything may appear between the surrounding quotes except the surrounding character itself, in this case the double quote.

The surrounding characters can be a double quote (") or a single quote ('). So for instance:

Code:

1) "test ' text"
2) 'test ' text'
3) 'test " text'

Number 1 is valid because the surrounders are double quotes and the string only contains a single quote. Number 2 is invalid because the text is surrounded by single quotes but there is another single quote in the string. Number 3 is valid because it is a double quote surrounded by single quotes.

You getting me?
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

The string part of it on a basic level looks like:

Code:

$re = '/<a [^>]*?\bhref=((?:"[^"]*")|(?:\'[^\']*\'))[^>]*?>/';

$matches[1] will contain the URL with the quotes still on it; you just need to trim them off.
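
If you would rather not trim afterwards, one possible alternative is to capture the opening quote on its own and use a negative lookahead so the value cannot contain that same quote. This is only a rough sketch: the function names extract_href and is_cleanly_quoted are made up for illustration, and unlike the pattern in the first post it assumes the value is always quoted.

```php
<?php
// Sketch only: both function names are hypothetical, not from the thread.

// Returns the href value without its surrounding quotes, or null if no
// quoted href attribute is found.
function extract_href(string $html): ?string
{
    // (["'])          group 1: the opening quote, either " or '
    // ((?:(?!\1).)*)  group 2: any characters that are not that same quote
    // \1              the matching closing quote
    $re = '/<a\s[^>]*?\bhref=(["\'])((?:(?!\1).)*)\1[^>]*>/s';
    return preg_match($re, $html, $m) === 1 ? $m[2] : null;
}

// True when the whole string is wrapped in one kind of quote and that
// quote character does not appear anywhere in between.
function is_cleanly_quoted(string $s): bool
{
    return preg_match('/\A(["\'])(?:(?!\1).)*\1\z/s', $s) === 1;
}

var_dump(extract_href('<a href="http://www.domain.com">'));
var_dump(is_cleanly_quoted('"test \' text"'));
var_dump(is_cleanly_quoted("'test ' text'"));
```

Here group 2 holds the URL with the quotes already stripped, so the first call gives back http://www.domain.com. The second function is the same idea anchored to the whole string, which should accept the examples described as valid above and reject the ones with a stray matching quote.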