Find this, but not this...

Posted: Mon May 29, 2006 10:37 am
by someberry
Hi, I'm having a bit of trouble matching some things at a given position while excluding others. For instance:

Code:

<a href="http://www.domain.com">
Let's take the part of my regex that runs from the opening quote to the closing quote:

Code:

(\"|\'|)([\w\W])*?\1
So this will match anything in between the two quotes. However, I want whatever was used to surround the URL to not be included in the URL... for obvious reasons.

Now, I can hear you all shouting that a URL can only contain certain characters, so why not just test for those. However, the scenario this is going to be used in is slightly different in that the URL could be replaced by anything, possibly even single/double quotes.

So, essentially, the only thing the URL must not contain is whatever character surrounds it. I am sure there is an easy way to do this, but it escapes me for the time being.

Thanks for your help,
someberry

Posted: Thu Jun 01, 2006 4:52 pm
by someberry
Anyone have any ideas?

Posted: Thu Jun 01, 2006 5:22 pm
by Weirdan
Could you please clarify what you have as input and what you want as output? Several examples would help as well.

Posted: Fri Jun 02, 2006 6:41 am
by someberry

Code:

1) "test text"
2) "test " text"
3) "test_^&*()'#text"
Numbers 1 and 3 are valid. Number 2 is not valid because it is surrounded by double quotes and also has a double quote inside it. Number 3 is valid because anything may appear between the surrounding quotes except the surrounding character itself, in this case the double quote.

The surrounding characters can be a double quote (") or a single quote ('). So for instance:

Code:

1) "test ' text"
2) 'test ' text'
3) 'test " text'
Number 1 is valid because the surrounders are double quotes and the only quote inside is a single quote. Number 2 is invalid because the text is surrounded by single quotes but there is another single quote in the string. Number 3 is valid because it is a double quote surrounded by single quotes.

You getting me?
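
The rules in the examples above can be sketched as a single pattern: capture the opening quote, allow any character that is not that same quote, then require the same quote to close. This is only a minimal sketch; the function name `unquote` is hypothetical and not from the thread, and it assumes the whole string is the quoted value with nothing before or after it.

```php
<?php
// Hypothetical helper (not from the thread): returns the text between the
// surrounding quotes, or null when the string breaks the rules above.
function unquote(string $s): ?string
{
    // (["'])          capture the opening quote, single or double
    // ((?:(?!\1).)*)  capture characters one at a time, refusing the opening quote
    // \1\z            the same quote must close the string, with nothing after it
    // The s modifier lets . match newlines as well.
    $re = '/^(["\'])((?:(?!\1).)*)\1\z/s';

    return preg_match($re, $s, $m) === 1 ? $m[2] : null;
}

$a = unquote('"test text"');          // valid:   test text
$b = unquote('"test " text"');        // invalid: null
$c = unquote('"test_^&*()\'#text"');  // valid:   test_^&*()'#text
$d = unquote("'test ' text'");        // invalid: null
$e = unquote("'test \" text'");       // valid:   test " text
```

The negative lookahead `(?!\1)` is what ties the forbidden character to whichever quote opened the string, so the same pattern covers both the single-quote and double-quote cases.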

Posted: Fri Jun 02, 2006 7:35 am
by Chris Corbyn
At a basic level, the pattern for the quoted string part looks like this:

Code:

$re = '/<a [^>]*?\bhref=((?:"[^"]*")|(?:\'[^\']*\'))[^>]*?>/';
$matches[1] will contain the URL with the quotes still on it; you just need to trim off the quotes.
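
As a rough illustration of the trimming step, the sketch below runs the pattern above through `preg_match()` and strips the first and last character of the capture. The helper names `hrefTrimmed` and `hrefBare` and the second pattern are assumptions added for illustration, not something from the thread; the second pattern uses a backreference so the capture already excludes the quotes.

```php
<?php
// Pattern copied from the post above; group 1 holds the value with its quotes.
const HREF_QUOTED = '/<a [^>]*?\bhref=((?:"[^"]*")|(?:\'[^\']*\'))[^>]*?>/';

// Assumed variant (not from the thread): group 1 is the quote character and
// group 2 is the value, which may not contain that same quote.
const HREF_BARE = '/<a [^>]*?\bhref=(["\'])((?:(?!\1).)*)\1[^>]*?>/s';

// Hypothetical helper: match with the original pattern, then trim the quotes.
function hrefTrimmed(string $html): ?string
{
    if (preg_match(HREF_QUOTED, $html, $matches) !== 1) {
        return null;
    }
    // Drop the first and last character, i.e. the surrounding quotes.
    return substr($matches[1], 1, -1);
}

// Hypothetical helper: let the variant pattern capture the bare value.
function hrefBare(string $html): ?string
{
    return preg_match(HREF_BARE, $html, $m) === 1 ? $m[2] : null;
}

echo hrefTrimmed('<a href="http://www.domain.com">'), "\n"; // http://www.domain.com
echo hrefBare("<a href='say \"hi\"'>"), "\n";               // say "hi"
```

Using `substr($value, 1, -1)` rather than `trim($value, '"\'')` removes exactly one character from each end, so a value that itself starts or ends with the other kind of quote is left intact.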