
Capture non-whitespace fragment

Posted: Fri Dec 12, 2008 1:58 am
by Serpent_Guard
K, pretty simple stuff, but I have no idea how to do it.

I need to grab a string between two character patterns, and use the string in a preg_replace function, but only if the string doesn't have any whitespace characters in it.

I tried

Code:

$text = preg_replace('/\[img\](.+?)|[\S]\[\/img\]/is','<img src="$1">',$text);
, but it cuts off the last character in any string and returns it.

Re: Capture non-whitespace fragment

Posted: Fri Dec 12, 2008 3:23 am
by prometheuzz
A couple of remarks about your regex:

Code:

'/\[img\](.+?)|[\S]\[\/img\]/'
really means: either "\[img\](.+?)" or "[\S]\[\/img\]". As you can see, the logical OR has a low precedence. So you'll want to group it like this:

Code:

'/\[img\]((.+?)|[\S])\[\/img\]/'
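To see the precedence issue in action, here is a minimal sketch that runs both patterns on one made-up input. The sample URL is an assumption, not something from the original post. With the ungrouped pattern, the first alternative grabs "[img]" plus a single character and the second alternative grabs the last character plus "[/img]", which would explain the "cuts off the last character" symptom.

```php
<?php
// Hypothetical sample input, not from the original post.
$text = '[img]http://example.com/a.png[/img]';

// Ungrouped: reads as "\[img\](.+?)" OR "[\S]\[\/img\]".
$ungrouped = preg_replace('/\[img\](.+?)|[\S]\[\/img\]/is', '<img src="$1">', $text);

// Grouped: the OR now only applies inside the outer parentheses.
$grouped = preg_replace('/\[img\]((.+?)|[\S])\[\/img\]/is', '<img src="$1">', $text);

echo $ungrouped, "\n"; // <img src="h">ttp://example.com/a.pn<img src="">
echo $grouped, "\n";   // <img src="http://example.com/a.png">
```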
But, now let's examine your approach:

Code:

\[img\]      # matches '[img]'
(            #
  (.+?)      #   matches one or more characters of any type
  |          #   OR
  [\S]       #   matches one character of any type except a white space character
)            #
\[\/img\]    # matches '[/ img]' (without the space between '/' and 'i')
As you can see, that will also match strings containing white space inside the img-tags, because the .+? part of your regex accepts any character.

You'll probably want to do it like this:

Code:

$text = preg_replace('@\[img\](\S+)\[/img\]@is', '<img src="$1">', $text);
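As a rough comparison, the sketch below runs the grouped pattern and the \S+ pattern over a few made-up inputs. The sample strings and the third, lazy pattern (\S+?) are assumptions added for illustration only; they are not part of the suggestion above.

```php
<?php
// Hypothetical sample inputs, not taken from the thread.
$samples = [
    'clean'    => '[img]http://example.com/a.png[/img]',
    'spaces'   => '[img]not a url[/img]',
    'adjacent' => '[img]a.png[/img][img]b.png[/img]',
];

$patterns = [
    'grouped' => '@\[img\]((.+?)|[\S])\[/img\]@is', // still accepts white space
    'strict'  => '@\[img\](\S+)\[/img\]@is',        // the pattern suggested above
    'lazy'    => '@\[img\](\S+?)\[/img\]@is',       // assumption: reluctant variant
];

$results = [];
foreach ($samples as $name => $input) {
    foreach ($patterns as $label => $pattern) {
        $results[$name][$label] = preg_replace($pattern, '<img src="$1">', $input);
    }
}

print_r($results);
```

One thing to watch: \S+ is greedy and '[', '/' and ']' are non-whitespace too, so two tags with no white space between them get captured as one fragment. If that can happen in your input, the reluctant \S+? is probably the safer choice.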

Re: Capture non-whitespace fragment

Posted: Fri Dec 12, 2008 12:19 pm
by Serpent_Guard
Once again, your solution works perfectly. Thank you.

I tried googling to find out what '(.+?)' really does, and if I could find anything else to use that'd do the job, like '(\S+)', but I couldn't find anything. Do you have a link to such a site or document?

Re: Capture non-whitespace fragment

Posted: Fri Dec 12, 2008 12:49 pm
by prometheuzz
Serpent_Guard wrote: Once again, your solution works perfectly. Thank you.
You're welcome.
Serpent_Guard wrote: I tried googling to find out what '(.+?)' really does, and if I could find anything else to use that'd do the job, like '(\S+)', but I couldn't find anything. Do you have a link to such a site or document?
The '.' (DOT) matches any character except new line characters (but in your case it also matches new line characters because you added the s-flag at the end).
A '+' means: "one or more". So, the regex:

Code:

'/.+/'
matches one or more characters of any type except new line characters.
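A small sketch of the s-flag effect mentioned above, using a made-up two-line string (the sample text is an assumption):

```php
<?php
// Hypothetical two-line input.
$text = "first line\nsecond line";

// Without the s-flag, '.' stops at the new line character.
preg_match('/.+/', $text, $withoutS);

// With the s-flag, '.' also matches the new line character.
preg_match('/.+/s', $text, $withS);

var_dump($withoutS[0]); // string(10) "first line"
var_dump($withS[0] === $text); // bool(true)
```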
Now, when you add a question mark after the '+', the DOT-PLUS becomes reluctant (also called lazy) as opposed to greedy. What this means can best be explained by an example:

Code:

$text = 'aaaBaaaBaaaBaaa';
 
if(preg_match('/.+B/', $text, $match)) {
  echo "Greedy match    : $match[0]\n";
}
 
if(preg_match('/.+?B/', $text, $match)) {
  echo "Reluctant match : $match[0]\n";
}
When you run that snippet, I'm sure the difference will be clear (but, if not, just post back!).
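For reference, here is a sketch of what the snippet above should capture, with the matches stored in variables. The extra preg_match_all call is an addition for illustration: it shows that the reluctant pattern stops at the first 'B' each time.

```php
<?php
$text = 'aaaBaaaBaaaBaaa';

// Greedy: .+ takes as much as it can, then backs off to the LAST 'B'.
preg_match('/.+B/', $text, $greedy);

// Reluctant: .+? takes as little as it can, so it stops at the FIRST 'B'.
preg_match('/.+?B/', $text, $reluctant);

// Illustration only: collect every reluctant match in the string.
preg_match_all('/.+?B/', $text, $all);

echo "Greedy match    : $greedy[0]\n";    // aaaBaaaBaaaB
echo "Reluctant match : $reluctant[0]\n"; // aaaB
echo implode(', ', $all[0]), "\n";        // aaaB, aaaB, aaaB
```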

And about \S. This is called a "negated shorthand character class". What the $%#@ is that, I hear you thinking... Okay, you probably know the character class [0-9], which matches one of the numerical digits 0, 1, 2, 3, ... , 9. Now, since that character class is used so often, there's a shorthand for it, which is: \d. This \d is called a "shorthand character class".
Now the opposite of \d is \D, which matches any character except a numerical digit (\D is called the "negated shorthand character class" of \d).
So, to wrap it up, \S is the opposite of \s, which matches any white space character (new line, tab, space...). So \S will match any character except a white space character.
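A quick sketch of the four shorthands side by side, on a made-up string containing letters, digits, a space, a tab and a new line (the sample string is an assumption):

```php
<?php
// Hypothetical sample: letters, digits, a space, a tab and a new line.
$sample = "a1 b2\tc3\n";

$onlyDigits   = preg_replace('/\D/', '', $sample); // strip every non-digit
$withoutDigit = preg_replace('/\d/', '', $sample); // strip every digit
$withoutSpace = preg_replace('/\s/', '', $sample); // strip all white space

// \S+ grabs each run of non-whitespace characters.
preg_match_all('/\S+/', $sample, $fragments);

var_dump($onlyDigits);   // string(3) "123"
var_dump($withoutSpace); // string(6) "a1b2c3"
print_r($fragments[0]);  // a1, b2, c3
```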

Okay class, that's it for today!

; )

If you're serious about learning regex, the best on-line resource for it is this site: http://www.regular-expressions.info/

Good luck!

Re: Capture non-whitespace fragment

Posted: Fri Dec 12, 2008 12:56 pm
by Serpent_Guard
Thanks for the tips. I already knew what the character classes were all about; I just didn't know how to incorporate them into the statement properly.

Oh, and I was just curious if you or someone could help me understand this other regex statement:

Code:

$text = preg_replace('/\[url\](.+?)\[\/url\]/is','<a href="$1">$1</a>',$text);
This'll take '[ url]www.example.com[/ url]' and transform it appropriately. But, it appears that it also properly transforms '[ url=www.example.com]text[/ url]'. I can't understand how this happens.
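For what it's worth, a quick check suggests the pattern above cannot handle the '=' form by itself, since it requires the literal text "[url]". So presumably some other replacement in the plugin deals with that case. The second pattern below is purely hypothetical, a guess at what such a rule might look like, and is not taken from the plugin.

```php
<?php
$pattern = '/\[url\](.+?)\[\/url\]/is';

$plain    = '[url]www.example.com[/url]';
$withAttr = '[url=www.example.com]text[/url]';

$plainResult    = preg_replace($pattern, '<a href="$1">$1</a>', $plain);
$withAttrResult = preg_replace($pattern, '<a href="$1">$1</a>', $withAttr);

// Hypothetical extra rule for the '=' form (an assumption, not plugin code).
$attrPattern = '/\[url=(\S+?)\](.+?)\[\/url\]/is';
$attrResult  = preg_replace($attrPattern, '<a href="$1">$2</a>', $withAttr);

echo $plainResult, "\n";    // <a href="www.example.com">www.example.com</a>
echo $withAttrResult, "\n"; // unchanged: [url=www.example.com]text[/url]
echo $attrResult, "\n";     // <a href="www.example.com">text</a>
```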

Here's the full source of the plugin in question. It's a bbPress plugin, and I've modified it slightly based on the response in this thread (the code in question is on line 49).