
Posted: Sat Jun 02, 2007 10:29 am
by ziggy3000
so i have to have another preg_replace underneath that one, so it scans for < and > and removes anything between those?

edit:

it's not working :madblow:
i put

Code: Select all

[^<span class='\w'>]
and

Code: Select all

[^</span>]
after and before (.*?) in my code, but it's not working

Posted: Sat Jun 02, 2007 11:34 am
by superdezign
You're using character classes.

[...] is a character class.
(...) is a sub pattern.

And BTW, putting '^' at the beginning of a character class negates the character class.

I'd suggest getting Regex Coach. Awesome program!
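
To make the difference between a character class and a sub pattern concrete, here is a minimal sketch. The sample string and variable names are made up for illustration.

```php
<?php
// [^</span>] is a negated character class: it matches ONE character
// that is not '<', '/', 's', 'p', 'a', 'n' or '>'.
$class = '#[^</span>]#';

// (</span>) is a sub pattern: it matches the literal sequence "</span>".
$group = '#(</span>)#';

$text = 'x</span>';

preg_match($class, $text, $m1);
preg_match($group, $text, $m2);

echo $m1[0], "\n"; // the single character "x"
echo $m2[1], "\n"; // the whole tag "</span>"
```

So a character class never "excludes a tag"; it only excludes individual characters at one position.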

Posted: Sat Jun 02, 2007 11:49 am
by ziggy3000
yea, i'm trying not to get <span>...

i am doing something wrong i think.

can you tell me how i should exclude <span> from my regex?

Posted: Sat Jun 02, 2007 12:22 pm
by superdezign
You mean not include it in your comments? I think what you want to do is remove them from your comments after the comments have been highlighted, or you'll have highlighted text inside of highlighted text (which takes precedence in HTML).

Posted: Sat Jun 02, 2007 12:23 pm
by ziggy3000
yes. that's what i have been trying to do...

but the problem is, exactly how would i fix it?

my html code:

Code: Select all

<code class="comment">/*
<br><span class="keyword">SELECT</span> * <span class="keyword">FROM</span> <span class="quote"><span class="quote">`table`</span></span>
<br>*/
<br><span class="comment">//comment
<br>#jnf
<br></span><br></code>

Posted: Sat Jun 02, 2007 12:28 pm
by superdezign
Combine the regex that determines whether something is in a comment with a regex that determines whether something is an HTML tag. However, you only want to select the HTML tags for replacement. Here's the regex for an HTML tag (I think).

Code: Select all

#(</?[^>]+>)#
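
A quick way to see what this pattern picks up is to run it over a small sample. This is only a sketch: the input string is made up, and the pattern is a loose one that treats anything of the form `<...>` as a tag.

```php
<?php
// Made-up sample input; the pattern is the one from the post above.
$html = '<span class="keyword">SELECT</span> * <br>';

// Collect every tag the pattern finds.
preg_match_all('#(</?[^>]+>)#', $html, $matches);
print_r($matches[1]);

// Replacing each match with an empty string strips the tags
// but keeps the text between them.
$stripped = preg_replace('#(</?[^>]+>)#', '', $html);
echo $stripped, "\n";
```

The same pattern handles opening and closing tags because of the optional `/` after the `<`.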

Posted: Sat Jun 02, 2007 12:32 pm
by ziggy3000
so would this be it?

Code: Select all

$sql = preg_replace("'(\/\*)(</?[^>]>)(.*)(</?[^>]>)(\*/)'", "<span class='comment'>\\1\\3\\5</span>", $sql);
edit:
it doesn't work...

Posted: Sat Jun 02, 2007 12:41 pm
by superdezign
No, not at all.

What you want is to check if it's an HTML tag, and check if it's in comments, then delete it. You don't need to check both tags. I'm pretty sure the pattern I gave you works for either.

You don't want to get rid of what's in the tags, just the individual tags themselves.

And take notice that I edited my earlier regex to include + on the [^>] character class.

Posted: Sat Jun 02, 2007 12:48 pm
by superdezign
You should really get Regex Coach.

Code: Select all

#(/\*[^<]*)(</?[^>]+>)(.*\*/)#
And keep $1 and $3. I think. This is the best I could come up with.

Posted: Sat Jun 02, 2007 4:28 pm
by ziggy3000
even if you include $2, it doesn't change anything. actually $2 helps the parsing... because without it, well, $2 is like a newline(\n)

Posted: Sat Jun 02, 2007 4:30 pm
by superdezign
What? $2 is the subpattern for the HTML tag. That's what you want to get rid of, correct?
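
One way to check what each sub pattern captures is to dump the groups with preg_match(). A minimal sketch, assuming a one-line sample string that only resembles the highlighter output posted earlier:

```php
<?php
$pattern = '#(/\*[^<]*)(</?[^>]+>)(.*\*/)#';

// Made-up one-line sample resembling the highlighted comment in this thread.
$sql = '/*<br><span class="keyword">SELECT</span> * FROM t<br>*/';

preg_match($pattern, $sql, $m);
echo '$1: ', $m[1], "\n"; // comment opener up to the first '<'
echo '$2: ', $m[2], "\n"; // the first tag only
echo '$3: ', $m[3], "\n"; // everything else through the closing */

// Keeping $1 and $3 drops only that first tag.
echo preg_replace($pattern, '$1$3', $sql), "\n";
```

On this sample, $2 is whichever tag comes first after the opening /*, so only that one tag is removed and the later tags stay inside $3.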

Posted: Sat Jun 02, 2007 4:33 pm
by ziggy3000
yea but without it, here is the output

Code: Select all

/* SELECT * FROM `table`
*/
and with it

Code: Select all

/* 
SELECT * FROM `table`
*/
and this is what i am trying to parse
/*
SELECT * FROM `table`
*/
HTML OUTPUT

Code: Select all

<span class="comment">/*
<br><span class="keyword">SELECT</span> * <span class="keyword">FROM</span> <span class="quote"><span class="quote">`table`</span></span>
<br>*/</span>
and i downloaded regex coach, but i don't get why it's really good... i got confused on how to use it.

Posted: Sat Jun 02, 2007 6:39 pm
by ziggy3000
so far, my pattern would be perfect if it didn't include any HTML :madblow: at least it highlights multi line comments

my pattern

Code: Select all

$sql = preg_replace("#(\/\*[^<>]*)([^<>]\w{0,400})(.*\*/)#i", "<span class='comment'>\\1\\2\\3</span>", $sql);
edit: can't i have 2 separate replaces that start and end html comments <!-- and -->? when i am trying it, the <!-- is ended by <br />

Posted: Sat Jun 02, 2007 7:15 pm
by superdezign
Oh, what's going on is that the regex I gave you replaces the first occurrence of any HTML tag inside of the comments. That includes the <br /> tag. You'd have to ignore the <br />. As for removing more than one tag... I don't know. preg_match_all(), maybe?
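
One possible way to remove every tag except the line breaks in a single pass is preg_replace_callback() combined with strip_tags(). This is a sketch, not a drop-in fix: the function name strip_tags_in_comments is hypothetical, the sample string is made up, and it assumes any literal < inside the SQL has already been escaped to &lt; by the highlighter.

```php
<?php
// Hypothetical helper: strips every tag except <br> from each /* ... */ block
// and wraps the cleaned block in a single comment span.
function strip_tags_in_comments($sql)
{
    return preg_replace_callback(
        '#/\*.*?\*/#s', // each /* ... */ block; the s flag lets . match newlines
        function ($m) {
            // strip_tags() removes all tags except those listed as allowed
            $clean = strip_tags($m[0], '<br>');
            return "<span class='comment'>" . $clean . '</span>';
        },
        $sql
    );
}

// Made-up sample resembling the highlighter output in this thread.
$sql = '/*<br><span class="keyword">SELECT</span> * FROM t<br>*/';
echo strip_tags_in_comments($sql), "\n";
```

The callback receives each whole comment, so the number of nested tags no longer matters, and the <br> tags are kept because they are on the allowed list.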

Posted: Sat Jun 02, 2007 7:55 pm
by ziggy3000
i found something wonderful :D
Example 1674. Find matching HTML tags (greedy)
<?php
// The \\2 is an example of backreferencing. This tells pcre that
// it must match the second set of parentheses in the regular expression
// itself, which would be the ([\w]+) in this case. The extra backslash is
// required because the string is in double quotes.
$html = "<b>bold text</b><a href=howdy.html>click me</a>";

preg_match_all("/(<([\w]+)[^>]*>)(.*)(<\/\\2>)/", $html, $matches, PREG_SET_ORDER);

foreach ($matches as $val) {
    echo "matched: " . $val[0] . "\n";
    echo "part 1: " . $val[1] . "\n";
    echo "part 2: " . $val[3] . "\n";
    echo "part 3: " . $val[4] . "\n\n";
}
?>
from php.net

$val[3] echos the thing between html tags. if only i could combine this with my regex...
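
The "(greedy)" in that example title matters once the same tag appears more than once. A small sketch comparing (.*) with (.*?), using a made-up input string and single-quoted patterns so the backreference can be written as \2:

```php
<?php
// Made-up sample with the same tag appearing twice.
$html = '<b>one</b> and <b>two</b>';

// Greedy (.*) runs to the LAST </b>, so the whole string is one match.
preg_match_all('/(<(\w+)[^>]*>)(.*)(<\/\2>)/', $html, $greedy, PREG_SET_ORDER);

// Lazy (.*?) stops at the FIRST </b>, so each element matches separately.
preg_match_all('/(<(\w+)[^>]*>)(.*?)(<\/\2>)/', $html, $lazy, PREG_SET_ORDER);

echo count($greedy), ': ', $greedy[0][3], "\n";
echo count($lazy), ': ', $lazy[0][3], ' / ', $lazy[1][3], "\n";
```

With the greedy version, the text "between the tags" includes the inner </b> and <b>; the lazy version returns each element's contents on its own.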