
Problem with searching for strings with identical parts

Posted: Sun Dec 25, 2011 10:33 am
by jraede
I'm working on my syntax highlighter, which some of you may have helped me debug a few months ago. Now that I'm getting around to highlighting CSS, I've run into a snag in the regex that finds all "value" keywords (I guess that's what you'd call them: they're the ones on the right side of the colon when you define a property). Examples are "purple", "inline-block", "monospace", etc.

I wrote this regex pattern to recognize whether these strings fall after a colon and before a semicolon. Besides sans-serif and serif, my real pattern lists every other value keyword, but I've left those out here because they would make it unreadable:

(:)([^;]+)?(sans-serif|serif)([^A-Za-z0-9])?

The snag is getting it to match and return both "sans-serif" and "serif". Even when the text contains "sans-serif", the greedy group ([^;]+)? matches "sans-", and then only "serif" matches the actual keyword group.
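The behaviour can be reproduced with a minimal sketch. This assumes Python's `re` module, which backtracks the same way as most Perl-style engines for this pattern (the thread doesn't say which engine the highlighter uses). It also assumes the asterisks around the pattern in the original post were forum markup and not part of the regex:

```python
import re

# The pattern from the post, without the markup asterisks (assumption).
GREEDY = re.compile(r"(:)([^;]+)?(sans-serif|serif)([^A-Za-z0-9])?")

decl = 'font-family:"Helvetica", "Arial", sans-serif;'
m = GREEDY.search(decl)

# ([^;]+)? first runs all the way to the semicolon, then gives characters
# back one at a time. The first point where the alternation succeeds is
# just before "serif", so "sans-" is left inside group 2.
print(repr(m.group(2)))  # '"Helvetica", "Arial", sans-'
print(repr(m.group(3)))  # 'serif'
```

The greedy group always prefers the longest possible stretch. Because the shorter keyword "serif" is a suffix of "sans-serif", it is the first keyword the engine can fit in while backtracking.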

Any suggestions on how to fix this while still keeping the condition that the keyword falls between a colon and a semicolon, with any sort of property, digit, color, etc. values allowed in between?

That is, it's supposed to match something like:
font-family:"Helvetica", "Arial", sans-serif;

Thanks!

Re: Problem with searching for strings with identical parts

Posted: Sun Dec 25, 2011 4:29 pm
by ragax
Hi jraede,

I don't understand the full details of what you're trying to do.
But moving the question mark in your ([^;]+)? group inside the parentheses, which makes the + lazy:

([^;]+?)
certainly ensures that "sans-serif" gets captured in your test string.
The lazy group itself captures the beginning of the font-family value:
"Helvetica", "Arial",
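Here is a minimal sketch of the lazy version, again assuming Python's `re` module as a stand-in for the highlighter's engine. The `*?` variant at the end is an illustrative addition, not part of the reply:

```python
import re

# Lazy +? grows the group one character at a time from the left, so the
# earliest keyword position wins, and at that position the alternation
# tries "sans-serif" before "serif".
LAZY = re.compile(r"(:)([^;]+?)(sans-serif|serif)([^A-Za-z0-9])?")

decl = 'font-family:"Helvetica", "Arial", sans-serif;'
m = LAZY.search(decl)
print(repr(m.group(2)))  # '"Helvetica", "Arial", '
print(repr(m.group(3)))  # 'sans-serif'

# +? still needs at least one character after the colon, so a value
# written with no space, like "font-family:serif;", does not match.
# *? lets the group be empty.
LAZY_EMPTY_OK = re.compile(r"(:)([^;]*?)(sans-serif|serif)([^A-Za-z0-9])?")
print(LAZY.search("font-family:serif;"))                    # None
print(LAZY_EMPTY_OK.search("font-family:serif;").group(3))  # serif
```

Listing the longer alternative first also matters here. If "serif" came before "sans-serif", the lazy group would still stop in front of "sans-serif", but the alternation would fail on "serif" at that spot and the group would keep growing until it again stopped before the trailing "serif".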
Is this what you are looking for?
If not, can you please post at least two test strings, and the exact desired matches and group captures?

Wishing you a beautiful day

Re: Problem with searching for strings with identical parts

Posted: Wed Feb 22, 2012 7:29 am
by abareplace
Checking the delimiter before and after the keyword (a colon or comma before it, a comma or semicolon after it) could help you:

[:,]\s*(sans-serif|serif)\s*[,;]
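A minimal sketch of this pattern with Python's `re.findall` follows (Python is an assumption; the thread doesn't name the highlighter's language). The lookaround version at the end is an illustrative extension, not part of the reply. It addresses the fact that the delimiters are consumed, so two keywords sharing a comma can't both be found:

```python
import re

# Pattern from the reply: keyword preceded by ':' or ',' and followed by
# ',' or ';', with optional whitespace on both sides.
DELIMITED = re.compile(r"[:,]\s*(sans-serif|serif)\s*[,;]")

decl = 'font-family:"Helvetica", "Arial", sans-serif;'
print(DELIMITED.findall(decl))  # ['sans-serif']

# Caveat: the first match consumes the comma between the two keywords,
# so the second keyword has no ':' or ',' left in front of it.
two = "font-family: serif, sans-serif;"
print(DELIMITED.findall(two))  # ['serif']

# Lookbehind/lookahead check the delimiters without consuming them, so
# adjacent keywords can share a comma. (Python requires a fixed-width
# lookbehind; [:,] is one character wide.)
LOOKAROUND = re.compile(r"(?<=[:,])\s*(sans-serif|serif)(?=\s*[,;])")
print(LOOKAROUND.findall(two))  # ['serif', 'sans-serif']
```

Because "serif" inside "sans-serif" is preceded by "-", neither version can match the shorter keyword there.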