Can't seem to get the right regex to match...

ct_lee
Forum Newbie
Posts: 12
Joined: Wed Aug 05, 2009 3:15 pm

Can't seem to get the right regex to match...

Post by ct_lee »

I am trying to get a regex to match my string, which is:
&nbsp;&middot;&nbsp; <a href=\"http://www.link.ext/some-page/\" title=\"Title goes here\">
On some occasions there is content between the <a and the href part, so I have used the regex:
.* <a .* href=\"
The code for my test program, in Java, is as follows:

Code: Select all

public class test {
 
    public static void main( String[] arguments) {
        String test = "                                 &nbsp;&middot;&nbsp;                                                                    <a href=\"http://www.link.ext/some-page/\" title=\"Title goes here\">";
        if ( test.matches(".* <a .* href=\"") ) {
            System.out.println("matches");
        }
    }
 
}
Can anyone point out where I am going wrong, or provide a solution for matching that example link?

Thanks.
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: Can't seem to get the right regex to match...

Post by prometheuzz »

It doesn't match for two reasons:
1 - String.matches() returns true only if the entire String is matched by the regex. Since your regex stops after href=\", it won't match the entire String. Try adding another DOT-STAR (.*) at the end of your regex;
2 - there are two spaces in this part of your regex: <a .* href (one before and one after the DOT-STAR), while there is only one space between <a and href in your text.
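
To make the two points above concrete, here is a minimal sketch. The class name MatchesVersusFind and the shortened input string are assumptions made for illustration; they are not part of the original test program. It contrasts String.matches(), which must consume the whole input, with Matcher.find(), which only needs a match somewhere inside it.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; INPUT is a shortened version of the string in the question.
final class MatchesVersusFind {

    static final String INPUT =
            "   &nbsp;&middot;&nbsp;   <a href=\"http://www.link.ext/some-page/\" title=\"Title goes here\">";

    // String.matches() succeeds only if the regex consumes the whole input.
    static boolean wholeMatch(String regex) {
        return INPUT.matches(regex);
    }

    // Matcher.find() succeeds if the regex matches anywhere inside the input.
    static boolean partialMatch(String regex) {
        return Pattern.compile(regex).matcher(INPUT).find();
    }

    public static void main(String[] args) {
        // Original regex: needs two spaces between "<a" and "href", and stops after href=".
        System.out.println(wholeMatch(".* <a .* href=\""));    // false
        // Trailing .* added, but the second space is still required.
        System.out.println(wholeMatch(".* <a .* href=\".*"));  // false
        // Second space removed and trailing .* added.
        System.out.println(wholeMatch(".* <a .*href=\".*"));   // true
        // find() does not need the leading or trailing .* at all.
        System.out.println(partialMatch("<a .*href=\""));      // true
    }
}
```

If you only care whether the pattern occurs somewhere in the text, find() is probably the more natural call; matches() suits validation of a complete string.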

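Another thing: avoid matching text with DOT-STAR if you can, and be more specific where possible. DOT-STAR will happily run past the end of the tag, while a negated character class such as [^>]* stops at the closing >. So you shouldn't do: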

Code: Select all

"<a .*href"
but rather:

Code: Select all

"<a\s[^>]*href"
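
As a rough illustration of why the more specific pattern is safer, the sketch below runs both patterns against two sample inputs. The class name and both sample strings are made up for this example, and the backslash is doubled because the regex sits inside a Java string literal.

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing the two patterns from the post above.
final class DotStarVersusNegatedClass {

    // DOT-STAR version: .* may run across tag boundaries.
    static final Pattern GREEDY = Pattern.compile("<a .*href");
    // More specific version: [^>]* cannot cross the closing '>' of the tag.
    // "\\s" in Java source is the regex \s (any whitespace character).
    static final Pattern SPECIFIC = Pattern.compile("<a\\s[^>]*href");

    static boolean found(Pattern pattern, String input) {
        return pattern.matcher(input).find();
    }

    public static void main(String[] args) {
        // An anchor without href, followed by a different tag that has one.
        String anchorWithoutHref = "<a name=\"top\">Top</a> <link href=\"style.css\">";
        // An anchor with an extra attribute before href.
        String anchorWithId = "<a id=\"1234abc\" href=\"http://www.link.ext/some-page/\">";

        System.out.println(found(GREEDY, anchorWithoutHref));   // true (false positive)
        System.out.println(found(SPECIFIC, anchorWithoutHref)); // false
        System.out.println(found(GREEDY, anchorWithId));        // true
        System.out.println(found(SPECIFIC, anchorWithId));      // true
    }
}
```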
ct_lee
Forum Newbie
Posts: 12
Joined: Wed Aug 05, 2009 3:15 pm

Re: Can't seem to get the right regex to match...

Post by ct_lee »

1. I had tried something like that in earlier examples but still didn't have any success.
2. That's because some of the links I am going through start the tag with <a id="1234abc" href="...
3. I had read that using .* is greedy with memory, but while I am getting used to regex I wanted to use something simple. Thanks for the tip, though.

I tried to use:

Code: Select all

"<a\s[^>]*href"
When I tried to compile my program, I got an error saying I had an illegal escape character, which pointed to the \s part of the regex string.

Any ideas?

edit:
In Java I think I have to use \\s instead of \s? I used the code below and it didn't match.

Code: Select all

public class test {
 
    public static void main( String[] arguments) {
        String test = "                                 &nbsp;&middot;&nbsp;                                                                    <a href=\"http://www.link.ext/some-page/\" title=\"Title goes here\">";
        if ( test.matches("<a \\s[^>]*href") ) {
            System.out.println("matches");
        }
    }
 
}
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: Can't seem to get the right regex to match...

Post by prometheuzz »

Inside a String literal, you need to add an extra backslash, so it's not \s but \\s.
Also note my remark from point 1: String.matches() has to match the entire String.
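
A small sketch of the escaping rule, assuming a throwaway class named BackslashEscaping (not part of the original program). It also shows that a literal space followed by \\s, as in the edited test program, asks for two whitespace characters in a row.

```java
// Hypothetical demo class for backslash escaping in Java string literals.
final class BackslashEscaping {

    // In Java source, "\\s" is the two-character string \s,
    // which the regex engine reads as "any whitespace character".
    static final String WHITESPACE = "\\s";

    static boolean matchesWhole(String input, String regex) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(WHITESPACE.length());                    // 2
        // One whitespace character between "<a" and "href".
        System.out.println(matchesWhole("<a href", "<a\\shref"));   // true
        // "<a \\s" is a literal space plus another whitespace character,
        // so a single space in the input is not enough.
        System.out.println(matchesWhole("<a href", "<a \\shref"));  // false
        System.out.println(matchesWhole("<a  href", "<a \\shref")); // true (two spaces)
    }
}
```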
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: Can't seem to get the right regex to match...

Post by prometheuzz »

Code: Select all

test.matches(".*<a\\s[^>]*href=\".*")
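
For completeness, here is one way that call could be wrapped into a runnable test. The class and method names and the sample inputs are assumptions for illustration only. The find() variant is an alternative not discussed in this thread; it is included because . does not match line terminators by default, so the matches() version fails on input that contains a line break.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the regex from the post above.
final class AnchorHrefCheck {

    // Whole-string form: anything, "<a", one whitespace character,
    // any run of non-'>' characters, then href=" and the rest of the line.
    static final String WHOLE_STRING_REGEX = ".*<a\\s[^>]*href=\".*";

    // Same core without the surrounding .*, for use with Matcher.find().
    static final Pattern ANYWHERE = Pattern.compile("<a\\s[^>]*href=\"");

    static boolean hasAnchorHref(String html) {
        return html.matches(WHOLE_STRING_REGEX);
    }

    static boolean hasAnchorHrefFind(String html) {
        return ANYWHERE.matcher(html).find();
    }

    public static void main(String[] args) {
        String plain = "   &nbsp;&middot;&nbsp;   <a href=\"http://www.link.ext/some-page/\" title=\"Title goes here\">";
        String withId = "<a id=\"1234abc\" href=\"http://www.link.ext/some-page/\">";
        String noHref = "<a name=\"top\">";
        String multiLine = "text\n<a href=\"x\">";

        System.out.println(hasAnchorHref(plain));         // true
        System.out.println(hasAnchorHref(withId));        // true
        System.out.println(hasAnchorHref(noHref));        // false
        System.out.println(hasAnchorHref(multiLine));     // false: '.' stops at the line break
        System.out.println(hasAnchorHrefFind(multiLine)); // true
    }
}
```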
ct_lee
Forum Newbie
Posts: 12
Joined: Wed Aug 05, 2009 3:15 pm

Re: Can't seem to get the right regex to match...

Post by ct_lee »

I had just figured out that I had missed that before I read your post. Thank you very much for your help.
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: Can't seem to get the right regex to match...

Post by prometheuzz »

No problem.