Can't seem to get the right regex to match...
Posted: Thu Aug 13, 2009 12:15 pm
by ct_lee
I am trying to get a regex to match my string, which is the value assigned to test in the program below.
On some occasions there is content between the <a and the href part, so I have used the regex:
.* <a .* href=\"
The code for my test program, in Java, is as follows:
Code:
public class test {
    public static void main( String[] arguments) {
        String test = " · <a href=\"http://www.link.ext/some-page/\" title=\"Title goes here\">";
        if ( test.matches(".* <a .* href=\"") ) {
            System.out.println("matches");
        }
    }
}
Can anyone point out where I am going wrong or provide a solution for matching that example link?
Thanks.
Re: Can't seem to get the right regex to match...
Posted: Thu Aug 13, 2009 12:23 pm
by prometheuzz
It doesn't match for two reasons:
1 - String.matches() returns true only if the entire String is matched by the regex. Since your regex stops after href=\", it won't match the entire String. Try adding another DOT-STAR at the end of your regex;
2 - there are two spaces in this part of your regex: <a .* href (one before and one after the DOT-STAR), while there is only one space in your text.
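Both points can be checked with a small sketch. The class and method names here (MatchesVsFind, wholeMatch, partialMatch) are made up for illustration; the sample line is the one from the question. Matcher.find() is used only for contrast, since it looks for a match anywhere in the input rather than requiring the whole input to match.

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    // The sample line from the question.
    static final String SAMPLE =
        " · <a href=\"http://www.link.ext/some-page/\" title=\"Title goes here\">";

    // True only if the regex consumes the whole input (String.matches() behaviour).
    static boolean wholeMatch(String regex, String input) {
        return input.matches(regex);
    }

    // True if the regex matches somewhere inside the input.
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String regex = "<a href=\"";
        System.out.println(wholeMatch(regex, SAMPLE));               // false: input has more text
        System.out.println(partialMatch(regex, SAMPLE));             // true: found inside the input
        System.out.println(wholeMatch(".*" + regex + ".*", SAMPLE)); // true: DOT-STAR on both sides

        // The original regex with a trailing DOT-STAR still fails, because it
        // needs a space before AND after the middle DOT-STAR.
        System.out.println(wholeMatch(".* <a .* href=\".*", SAMPLE)); // false
    }
}
```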
Another thing, matching text with DOT-STAR should be avoided if you can. Be more specific where possible. So you shouldn't do:
but rather:
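As a rough illustration of why being specific helps (not the example from the original post), the sketch below compares DOT-STAR with a negated character class. The input string and the class name DotStarVsNegatedClass are hypothetical. The <a> tag in the input has no href at all, yet DOT-STAR runs past the end of the tag and picks up the href of a later tag, while [^>]* cannot cross the closing >.

```java
import java.util.regex.Pattern;

class DotStarVsNegatedClass {
    // Hypothetical input: the <a> tag has no href, but a later tag does.
    static final String INPUT = "<a name=\"top\">Top</a> <img href=\"x.png\">";

    // True if the regex matches somewhere inside the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // DOT-STAR crosses the end of the <a> tag: a false positive.
        System.out.println(found("<a .*href=", INPUT));    // true
        // [^>]* stops at the first '>', so it stays inside the <a> tag.
        System.out.println(found("<a [^>]*href=", INPUT)); // false
    }
}
```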
Re: Can't seem to get the right regex to match...
Posted: Thu Aug 13, 2009 12:42 pm
by ct_lee
1. I had tried something like that in earlier examples but still didn't have any success.
2. That's because some links I am going through start the tag with <a id="1234abc" href="...
3. I had read that using .* is greedy with memory, but just for getting used to regex I wanted to use something simple. Thanks for the tip.
I tried to use:
When I tried to compile my program I got an error saying I had an illegal escape character, which pointed to the \s part of the regex string.
Any ideas?
edit:
In Java I think I have to use \\s instead of \s? I used the code below and it didn't match.
Code:
public class test {
    public static void main( String[] arguments) {
        String test = " · <a href=\"http://www.link.ext/some-page/\" title=\"Title goes here\">";
        if ( test.matches("<a \\s[^>]*href") ) {
            System.out.println("matches");
        }
    }
}
Re: Can't seem to get the right regex to match...
Posted: Thu Aug 13, 2009 12:46 pm
by prometheuzz
Inside a String literal, you need to add an extra backslash, so it's not \s but \\s.
Also note my remarks from point 1.
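A minimal sketch of the escaping rule, with a made-up class name (EscapeDemo): the Java compiler turns the two characters \\ in a String literal into a single backslash, so the regex engine receives \s.

```java
class EscapeDemo {
    // Written as "\\s" in source; the String itself holds a backslash and an 's'.
    static final String WHITESPACE = "\\s";

    public static void main(String[] args) {
        System.out.println(WHITESPACE.length());      // 2
        System.out.println(WHITESPACE);               // \s
        System.out.println("a b".matches("a\\sb"));   // true: \s matches a space
        System.out.println("a\tb".matches("a\\sb"));  // true: \s also matches a tab
        // "a\sb" would not compile on the Java version discussed in this thread:
        // \s was not a legal escape sequence in a String literal.
    }
}
```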
Re: Can't seem to get the right regex to match...
Posted: Thu Aug 13, 2009 12:46 pm
by prometheuzz
Code:
test.matches(".*<a\\s[^>]*href=\".*")
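To see how this regex behaves on the two tag shapes mentioned in the thread, here is a small sketch. The class name AnchorHrefCheck, the helper hasAnchorHref and the URL in the id variant are assumptions for illustration; the thread only shows that variant up to href="... Note that . does not match line terminators by default, so this applies to single-line input.

```java
class AnchorHrefCheck {
    // The regex from the post above, as a Java String literal.
    static final String REGEX = ".*<a\\s[^>]*href=\".*";

    // True if the whole line contains an <a ...> tag with an href attribute.
    static boolean hasAnchorHref(String line) {
        return line.matches(REGEX);
    }

    public static void main(String[] args) {
        // The sample line from the question.
        String plain = " · <a href=\"http://www.link.ext/some-page/\" title=\"Title goes here\">";
        // Tag starting with an id attribute; the URL is a hypothetical completion.
        String withId = "<a id=\"1234abc\" href=\"http://www.link.ext/some-page/\">";
        // Hypothetical tag without any href.
        String noHref = "<a name=\"top\">";

        System.out.println(hasAnchorHref(plain));  // true
        System.out.println(hasAnchorHref(withId)); // true
        System.out.println(hasAnchorHref(noHref)); // false
    }
}
```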
Re: Can't seem to get the right regex to match...
Posted: Thu Aug 13, 2009 1:12 pm
by ct_lee
I had just figured out that I had missed that before I read your post. Thank you very much for your help.
Re: Can't seem to get the right regex to match...
Posted: Thu Aug 13, 2009 1:13 pm
by prometheuzz
No problem.