
condition?

Posted: Fri Oct 27, 2006 12:13 pm
by visonardo
I need to extract URLs from href attributes, but the problem is that in some tags the value appears without quotes. When quotes are used, the regex would be:

$regexcode="/<[\w]* (href)=(\"|')(.*)\\2[^>]*>/";


The big problem is that if the value doesn't start with ' or ", the URL can't contain whitespace, so the pattern would be something like this:

$regexcode="/<[\w]* (href)=([\S]*)[^>]*>/";

I need some kind of condition, like:

Code:

<?

if(\\2!='')
(.*)
else
([\S]*)

?>

Do you understand what I want?

It would be something like this:

$regexcode="<[\w]* (href)=(\"|')?((if(\\2!='').else[\S])*)\\2?[^>]>";


:roll:
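For reference, PCRE (the engine behind PHP's preg_* functions) does support a conditional subpattern, `(?(N)yes|no)`, which is close to the `if(\\2!='')` idea above. A minimal sketch, assuming href is the first attribute after the tag name; the pattern and sample strings are illustrative, not taken from the thread:

```php
<?php
// Sketch: PCRE conditional subpattern (?(1)yes|no).
// Group 1 optionally captures an opening quote (', " or `).
// If group 1 matched, the value (group 2) runs lazily up to the same quote (\1);
// otherwise it runs up to the first whitespace character or '>'.
$re = '/<\w+\s+href\s*=\s*([\'"`])?((?(1).*?|[^\s>]*))(?(1)\1)[^>]*>/is';

$tests = array(
    "<link href=' hola che!' boder>",
    '<a href = "http://forums.devnetwork.net/">',
    '<a href=http://devnetwork.net/ something=value>',
);

foreach ($tests as $html) {
    if (preg_match($re, $html, $m)) {
        echo '[', $m[2], "]\n"; // group 2 holds the URL in both cases
    }
}
```

The trailing `(?(1)\1)` has no "no" branch, so for an unquoted value it simply matches nothing.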

Posted: Fri Oct 27, 2006 12:27 pm
by timvw
Simplified: you want to match two situations, [ a | b ]. (You might want to read http://www.dotnetcoders.com/web/Learnin ... actor.aspx... )
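A rough sketch of the [ a | b ] idea in PHP: one branch for a quoted value, one for an unquoted value. The helper name `href_value()` is hypothetical, and this pattern only looks at the href attribute, not the whole tag:

```php
<?php
// Branch a: quoted value, group 1 = quote char, group 2 = URL.
// Branch b: unquoted value, group 3 = URL (runs to whitespace or '>').
$re = '/href\s*=\s*(?:([\'"`])(.*?)\1|([^\s>]+))/i';

// Hypothetical helper: returns the href value, or null if there is none.
function href_value($html, $re)
{
    if (!preg_match($re, $html, $m)) {
        return null;
    }
    // Only one branch can match; PHP omits trailing unmatched groups,
    // so $m[3] is missing when the quoted branch matched.
    return isset($m[3]) && $m[3] !== '' ? $m[3] : $m[2];
}

echo href_value("<link href=' hola che!' boder>", $re), "\n";
echo href_value('<a href=http://devnetwork.net/ something=value>', $re), "\n";
```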

Posted: Fri Oct 27, 2006 12:30 pm
by Chris Corbyn
Maybe (untested):

Code:

$re = '/<\w+ href=("|\'|\b)(.*)\\1[^>]*>/is';

Posted: Fri Oct 27, 2006 7:53 pm
by printf
Just a basic extractor:

Code:

$str = 'some html page';

preg_match_all ( "|href\=\"?'?`?([[:alnum:]:?=&@/#._-]+)\"?'?`?|i", $str, $url );

print_r ( $url[1] );

Posted: Fri Oct 27, 2006 7:55 pm
by visonardo
d11wtq wrote:Maybe (untested):

Code:

$re = '/<\w+ href=("|\'|\b)(.*)\\1[^>]*>/is';

It didn't work.

Another thing: I have this example that I was testing, based on what timvw said:

$re = "/<\w+ href=(((\"|')(.*)\\1)|(\S*))[^>]*>/i";


But in this case, which I used to test it:

Code:

function ech($aver)
{
print_r($aver);
return $aver[0];
}
$re = "/<\w+ href=(((\"|')(.*)\\1)|(\S*))[^>]*>/i";

$a2="<link href=' hola che!' boder>";

$a2=preg_replace_callback($re,"ech",$a2);

OUTPUT

Code:

Array
(
    [0] => <link href=' hola che!' boder>
    [1] => '
    [2] => 
    [3] => 
    [4] => 
    [5] => '
)
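The output above is consistent with the back-reference pointing at the wrong group. Counting opening parentheses, group 1 is the whole value, group 3 is the quote character and group 4 is the quoted text, so the `\\1` in the string (which PCRE sees as `\1`) refers to the group it sits inside. PCRE documents that such a reference fails when the group is first entered, so the `(\S*)` branch wins and matches just the `'`. A sketch of the corrected pattern, also switching `(.*)` to the lazy `(.*?)` so it stops at the first closing quote:

```php
<?php
// Groups, counted by opening parenthesis:
//   1 = whole value, 2 = quoted branch, 3 = quote char, 4 = quoted text, 5 = unquoted value
// The back-reference must point at group 3 (the quote), not group 1.
$re = "/<\w+ href=(((\"|')(.*?)\\3)|(\S*))[^>]*>/i";

$a2 = "<link href=' hola che!' boder>";
if (preg_match($re, $a2, $m)) {
    echo $m[4], "\n"; // the quoted URL text
}
```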

Posted: Fri Oct 27, 2006 8:41 pm
by feyd

Code:

<?php

$test = '<link test=`foo`href=\' hola che!\'more-test=\'ploop\' boder>';

preg_match('#<\s*[a-z:-]+\s+(?:\s*[a-z]+(?:\s*=\s*([\'"`]?).*?\\1)?)*\s*href\s*=\s*([\'"`]?)(.*?)\\2(?:\s*[a-z]+(?:\s*=\s*([\'"`]?).*?\\4)?)*[^>]*>#is', $test, $match);

var_dump($match);

?>

Code:

array(4) {
  [0]=>
  string(57) "<link test=`foo`href=' hola che!'more-test='ploop' boder>"
  [1]=>
  string(1) "`"
  [2]=>
  string(1) "'"
  [3]=>
  string(10) " hola che!"
}
A slight problem (depending on how you look at it) is that it will only support zero or one leading and following attributes.

Posted: Sat Oct 28, 2006 8:28 am
by visonardo
Thanks feyd, but it really doesn't work when the href has no quotes, that is, when the URL can only be delimited by whitespace; your regex doesn't handle that. I used two regexes to do it, but I insist there must be a way to do it all in one :o

Posted: Sat Oct 28, 2006 8:37 am
by feyd
visonardo wrote:thank feyd, but it really dont work when i find href without quotes, its like to say that are not white spaces in the url, your regex doesnt take that, but i used two regex to do that. But, i insist, must be a shape to do all in one :o
I have almost no idea what you just said.

Posted: Sat Oct 28, 2006 8:51 am
by visonardo
I want these two patterns combined into one:

Code:

$regex1="/<[^>]+(href)\s*=(\S*)[^>]*>/is";
$regex2="/<[^>]+(href)\s*=\s*(\"|'|`)(.*)\\2[^>]*>/is";
Individually they work perfectly, but I would like to do it all in one regex. I tried this:

Code:

$regex="/<[^>]+(href)\s*=(((\"|'|`)(.*)\\4)|(\S*))[^>]*>/is";
but it didn't work :o


It must capture the URL in href both here:

Code:

<a href = "http://forums.devnetwork.net/">
and here:

Code:

<a href=http://devnetwork.net/ something=value>
As you can see, in the last href the URL ends at a whitespace character or at the >.
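One detail behind "the URL ends at whitespace or at the >": a plain `\S+` does not stop at `>`, so an unquoted value followed directly by `>` can run into the following markup. A small comparison; the sample string is made up:

```php
<?php
$html = '<a href=foo><b>bold</b>';

// \S+ only stops at whitespace, so it runs past the first '>' and then
// backtracks just far enough to leave one '>' for the end of the pattern.
preg_match('/<\w+\s+href=(\S+)[^>]*>/', $html, $a);

// [^\s>]+ stops at whitespace or '>', which is what an unquoted value needs.
preg_match('/<\w+\s+href=([^\s>]+)[^>]*>/', $html, $b);

echo $a[1], "\n"; // foo><b>bold</b
echo $b[1], "\n"; // foo
```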

Posted: Sat Oct 28, 2006 9:13 am
by feyd

Code:

<?php

$tests = array(
	'<link test1=`foo1` test2=\'foo2\' test3="foo3" href=\' hola che!\'more-test=\'ploop\' boder>',
	'<a href=http://devnetwork.net/ something=value>',
);

foreach( $tests as $test )
{
	preg_match('#<\s*[a-z:-]+\s+.*?\s*href\s*=\s*(?:'.'([\'"`])(.*?)\\1|([^\s]+))[^>]*>#is', $test, $match);
	var_dump($match);
}

?>

Code:

array(3) {
  [0]=>
  string(86) "<link test1=`foo1` test2='foo2' test3="foo3" href=' hola che!'more-test='ploop' boder>"
  [1]=>
  string(1) "'"
  [2]=>
  string(10) " hola che!"
}
array(4) {
  [0]=>
  string(47) "<a href=http://devnetwork.net/ something=value>"
  [1]=>
  string(0) ""
  [2]=>
  string(0) ""
  [3]=>
  string(22) "http://devnetwork.net/"
}
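In the output above the URL lands in group 2 for a quoted value and in group 3 for an unquoted one. If you want it in the same group either way, PCRE's branch-reset group `(?|...)` numbers the groups of each alternative from the same starting point. A sketch based on feyd's pattern; the empty `()` in the second branch is a placeholder so the URL is group 2 in both branches, and branch reset assumes a PCRE version that supports it (7.2 or later):

```php
<?php
// (?| A | B ): groups inside A and inside B are numbered from the same start.
// Branch A: group 1 = quote char, group 2 = quoted URL.
// Branch B: group 1 = empty placeholder, group 2 = unquoted URL.
$re = '#<\s*[a-z:-]+\s+.*?\s*href\s*=\s*(?|([\'"`])(.*?)\1|()([^\s>]+))[^>]*>#is';

$tests = array(
    '<link test1=`foo1` test2=\'foo2\' test3="foo3" href=\' hola che!\'more-test=\'ploop\' boder>',
    '<a href=http://devnetwork.net/ something=value>',
);

foreach ($tests as $test) {
    if (preg_match($re, $test, $match)) {
        echo '[', $match[2], "]\n"; // the URL, whichever branch matched
    }
}
```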

Posted: Sat Oct 28, 2006 9:16 am
by visonardo
Thanks again :). One detail: why did you use [^\s] and not [\S]? :roll: Do you see any difference?

Posted: Sat Oct 28, 2006 9:19 am
by feyd
No particular reason, I just prefer to use the positive forms.

Posted: Sat Oct 28, 2006 9:20 am
by Chris Corbyn
visonardo wrote:thank again :) . But a detail, why you used [^\s] and not [\S] :roll: do you see some difference?
No difference; it's just sometimes what comes into your mind whilst you're building a pattern, it will work either way :)
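A quick check that the spellings are interchangeable: `[^\s]`, `[\S]` and a bare `\S` all match one non-whitespace character. The subject string is just an example:

```php
<?php
$subject = 'href=http://devnetwork.net/ something=value';
$patterns = array('/=([^\s]+)/', '/=([\S]+)/', '/=(\S+)/');

$results = array();
foreach ($patterns as $p) {
    preg_match($p, $subject, $m);
    $results[$p] = $m[1];
    echo $p, ' => ', $m[1], "\n"; // same capture for all three patterns
}
```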

By the way, I'm sure it's not intentional, but using the eye-rolling emoticon ( :roll: ) often looks like you're trying to be abrasive ;)

Posted: Sat Oct 28, 2006 9:37 am
by visonardo
d11wtq wrote:
visonardo wrote:thank again :) . But a detail, why you used [^\s] and not [\S] :roll: do you see some difference?
No difference; it's just sometimes what comes into your mind whilst you're building a pattern, it will work either way :)

By the way, I'm sure it's not intentional but using the eye-rolling emticon ( :roll: ) often looks like you're trying to be abrasive ;)
Sorry, I meant it as an emoticon of doubt. What you describe was not my intention.

Another thing.

I have these two regexes:

Code:

$z1="#<[^>]+\s+.*?\s*href\s*=\s*((['\"`])(.*?)\\2|([^\s]+))[^>]*>#is";
$z2='#<\s*[a-z:-]+\s+.*?\s*href\s*=\s*(?:'.'([\'"`])(.*?)\\1|([^\s]+))[^>]*>#is';
$z2 is feyd's regex, and $z1 is my modified version, which I believed was equivalent. As you can see, I just replaced

Code:

'\s*[a-z:-]+'
with

Code:

'[^>]+'
and, so that the URL is captured in the same group (\\3), I removed the ?: from the parentheses. The end result is that mine doesn't work and feyd's does. Why? Shouldn't $z1 work too?

Code:

$a2="     <link href=' hola che!' boder>   <a href=http://holaaaa something=value>";

$z1="#<[^>]+\s+.*?\s*href\s*=\s*((['\"`])(.*?)\\2|([^\s]+))[^>]*>#is";
$z2='#<\s*[a-z:-]+\s+.*?\s*href\s*=\s*(?:'.'([\'"`])(.*?)\\1|([^\s]+))[^>]*>#is';

preg_match_all($z1,$a2,$match1);
print_r($match1);
echo '<p>';

preg_match_all($z2,$a2,$match2);
print_r($match2);
OUTPUT

Code:

Array
(
    [0] => Array
        (
            [0] => <link href=' hola che!' boder>   <a href=http://holaaaa something=value>
        )

    [1] => Array
        (
            [0] => http://holaaaa
        )

    [2] => Array
        (
            [0] => 
        )

    [3] => Array
        (
            [0] => 
        )

    [4] => Array
        (
            [0] => http://holaaaa
        )

)

Array
(
    [0] => Array
        (
            [0] => <link href=' hola che!' boder>
            [1] => <a href=http://holaaaa something=value>
        )

    [1] => Array
        (
            [0] => '
            [1] => 
        )

    [2] => Array
        (
            [0] =>  hola che!
            [1] => 
        )

    [3] => Array
        (
            [0] => 
            [1] => http://holaaaa
        )

)

Posted: Sat Oct 28, 2006 9:57 am
by feyd
Adding parentheses around each in the original regex pattern will illustrate the differences.
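Following that suggestion, here is a sketch that wraps the leading pieces of $z1 in extra groups to see what each one consumed, followed by a variant that cannot leave the current tag. The names `$z1_debug` and `$z1_fixed` are just for illustration, and the variant also uses `[^\s>]+` instead of `[^\s]+` for the unquoted value:

```php
<?php
$a2 = "     <link href=' hola che!' boder>   <a href=http://holaaaa something=value>";

// $z1 with [^>]+, \s+ and .*? wrapped in groups 1-3 (so the quote is now group 5).
$z1_debug = "#<([^>]+)(\s+)(.*?)\s*href\s*=\s*((['\"`])(.*?)\\5|([^\s]+))[^>]*>#is";
preg_match($z1_debug, $a2, $d);
echo '[', $d[1], "]\n"; // greedy [^>]+ swallows the first href: "link href=' hola che!'"
echo '[', $d[3], "]\n"; // lazy .*? can match '>' too, so it walks into the next tag: "boder>   <a"

// Variant: after the tag name, only a lazy [^>]*? may precede href, so the match
// cannot cross a '>'. Group numbers match $z1 (3 = quoted URL, 4 = unquoted URL).
$z1_fixed = "#<[a-z:-]+\s[^>]*?href\s*=\s*((['\"`])(.*?)\\2|([^\s>]+))[^>]*>#is";
preg_match_all($z1_fixed, $a2, $mm);
print_r($mm[3]); // quoted URLs (empty string where the value was unquoted)
print_r($mm[4]); // unquoted URLs
```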