
Regex matching non-alphanumeric

Posted: Sun May 24, 2009 2:37 pm
by TomasTrek
I am trying to get my latest twitter post to appear on my own site. I can retrieve the post and show it but I would like to link @replies to the profile of the user to whom they are directed. I am using the following just to test:

Code: Select all

//Link @replies to their profile
preg_match('/@[a-z0-9]+/',$tweet,$matches);
$repl = substr($matches[0],1);
        
$tweet = preg_replace('/'.$repl.'/i','<a href="http://www.twitter.com/'.$repl.'" target="_blank">'.$repl.'</a>',$tweet);
The post I made to test was "Sorry @person1 @person2, just testing something".

Now the above code links @person1 to http://www.twitter.com/person1, but when I check the length of $matches it is 1, so it hasn't discovered @person2. The reason for this has to be the comma, as it is the only difference between the two. So how can I alter my regex so that it will match an @ followed by any combination of numbers and letters until it reaches something that is neither a number nor a letter? I can't just do it for spaces or commas, as I may put a - or a : at some point in the future.

Re: Regex matching non-alphanumeric

Posted: Sun May 24, 2009 2:54 pm
by jayshields
Your regex is fine. Are you sure $matches has only one element?

Re: Regex matching non-alphanumeric

Posted: Sun May 24, 2009 3:05 pm
by TomasTrek
I checked with count on my $matches array, which gave me a result of 1. My entire code is:

Code: Select all

    $co = curl_init('http://twitter.com/statuses/user_timeline/my_username.xml');
    
    curl_setopt($co, CURLOPT_VERBOSE, 1);
    curl_setopt($co, CURLOPT_HEADER, 0);
    curl_setopt($co, CURLOPT_FOLLOWLOCATION,1);
    curl_setopt($co, CURLOPT_RETURNTRANSFER, 1);
    
    $cr = curl_exec($co);
    $ci = curl_getinfo($co);
    
    curl_close($co);
    
    if($ci['http_code']==200)
    {
        //Get latest tweet
        preg_match('/<text>(.*?)<\/text>/',$cr,$matches);
        $tweet = $matches[1];
 
        //Wrap at 84 characters, putting <br/> between the lines
        $tweet = wordwrap($tweet,84,'<br/>',false);
        
        //Link @replies to their profile
        preg_match('/@[a-z0-9]+/',$tweet,$matches);
        echo count($matches);
        $repl = substr($matches[0],1);
        
        $tweet = preg_replace('/'.$repl.'/i','<a href="http://www.twitter.com/'.$repl.'" target="_blank">'.$repl.'</a>',$tweet);
    }
The echo count($matches) prints out "1". Printing out the 0th element of that array gives "@person1". Echoing the contents of $tweet I get
"Sorry @<a href="http://www.twitter.com/person1" target="_blank">person1</a> @person2, just testing out auto-linking at-replies for my site."
edit:
I tried it on the string "test @person1 @person2 blah" and it still only linked @person1. I think the regex is just stopping at the first match. I know that in Perl you put a /g at the end to make it continue, but that just gives me a warning when I try it in PHP:
Warning: preg_match() [function.preg-match]: Unknown modifier 'g' in /opt/lampp/htdocs/my_site_addr/index.php on line 26
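
For reference, PHP's preg functions have no g modifier; the "keep going" behaviour lives in a separate function, preg_match_all. A minimal sketch of the difference on the test string above (the variable names $first and $all are only for illustration):

```php
<?php
$tweet = 'test @person1 @person2 blah';

// preg_match stops after the first match; $first[0] holds that match.
preg_match('/@[a-z0-9]+/', $tweet, $first);

// preg_match_all keeps going; $all[0] holds every full match.
preg_match_all('/@[a-z0-9]+/', $tweet, $all);

echo count($first), "\n";         // 1
echo implode(' ', $all[0]), "\n"; // @person1 @person2
```

So a count of 1 from preg_match only says that there are no capture groups in the pattern, not that a single name was found.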

Re: Regex matching non-alphanumeric

Posted: Sun May 24, 2009 3:24 pm
by jayshields
It's because you're only doing the preg_replace() once.

Put this after your existing preg_replace():

Code: Select all

$repl = substr($matches[1],1);
$tweet = preg_replace('/'.$repl.'/i','<a href="http://www.twitter.com/'.$repl.'" target="_blank">'.$repl.'</a>',$tweet);
The count of $matches can't be 1. I've tested your regex in here http://www.cuneytyilmaz.com/prog/jrx/

Re: Regex matching non-alphanumeric

Posted: Sun May 24, 2009 3:42 pm
by TomasTrek
I tried adding your code:

Code: Select all

//Link @replies to their profile
preg_match('/@[a-z0-9]+/',$tweet,$matches);
$repl = substr($matches[0],1);
$tweet = preg_replace('/'.$repl.'/i','<a href="http://www.twitter.com/'.$repl.'" target="_blank">'.$repl.'</a>',$tweet);
        
$repl = substr($matches[1],1);
$tweet = preg_replace('/'.$repl.'/i','<a href="http://www.twitter.com/'.$repl.'" target="_blank">'.$repl.'</a>',$tweet);
I ended up with this:
<a href="http://www.twitter.com/" target="_blank"></a>t<a href="http://www.twitter.com/" target="_blank"></a>e<a href="http://www.twitter.com/" target="_blank"></a>s<a href="http://www.twitter.com/" target="_blank"></a>t<a href="http://www.twitter.com/" target="_blank"></a> <a href="http://www.twitter.com/" target="_blank"></a>@<a href="http://www.twitter.com/" target="_blank"></a><<a href="http://www.twitter.com/" target="_blank"></a>a<a href="http://www.twitter.com/" target="_blank"></a> <a href="http://www.twitter.com/" target="_blank"></a>h<a href="http://www.twitter.com/" target="_blank"></a>r<a href="http://www.twitter.com/" target="_blank"></a>e<a href="http://www.twitter.com/" target="_blank"></a>f<a href="http://www.twitter.com/" target="_blank"></a>=<a href="http://www.twitter.com/" target="_blank">
and on through every character of the string. I would expect this to happen when trying to replace by a blank string - the replacement is put between every character. The string I used to test this was "test @person1 @person2 blah".

On the page all you see is the tweet and the html of what should be the link. If you look at the source you can see the above.
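
A likely explanation, sketched under the assumption that the pattern has no capture groups: preg_match then fills only $matches[0], so $matches[1] does not exist and $repl ends up empty. The pattern becomes '//i', which matches the empty string at every position. In the sketch below $repl is set to '' by hand to stand in for that case:

```php
<?php
// With no capture group, preg_match fills only index 0.
preg_match('/@[a-z0-9]+/', 'test @person1 @person2 blah', $matches);
var_dump(isset($matches[1])); // bool(false)

// An empty $repl builds the pattern '//i', which matches between every character.
$repl = '';
echo preg_replace('/'.$repl.'/i', '-', 'abc'), "\n"; // -a-b-c-
```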

Edit:
Okay I seem to have solved it now, by using preg_match_all rather than preg_match. Thank you for all the help you gave. The working code is:

Code: Select all

<?PHP
function GetLastTweet($user)
{
    $tweet = 'Error obtaining tweet';
    $co = curl_init('http://twitter.com/statuses/user_timeline/'.$user.'.xml');
 
    curl_setopt($co, CURLOPT_VERBOSE, 1);
    curl_setopt($co, CURLOPT_HEADER, 0);
    curl_setopt($co, CURLOPT_FOLLOWLOCATION,1);
    curl_setopt($co, CURLOPT_RETURNTRANSFER, 1);
    
    $cr = curl_exec($co);
    $ci = curl_getinfo($co);
    
    curl_close($co);
    
    if($ci['http_code']==200)
    {
        //Get latest tweet
        preg_match('/<text>(.*?)<\/text>/',$cr,$matches);
        $tweet = $matches[1];
 
        //Wrap at 84 characters, putting <br/> between the lines
        $tweet = wordwrap($tweet,84,'<br/>',false);
        
        //Link @replies to their profile
        //preg_match_all puts every full match in $matches[0]
        preg_match_all('/@[a-z0-9]+/',$tweet,$matches);
        foreach($matches[0] as $m)
        {
            //Strip the leading @ to get the username
            $repl = substr($m,1);
            $tweet = preg_replace('/'.$repl.'/i','<a href="http://www.twitter.com/'.$repl.'" target="_blank">'.$repl.'</a>',$tweet);
        }
    }
    else
    {
        //Wrap the error message the same way
        $tweet = wordwrap($tweet,84,'<br/>',false);
    }
    return $tweet;
}
?>
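
A possible simplification, offered as a sketch rather than a drop-in replacement: preg_replace already replaces every match, so the linking step can be done in one pass with a capture group and a $1 backreference. This also avoids running a second replacement over link markup built by an earlier one, which could happen if one username is a prefix of another. The helper name linkReplies is hypothetical, and the added underscore and the /i flag are assumptions about which usernames should match:

```php
<?php
// Hypothetical helper, not part of the code above.
function linkReplies($tweet)
{
    // $1 is the captured username without the leading @.
    return preg_replace(
        '/@([a-z0-9_]+)/i',
        '@<a href="http://www.twitter.com/$1" target="_blank">$1</a>',
        $tweet
    );
}

echo linkReplies('test @person1 @person2, blah'), "\n";
```

Because the pattern anchors on the @, plain text that merely contains a username is left alone.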

Re: Regex matching non-alphanumeric

Posted: Sun May 24, 2009 6:20 pm
by jayshields
Sorry, I should've spotted that... I'm not used to using preg_match instead of preg_match_all and for some reason presumed you had used it.

Glad you got it working. :)