Regex matching non-alphanumeric

TomasTrek
Forum Newbie
Posts: 6
Joined: Tue Aug 19, 2008 8:44 am

Regex matching non-alphanumeric

Post by TomasTrek »

I am trying to get my latest Twitter post to appear on my own site. I can retrieve the post and display it, but I would like to link each @reply to the profile of the user it is directed at. I am using the following just to test:

Code: Select all

//Link @replies to their profile
preg_match('/@[a-z0-9]+/',$tweet,$matches);
$repl = substr($matches[0],1);
        
$tweet = preg_replace('/'.$repl.'/i','<a href="http://www.twitter.com/'.$repl.'" target="_blank">'.$repl.'</a>',$tweet);
The post I made to test was "Sorry @person1 @person2, just testing something".

Now the above code links @person1 to http://www.twitter.com/person1, but when I check the length of $matches it is 1, so it hasn't found @person2. The reason has to be the comma, as it is the only difference between the two. How can I alter my regex so that it matches an @ followed by any combination of letters and numbers until it reaches something that is neither a letter nor a number? I can't just handle spaces or commas, as I may use a - or a : at some point in the future.
Last edited by Benjamin on Tue May 26, 2009 10:27 am, edited 1 time in total.
Reason: Changed code type from text to php.
jayshields
DevNet Resident
Posts: 1912
Joined: Mon Aug 22, 2005 12:11 pm
Location: Leeds/Manchester, England

Re: Regex matching non-alphanumeric

Post by jayshields »

Your regex is fine. Are you sure $matches has only one element?
TomasTrek
Forum Newbie
Posts: 6
Joined: Tue Aug 19, 2008 8:44 am

Re: Regex matching non-alphanumeric

Post by TomasTrek »

I checked with count on my $matches array, which gave me a result of 1. My entire code is:

Code: Select all

    $co = curl_init('http://twitter.com/statuses/user_timeline/my_username.xml');
    
    curl_setopt($co, CURLOPT_VERBOSE, 1);
    curl_setopt($co, CURLOPT_HEADER, 0);
    curl_setopt($co, CURLOPT_FOLLOWLOCATION,1);
    curl_setopt($co, CURLOPT_RETURNTRANSFER, 1);
    
    $cr = curl_exec($co);
    $ci = curl_getinfo($co);
    
    curl_close($co);
    
    if($ci['http_code']==200)
    {
        //Get latest tweet
        preg_match('/<text>(.*?)<\/text>/',$cr,$matches);
        $tweet = $matches[1];
 
        //Break at 84 characters into two lines
        $tweet = wordwrap($tweet,84,'<br/>',false);
        
        //Link @replies to their profile
        preg_match('/@[a-z0-9]+/',$tweet,$matches);
        echo count($matches);
        $repl = substr($matches[0],1);
        
        $tweet = preg_replace('/'.$repl.'/i','<a href="http://www.twitter.com/'.$repl.'" target="_blank">'.$repl.'</a>',$tweet);
    }
The echo count($matches) prints out "1". Printing out the 0th element of that array gives "@person1". Echoing the contents of $tweet, I get:
"Sorry @<a href="http://www.twitter.com/person1" target="_blank">person1</a> @person2, just testing out auto-linking at-replies for my site."
Edit:
I tried it on the string "test @person1 @person2 blah" and got the same result: only @person1 was linked. I think the regex stops at the first match. I know that in Perl you put a /g at the end to make it continue, but that just gives me a warning when I try it in PHP:
Warning: preg_match() [function.preg-match]: Unknown modifier 'g' in /opt/lampp/htdocs/my_site_addr/index.php on line 26
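For reference, the difference can be seen in isolation. This is only a minimal sketch using a test string like the one above (the variable names $first, $all and $every are made up for illustration): preg_match() stops after the first match, while preg_match_all() does the job of Perl's /g and collects every full match in index 0 of its result array.

```php
<?php
$tweet = 'test @person1 @person2, blah';

// preg_match() returns after the first match, so $first holds one entry.
preg_match('/@[a-z0-9]+/i', $tweet, $first);

// preg_match_all() keeps scanning; the full matches are in $all[0].
// The comma simply ends the second match because it is not in [a-z0-9].
preg_match_all('/@[a-z0-9]+/i', $tweet, $all);
$every = $all[0];

echo count($first), ' ', count($every), "\n"; // prints "1 2"
```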
Last edited by Benjamin on Tue May 26, 2009 10:28 am, edited 1 time in total.
Reason: Changed code type from text to php.
jayshields
DevNet Resident
Posts: 1912
Joined: Mon Aug 22, 2005 12:11 pm
Location: Leeds/Manchester, England

Re: Regex matching non-alphanumeric

Post by jayshields »

It's because you're only doing the preg_replace() once.

Put this after your existing preg_replace():

Code: Select all

$repl = substr($matches[1],1);
$tweet = preg_replace('/'.$repl.'/i','<a href="http://www.twitter.com/'.$repl.'" target="_blank">'.$repl.'</a>',$tweet);
The count of $matches can't be 1. I've tested your regex here: http://www.cuneytyilmaz.com/prog/jrx/
TomasTrek
Forum Newbie
Posts: 6
Joined: Tue Aug 19, 2008 8:44 am

Re: Regex matching non-alphanumeric

Post by TomasTrek »

I tried adding your code:

Code: Select all

//Link @replies to their profile
preg_match('/@[a-z0-9]+/',$tweet,$matches);
$repl = substr($matches[0],1);
$tweet = preg_replace('/'.$repl.'/i','<a href="http://www.twitter.com/'.$repl.'" target="_blank">'.$repl.'</a>',$tweet);
        
$repl = substr($matches[1],1);
$tweet = preg_replace('/'.$repl.'/i','<a href="http://www.twitter.com/'.$repl.'" target="_blank">'.$repl.'</a>',$tweet);
I ended up with this:
<a href="http://www.twitter.com/" target="_blank"></a>t<a href="http://www.twitter.com/" target="_blank"></a>e<a href="http://www.twitter.com/" target="_blank"></a>s<a href="http://www.twitter.com/" target="_blank"></a>t<a href="http://www.twitter.com/" target="_blank"></a> <a href="http://www.twitter.com/" target="_blank"></a>@<a href="http://www.twitter.com/" target="_blank"></a><<a href="http://www.twitter.com/" target="_blank"></a>a<a href="http://www.twitter.com/" target="_blank"></a> <a href="http://www.twitter.com/" target="_blank"></a>h<a href="http://www.twitter.com/" target="_blank"></a>r<a href="http://www.twitter.com/" target="_blank"></a>e<a href="http://www.twitter.com/" target="_blank"></a>f<a href="http://www.twitter.com/" target="_blank"></a>=<a href="http://www.twitter.com/" target="_blank">
and so on through every character of the string. I would expect this when the search pattern is an empty string: the replacement is then inserted between every character. The string I used to test this was "test @person1 @person2 blah".

On the page all you see is the tweet and the HTML of what should be the link. If you look at the source you can see the above.
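The effect can be reproduced on its own. With preg_match() and a pattern that has no capture groups, $matches only has index 0, so $matches[1] does not exist, $repl ends up empty, and the pattern becomes '//i', which matches the empty string at every position. A small sketch, with the link markup shortened to [] purely for readability (that shortening and the variable names are my own):

```php
<?php
$tweet = 'ab @x';

preg_match('/@[a-z0-9]+/', $tweet, $matches);
// No capture group in the pattern, so there is no second entry.
$hasSecond = isset($matches[1]);

// An empty pattern matches at every position, including the very end.
$result = preg_replace('//i', '[]', $tweet);

var_dump($hasSecond);  // bool(false)
echo $result, "\n";    // []a[]b[] []@[]x[]
```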

Edit:
Okay, I seem to have solved it now by using preg_match_all() rather than preg_match(). Thank you for all the help. The working code is:

Code: Select all

<?PHP
function GetLastTweet($user)
{
    $tweet = 'Error obtaining tweet';
    $co = curl_init('http://twitter.com/statuses/user_timeline/'.$user.'.xml');
 
    curl_setopt($co, CURLOPT_VERBOSE, 1);
    curl_setopt($co, CURLOPT_HEADER, 0);
    curl_setopt($co, CURLOPT_FOLLOWLOCATION,1);
    curl_setopt($co, CURLOPT_RETURNTRANSFER, 1);
    
    $cr = curl_exec($co);
    $ci = curl_getinfo($co);
    
    curl_close($co);
    
    if($ci['http_code']==200)
    {
        //Get latest tweet
        preg_match('/<text>(.*?)<\/text>/',$cr,$matches);
        $tweet = $matches[1];
 
        //Break at 84 characters into two lines
        $tweet = wordwrap($tweet,84,'<br/>',false);
        
        //Link @replies to their profile
        preg_match_all('/@[a-z0-9]+/',$tweet,$matches);
        foreach($matches as $match)
        {
            foreach($match as $m)
            {
                $repl = substr($m,1);
                $tweet = preg_replace('/'.$repl.'/i','<a href="http://www.twitter.com/'.$repl.'" target="_blank">'.$repl.'</a>',$tweet);
            }
        }
    }
    else
    {
        //Break at 84 characters into two lines
        $tweet = wordwrap($tweet,84,'<br/>',false);
    }
    return $tweet;
}
?>
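As a possible simplification, the match-then-replace loop could be collapsed into a single preg_replace() call with a capture group. This also avoids re-linking a name that already sits inside an inserted link when the same user is mentioned twice. It is a sketch rather than a drop-in replacement: linkReplies is a hypothetical helper name, and allowing underscores in the username is my own addition to the pattern.

```php
<?php
// Hypothetical helper: wraps the name after every @ in a profile link.
function linkReplies($tweet)
{
    // $1 is the captured name without the leading @.
    // The i modifier lets the pattern match capital letters as well.
    return preg_replace(
        '/@([a-z0-9_]+)/i',
        '@<a href="http://www.twitter.com/$1" target="_blank">$1</a>',
        $tweet
    );
}

echo linkReplies('Sorry @person1 @person2, just testing something'), "\n";
```

Inside GetLastTweet() this would take the place of the preg_match_all() call and both foreach loops, i.e. $tweet = linkReplies($tweet);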
Last edited by Benjamin on Tue May 26, 2009 10:28 am, edited 1 time in total.
Reason: Changed code type from text to php.
jayshields
DevNet Resident
Posts: 1912
Joined: Mon Aug 22, 2005 12:11 pm
Location: Leeds/Manchester, England

Re: Regex matching non-alphanumeric

Post by jayshields »

Sorry, I should've spotted that... I normally use preg_match_all() rather than preg_match(), and for some reason presumed you had used it too.

Glad you got it working. :)