
Help with Regex or how to get this code to work

Posted: Fri Nov 16, 2007 10:16 pm
by irishmike2004
Hello All:

I am working on a site where I need to modify a JavaScript feed from a site. The site is del.icio.us which is a place to store bookmarks on the net instead of saving them into your browser. The site provides a link they call a "tagroll" to put your "tags" onto your website. This works except that it comes complete with CSS information and this does not work in a site that I am working on.

I have figured out the JavaScript that comes down from the script link they provide and want to work it into a PHP script to allow me to apply my own CSS to the information.

All that being said, the key is that the first line of the JS that they send is as follows:

Code: Select all

(function () {var ts = {"tagname":1 , "nexttagname":#...etc}
This is basically a list of the tags, and I am looking for a way to get the items within quotes, starting from the "ts = {" and ending at the last "}". I need to store these in an array, so that a count is kept for each match.

I was thinking of using a regular expression, but since I am not very experienced in using them, I am not sure how I would do it. There can be a lot of tags in there :-)

Later on we will need to give each tag a link using another part of the JavaScript feed from them. The JS looks like this:

Code: Select all

for (var i=0;i<3;i++) c[i]=s(ca[i],cz[i],ts[t]-ta,tz)
document.write('<li style="font-size:15px;line-height:1;"><a style="color:rgb('+c[0]+','+c[1]+','+c[2]+')" href="http://del.icio.us/nitromike/'+encodeURIComponent(t).replace('%2F','/')+'">'+t+'</a> </li>');
We would of course remove part of the for statement, since the c[i] assignment and the related variables only deal with the style/color.

Any help would be appreciated on how to format the regular expression for this!

Thanks,

Mike

Posted: Fri Nov 16, 2007 11:30 pm
by rturner
I think I understood what you want...but not sure

Try

Code: Select all

$a = '(function () {var ts = {"tagname":1 , "nexttagname":#...etc}';

$needle = '@{var ts = {"(.*?)}@';
preg_match($needle, $a, $contents);
$contents = $contents[1];
echo $contents;
This will give you the contents between {var ts = {" until it runs into a }

Also, there are screen scrapers out there to help you out.
Screen Scrapers

good luck!

Posted: Sat Nov 17, 2007 6:31 am
by feyd
Curly braces are metacharacters in regular expressions.

Posted: Sat Nov 17, 2007 7:58 am
by irishmike2004
Thanks for the help, guys. If we cannot use curly braces in the regex, then what other method could I use to get the data?

I also forgot that I DO NOT want the :# either.

I was thinking of doing this:

Code: Select all

<?php
$userId = "del.icio.us user id";
$url = "http://del.icio.us/feeds/js/tags/".$userId;
$read = file_get_contents($url);

preg_match("regex here", $read);

?>
Thanks,

Mike

Posted: Sat Nov 17, 2007 11:04 am
by rturner
Just escape the meta character with a backslash ... instead of { use \{
just call me the typo master!
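
As a minimal sketch of the escaping advice above: the sample string is the first line of the feed as quoted in this thread, with a second tag ("php") made up for illustration. The escaped braces are treated as literal characters, so the pattern captures everything between `var ts={` and the first `}`.

```php
<?php
// First line of the feed as quoted in the thread; the "php" tag is a made-up addition.
$read = '(function(){ var ts={"example":1,"php":4}';

// "\{" and "\}" are literal braces, so they are never read as a quantifier such as {2,3}.
$pattern = '@var ts=\{(.*?)\}@';

$body = null;
if (preg_match($pattern, $read, $m)) {
    // $m[1] holds the text captured by the first pair of parentheses.
    $body = $m[1];
    echo $body, "\n";
}
```

This only isolates the raw list; splitting it into individual tags is a separate step.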

Posted: Sat Nov 17, 2007 11:12 am
by John Cartwright
preg_quote() to escape regex meta characters
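
A short sketch of how preg_quote() could be used here, assuming the literal prefix `var ts={` from the feed sample quoted in this thread. The variable names are illustrative. The second argument tells preg_quote() which delimiter the pattern uses, so that character gets escaped as well.

```php
<?php
// Sample first line of the feed, as quoted in the thread.
$read = '(function(){ var ts={"example":1}';

// Literal text that precedes the tag list; it contains the metacharacter "{".
$prefix = 'var ts={';

// Escape every regex metacharacter in the literal, plus the "@" delimiter.
$quoted = preg_quote($prefix, '@');

// Build the full pattern: quoted literal, then a quoted tag name to capture.
$pattern = '@' . $quoted . '"(.*?)"@';

$firstTag = null;
if (preg_match($pattern, $read, $m)) {
    $firstTag = $m[1];
    echo $firstTag, "\n";
}
```

preg_quote() is meant for literal text only. Running it over a whole pattern would also escape the parentheses and quantifiers you want to keep.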

Posted: Sat Nov 17, 2007 12:47 pm
by irishmike2004
Hi All again:

The proposed match string does not yield what I am looking for. It gets a number instead of the item in quotes. I may not have been clear about what I am trying to do, so I will post the following items to help you guys see what I am after with the PHP code. Again, the URL presented here gives me the end result, but with someone else's CSS attributes instead of our own, so I want to pull the tags, which are what appear when you run the URL below:

<script type="text/javascript" src="[ur ... /nitromike></script>[/url]

Now, if you just type http://del.icio.us/feeds/js/tags/nitromike into your browser, you get the JS back.

The first line looks like this:

Code: Select all

(function(){ var ts={"example":1}
It is from this code that I want to pull the word "example" and ignore the :1. If I have multiple tags ("example" is my only tag), then I want to stop when there are no more tags, that is, when the closing "}" is reached.

The PHP I am testing right now is:

Code: Select all

<?php
$userId = "nitromike";
$url = "http://del.icio.us/feeds/js/tags/".$userId;
$read = file_get_contents($url);

$match = '@var ts=\{"*?\}@';

$tags = preg_match($match, $read);


echo 'url = '.$url.'<br />';
echo 'read = '.$read.'<br />';
echo 'tags = '.$tags;

?>
So with my user id, my "$tags" variable ought to equal "example".

If I use another user's id that I have, it should contain a lot more tags... 77 or so.

It would be nice to put each tag into an array element, but I will worry about that after I get the functionality of what I am doing here working :-)

Thanks for all the help!

Posted: Sat Nov 17, 2007 1:25 pm
by rturner

Code: Select all

<?php
$userId = "nitromike";
$url = "http://del.icio.us/feeds/js/tags/".$userId;
$read = file_get_contents($url);
$match = '@var ts=\{"(.*?)":@';
preg_match($match, $read, $tags);
$tags = $tags[1];
echo 'url = '.$url.'<br />';
echo 'read = '.$read.'<br />';
echo 'tags = '.$tags;
?>
preg_match fills the array passed as its third argument ($tags) with the matches.
Assigning the result of the call to $tags, as in $tags = preg_match(...), only gives you a return code for success or failure.
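
To make the difference concrete, here is a small sketch using the sample feed line quoted earlier in the thread instead of a live request. The variable name `$found` is illustrative.

```php
<?php
// Sample first line of the feed, as quoted in the thread.
$read = '(function(){ var ts={"example":1}';
$match = '@var ts=\{"(.*?)":@';

// The return value is a status: 1 for a match, 0 for no match, false on error.
$found = preg_match($match, $read, $tags);

// The matches themselves land in the third argument.
// $tags[0] is the whole matched text, $tags[1] is the first capture group.
echo 'found = ', $found, "\n";
echo 'whole match = ', $tags[0], "\n";
echo 'tag = ', $tags[1], "\n";
```

Note that preg_match() stops at the first match, so this yields one tag at most.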

Posted: Sat Nov 17, 2007 7:26 pm
by irishmike2004
Greetings All:

Well, I tried to take rturner's advice and have been working on the script. The array appears to be formed wrong, though print_r() shows most of the elements along with ones called "array" throughout, and I cannot get the tags back out of the array for printing. Below is the latest of my attempts at the script:

Code: Select all

<?php
//Del.icio.us User Name
$userId = "vekann";

//process the JS feed (TagRoll)
$url = "http://del.icio.us/feeds/js/tags/".$userId;
$read = file_get_contents($url);

//Regular Expression we use against the contents of the URL above
$match = '/(\"[a-z0-9]+\")/i';

//run the match and build the tags array
preg_match_all($match, $read, $tags);

//count of the number of elements in the tags array
$tagCount = count($tags, COUNT_RECURSIVE);

echo 'Tag Count is '.$tagCount.'<br />';

$keys = ARRAY_KEYS($tags);
//Process the tags for output
echo '<div class="delicious-tags" id="delicious-tags-'.$userId.'"><ul class="delicious-cloud">';

for($i=0;$i<=$tagCount;$i++){

	//echo '<li><a href="http://del.icio.us/'.$userId.'/'.$value.'>'.$value.'</a></li>';
	echo $tags[($keys[$i])];
}

echo '</ul></div>';

?>
The commented-out link in the for loop is supposed to go through each tag and make a link for it. But of course, if you echo the $tags array in any form, it returns 2 elements, "arrayarray", on output. I need to find a way to put the tags into a mechanism from which I can print them out, and the "array" entries should not be in there.

I am not sure why PHP arrays are so weird, but we may need to find another mechanism to do this. Any suggestions? The idea here is to be able to deal with each tag from the JS file. The match statement seems to yield the proper number of tags, but it also appears to be the source of the funky "array" entries that are added.

The $keys variable was one attempt to make this work, but it yields the same as $tags[$i].

Hope someone understands what I am trying to do now and can help a bit more. print_r() is not the way I want to get the elements printed out.

Thanks,

Mike

Posted: Sat Nov 17, 2007 10:44 pm
by rturner
Mike,

I'm not sure exactly what you are doing so this may not be what you are looking for ...but

Code: Select all

<?php
//Del.icio.us User Name
$userId = "vekann";

//process the JS feed (TagRoll)
$url = "http://del.icio.us/feeds/js/tags/".$userId;
$read = file_get_contents($url);

//Regular Expression we use against the contents of the URL above
$match = '/(\"[a-z0-9]+\")/i';

//run the match and build the tags array
preg_match_all($match, $read, $tags);
$tags = $tags[1];

//count of the number of elements in the tags array
$tagCount = count($tags, COUNT_RECURSIVE);


echo 'Tag Count is '.$tagCount.'<br />';


foreach ($tags as $tag) {
        print $tag . "<br>";
}
echo '</ul></div>';

?>
Each tag is enclosed in quotes, is that your intention?
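
For anyone puzzled by the "arrayarray" output mentioned earlier in the thread, the sketch below shows the shape of what preg_match_all() fills in by default. The sample string and the second tag are made up for illustration. The result is an array of arrays, which is why echoing its elements prints "Array" and why the `$tags = $tags[1];` line is needed.

```php
<?php
// Hypothetical sample; the real $read would come from file_get_contents().
$read = 'var ts={"example":1,"php":4}';

preg_match_all('/(\"[a-z0-9]+\")/i', $read, $tags);

// $tags is an array of arrays:
//   $tags[0] = every full match, $tags[1] = every capture of group 1.
// Echoing $tags[0] or $tags[1] directly prints the word "Array".
$outer = count($tags);                       // counts only the two inner arrays
$recursive = count($tags, COUNT_RECURSIVE);  // inner arrays plus the strings inside them

// Loop over the inner array of captures and trim the surrounding quotes.
$clean = [];
foreach ($tags[1] as $tag) {
    $clean[] = trim($tag, '"');
}
echo implode(', ', $clean), "\n";
```

Once `$tags` has been reduced to the inner array, a plain `count($tags)` is enough; COUNT_RECURSIVE makes no difference on a flat array of strings.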

Posted: Sat Nov 17, 2007 10:55 pm
by irishmike2004
@rturner:

Yeah, we want the tags without the quotes, but each tag is in quotes in the feed, so the new regex made more sense to me.

Let me see if your variation works for me and I'll post back.

Thanks for the help.

Mike

Posted: Sat Nov 17, 2007 11:14 pm
by irishmike2004
@rturner:

Yep, that works. I just need to figure out how to get rid of the quotes now.

Thanks,

Mike

Posted: Sat Nov 17, 2007 11:15 pm
by rturner

Code: Select all

<?php
//Del.icio.us User Name
$userId = "vekann";

//process the JS feed (TagRoll)
$url = "http://del.icio.us/feeds/js/tags/".$userId;
$read = file_get_contents($url);

//Regular Expression we use against the contents of the URL above
#$match = '/(\"[a-z0-9]+\")/i';
$match = '/"(.*?)"/';
//run the match and build the tags array
preg_match_all($match, $read, $tags);
$tags = $tags[1];

//count of the number of elements in the tags array
$tagCount = count($tags, COUNT_RECURSIVE);


echo 'Tag Count is '.$tagCount.'<br />';


foreach ($tags as $tag) {
        print $tag . "<br>";
}
echo '</ul></div>';

?>
To remove the quotes

Posted: Sat Nov 17, 2007 11:21 pm
by irishmike2004
hey rturner:

That gives us more "goo" than we want. The result on my 1 tag is:

$match = '/"(.*?)"/' gives us:

Code: Select all

Tag Count is 9
example
text/css

delicious-tags
delicious-tags-nitromike
delicious-cloud
font-size:15px;line-height:1;
color:rgb('+c[0]+','+c[1]+','+c[2]+')
http://del.icio.us/nitromike/'+encodeURIComponent(t).replace('%2F','/')+'
As opposed to the other regex ($match = '/(\"[a-z0-9]+\")/i'), which yields this:

Code: Select all

Tag Count is 1
"example"
We get quotes but not the other "goo".
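
The two results can be reproduced on a small made-up sample that mixes a tag with an HTML attribute, roughly like the real feed does. One possible middle ground, shown as an assumption rather than a recommendation, is to keep the character class but move the parentheses inside the quotes so the capture group no longer includes them.

```php
<?php
// Hypothetical sample: one tag plus one quoted HTML attribute.
$read = 'var ts={"example":1}; document.write(\'<ul class="delicious-cloud">\');';

// Anything between two double quotes: picks up the attribute value as well.
preg_match_all('/"(.*?)"/', $read, $loose);

// Only letters and digits between the quotes; the quotes sit outside the group.
preg_match_all('/"([a-z0-9]+)"/i', $read, $strict);

echo implode(' | ', $loose[1]), "\n";
echo implode(' | ', $strict[1]), "\n";
```

The stricter pattern would still skip tags containing characters such as "-" or ".", and it would match any purely alphanumeric attribute value, so it depends on what the feed actually contains.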

Posted: Sat Nov 17, 2007 11:35 pm
by rturner
Had to watch the end of the UFC fight :D

Code: Select all

<?php
//Del.icio.us User Name
$userId = "vekann";

//process the JS feed (TagRoll)
$url = "http://del.icio.us/feeds/js/tags/".$userId;
$read = file_get_contents($url);

//Regular Expression we use against the contents of the URL above
#$match = '/(\"[a-z0-9]+\")/i';
$match = '/"(.*?)":/';
//run the match and build the tags array
preg_match_all($match, $read, $tags);
$tags = $tags[1];

//count of the number of elements in the tags array
$tagCount = count($tags, COUNT_RECURSIVE);


echo 'Tag Count is '.$tagCount.'<br />';


foreach ($tags as $tag) {
        print $tag . "<br>";
}
echo '</ul></div>';

?>
That might do it !
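
As an alternative to matching each tag separately, and not something proposed in the thread, the whole `ts` object could be captured once and handed to json_decode(). This is only a sketch under two assumptions: that the object literal is valid JSON, as it is in the `{"example":1}` sample above, and that no tag name contains a "}". The extra tags in the sample are made up. The loop then builds the links the original poster wanted, approximating the feed's `encodeURIComponent(t).replace('%2F','/')` with rawurlencode() and str_replace().

```php
<?php
// Hypothetical sample of the feed's first line; a real run would use file_get_contents().
$read = '(function(){ var ts={"example":1,"c/c++":3,"web-dev":7}';
$userId = 'nitromike';

$tags = [];
// Capture the {...} literal assigned to ts, tolerating spaces around "=".
if (preg_match('/var ts\s*=\s*(\{.*?\})/', $read, $m)) {
    // json_decode() returns null if the literal is not valid JSON.
    $decoded = json_decode($m[1], true);
    if (is_array($decoded)) {
        $tags = $decoded; // tag name => count
    }
}

$html = '<ul class="delicious-cloud">';
foreach ($tags as $tag => $count) {
    // Numeric tag names become integer keys, so cast back to string.
    $tag = (string) $tag;
    // Percent-encode the tag, then restore slashes as the feed's JS does.
    $path = str_replace('%2F', '/', rawurlencode($tag));
    $html .= '<li><a href="http://del.icio.us/' . $userId . '/' . $path . '">'
           . htmlspecialchars($tag) . '</a></li>';
}
$html .= '</ul>';
echo $html, "\n";
```

Two differences from the JavaScript are worth knowing: str_replace() restores every encoded slash, whereas the JS replace() with a string argument only changes the first one, and rawurlencode() encodes a few characters that encodeURIComponent() leaves alone. The counts in `$tags` are kept in case they are needed later for sizing the links.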