Help with Regex or how to get this code to work

irishmike2004
Forum Contributor
Posts: 119
Joined: Mon Nov 15, 2004 3:54 pm
Location: Lawrence, Kansas

Post by irishmike2004 »

Hello All:

I am working on a site where I need to modify a JavaScript feed from another site. That site is del.icio.us, a place to store bookmarks on the net instead of saving them in your browser. It provides a link it calls a "tagroll" for putting your "tags" on your own website. This works, except that the output comes complete with CSS information, which does not work on the site I am building.

I have figured out the JavaScript that comes down from the script link they provide and want to work it into a PHP script to allow me to apply my own CSS to the information.

All that being said, the key is that the first line of the JS they send is as follows:

Code: Select all

(function () {var ts = {"tagname":1 , "nexttagname":#...etc}
This is basically a list of the tags. I am looking for a way to get the items within quotes, starting from the "ts = {" and ending at the last "}". I need to store these in an array, so that a count is kept of the matches.

I was thinking of using a regular expression, but since I am not very experienced in using them, I am not sure how I would do it. There can be a lot of tags in there :-)
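A minimal sketch of one way to pull the tag names out. It assumes the object literal after "var ts =" is valid JSON, as it is in the sample line, and that no tag name contains a "}". The tag names and counts below are invented for illustration.

```php
<?php
// Sample of the feed's first line; tag names and counts are made up.
$js = '(function () {var ts = {"php":12 , "regex":3, "web/design":1};';

$tags = array();

// Step 1: isolate the object literal that follows "var ts =".
// The braces are escaped so they are read as literal characters.
if (preg_match('/var\s+ts\s*=\s*(\{.*?\})/s', $js, $m)) {
    // Step 2: the literal is valid JSON, so decode it into tag => count pairs.
    $counts = json_decode($m[1], true);

    // Step 3: the array keys are the tag names. strval() keeps purely
    // numeric tag names as strings instead of integer keys.
    if (is_array($counts)) {
        $tags = array_map('strval', array_keys($counts));
    }
}

print_r($tags);
```

Decoding the literal avoids having to write a pattern for the tag names themselves, and count($tags) then gives the number of tags.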

Later on we will need to give each tag a link, using another part of the JavaScript feed from them. The JS looks like this:

Code: Select all

for (var i=0;i<3;i++) c[i]=s(ca[i],cz[i],ts[t]-ta,tz)
document.write('<li style="font-size:15px;line-height:1;"><a style="color:rgb('+c[0]+','+c[1]+','+c[2]+')" href="http://del.icio.us/nitromike/'+encodeURIComponent(t).replace('%2F','/')+'">'+t+'</a> </li>');
We would of course remove some of the for statement, since c and the related variables only deal with the style/color.

Any help would be appreciated on how to format the regular expression for this!

Thanks,

Mike
rturner
Forum Newbie
Posts: 24
Joined: Sun Nov 04, 2007 1:39 pm

Post by rturner »

I think I understood what you want...but not sure

Try

Code: Select all

$a = '(function () {var ts = {"tagname":1 , "nexttagname":#...etc}';

$needle = '@{var ts = {"(.*?)}@';
preg_match($needle, $a, $contents);
$contents = $contents[1];
echo $contents;
This will give you the contents between {var ts = {" and the first } it runs into.

Also, there are screen scrapers out there to help you out.
Screen Scrapers

good luck!
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Curly braces are metacharacters in regular expressions.
irishmike2004
Forum Contributor
Posts: 119
Joined: Mon Nov 15, 2004 3:54 pm
Location: Lawrence, Kansas

Post by irishmike2004 »

Thanks for the help, guys. If we cannot use curly braces in the regex, what other method could I use to get the data?

I also forgot to mention that I DO NOT want the :# either.

I was thinking of doing this:

Code: Select all

<?php
$userId = "del.icio.us user id";
$url = "http://del.icio.us/feeds/js/tags/".$userId;
$read = file_get_contents($url);

preg_match("regex here", $read);

?>
Thanks,

Mike
rturner
Forum Newbie
Posts: 24
Joined: Sun Nov 04, 2007 1:39 pm

Post by rturner »

Just escape the meta character with a backslash ... instead of { use \{
just call me the typo master!
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto

Post by John Cartwright »

Use preg_quote() to escape regex metacharacters.
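A small sketch of the preg_quote() suggestion, assuming the literal text to anchor on is "var ts={" as in the feed. The sample string is made up for illustration.

```php
<?php
// Literal text to find; it contains "{" and "=", which preg_quote() escapes.
$literal = 'var ts={';

// The second argument names the pattern delimiter so that it is escaped too.
$pattern = '@' . preg_quote($literal, '@') . '"(.*?)"@';

// Made-up sample resembling the first line of the feed.
$sample = '(function(){ var ts={"example":1}';

// $found is the return code; the captured text lands in $m[1].
$found = preg_match($pattern, $sample, $m);

echo $found ? $m[1] : 'no match';
```

Only the literal part goes through preg_quote(). The capture group is appended afterwards, otherwise its parentheses would be escaped as well.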
irishmike2004
Forum Contributor
Posts: 119
Joined: Mon Nov 15, 2004 3:54 pm
Location: Lawrence, Kansas

Post by irishmike2004 »

Hi All again:

The proposed match string does not yield what I am looking for. It gets a number instead of the item in quotes. I may not have been clear about what I am trying to do, so I will post the following items to help you guys see what I am after with the PHP code. Again, the URL presented here gives me the end result, but with someone else's CSS attributes instead of our own, so I want to pull the tags, which are what appear when you run the URL below:

<script type="text/javascript" src="[ur ... /nitromike></script>[/url]

Now, if you just type http://del.icio.us/feeds/js/tags/nitromike into your browser, you get the JS back.

The first line looks like this:

Code: Select all

(function(){ var ts={"example":1}
It is from this code that I want to pull the word "example" and ignore the :1. If there are multiple tags ("example" is my one tag), I want to keep collecting them and stop when there are no more, that is, when the closing "}" is reached.

The PHP I am testing right now is:

Code: Select all

<?php
$userId = "nitromike";
$url = "http://del.icio.us/feeds/js/tags/".$userId;
$read = file_get_contents($url);

$match = '@var ts=\{"*?\}@';

$tags = preg_match($match, $read);


echo 'url = '.$url.'<br />';
echo 'read = '.$read.'<br />';
echo 'tags = '.$tags;

?>
So my "$tags" variable ought to equal "example" with my user id.

If someone uses another user id that I have, it will contain a great many tags, 77 or so.

It would be nice to put each tag into an array element, but I will worry about that after I get the basic functionality working :-)

Thanks for all the help!
rturner
Forum Newbie
Posts: 24
Joined: Sun Nov 04, 2007 1:39 pm

Post by rturner »

Code: Select all

<?php
$userId = "nitromike";
$url = "http://del.icio.us/feeds/js/tags/".$userId;
$read = file_get_contents($url);
$match = '@var ts=\{"(.*?)":@';
preg_match($match, $read, $tags);
$tags = $tags[1];
echo 'url = '.$url.'<br />';
echo 'read = '.$read.'<br />';
echo 'tags = '.$tags;
?>
preg_match() fills $tags (the third argument) with an array of matches.
Assigning the call's result to a variable, as in $tags = preg_match(...), gives you only a return code for success or failure.
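A short sketch separating the two things preg_match() hands back, assuming a made-up sample line shaped like the feed:

```php
<?php
// Made-up sample resembling the first line of the feed.
$sample = '(function(){ var ts={"example":1}';

// Return value: 1 on a match, 0 on no match, false on error.
// The matched text itself is written into the third argument.
$result = preg_match('@var ts=\{"(.*?)":@', $sample, $matches);

if ($result === 1) {
    // $matches[0] is the whole match, $matches[1] the first capture group.
    $tag = $matches[1];
} else {
    $tag = null;
}

var_dump($result, $tag);
```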
irishmike2004
Forum Contributor
Posts: 119
Joined: Mon Nov 15, 2004 3:54 pm
Location: Lawrence, Kansas

Post by irishmike2004 »

Greetings All:

Well, I tried to take rturner's advice and have been working on the script. The array appears to be formed wrong, though: print_r() shows most of the elements, plus ones called "array" throughout, but I cannot get the tags back out of the array for printing. Below is the latest of my attempts in the script:

Code: Select all

<?php
//Del.icio.us User Name
$userId = "vekann";

//process the JS feed (TagRoll)
$url = "http://del.icio.us/feeds/js/tags/".$userId;
$read = file_get_contents($url);

//Regular Expression we use against the contents of the URL above
$match = '/(\"[a-z0-9]+\")/i';

//run the match and build the tags array
preg_match_all($match, $read, $tags);

//count of the number of elements in the tags array
$tagCount = count($tags, COUNT_RECURSIVE);

echo 'Tag Count is '.$tagCount.'<br />';

$keys = ARRAY_KEYS($tags);
//Process the tags for output
echo '<div class="delicious-tags" id="delicious-tags-'.$userId.'"><ul class="delicious-cloud">';

for($i=0;$i<=$tagCount;$i++){

	//echo '<li><a href="http://del.icio.us/'.$userId.'/'.$value.'">'.$value.'</a></li>';
	echo $tags[($keys[$i])];
}

echo '</ul></div>';

?>
The commented-out link in the for loop is supposed to go through each tag and make a link for it. But if you echo the $tags array in any form, it prints two elements, "arrayarray", on output. I need a way to put the tags into a structure that I can print from, without the "array" entries in there.

I am not sure why PHP arrays are so weird, but we may need to find another mechanism to do this. Any suggestions? The idea is to be able to deal with each tag from the JS file. The match statement seems to yield the proper number of tags, but it also appears to be the source of the funky "array" entries that are added.

The $keys variable was one attempt to make this work, but it yields the same as $tags[$i].

Hope someone understands what I am trying to do now and can help a bit more. print_r() is not the way I want to get the elements printed out.
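For reference, a sketch of what preg_match_all() stores with its default flags, which is likely where the "array" entries come from. The result is an array of arrays: index 0 holds the full matches and index 1 holds the first capture group, so echoing either one prints "Array". The sample string is made up.

```php
<?php
// Made-up sample with two tags.
$sample = 'var ts={"php":12,"regex":3}';

preg_match_all('/(\"[a-z0-9]+\")/i', $sample, $tags);

// $tags[0] => full matches, $tags[1] => capture group 1.
// A recursive count includes the 2 sub-arrays plus their 2 + 2 elements.
$outer = count($tags);
$recursive = count($tags, COUNT_RECURSIVE);

// Loop over one inner array and strip the surrounding quotes.
$clean = array();
foreach ($tags[1] as $tag) {
    $clean[] = trim($tag, '"');
}

print_r($clean);
```

Looping over $tags[1] with foreach sidesteps the index arithmetic, and count($tags[1]) gives the real number of tags.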

Thanks,

Mike
rturner
Forum Newbie
Posts: 24
Joined: Sun Nov 04, 2007 1:39 pm

Post by rturner »

Mike,

I'm not sure exactly what you are doing so this may not be what you are looking for ...but

Code: Select all

<?php
//Del.icio.us User Name
$userId = "vekann";

//process the JS feed (TagRoll)
$url = "http://del.icio.us/feeds/js/tags/".$userId;
$read = file_get_contents($url);

//Regular Expression we use against the contents of the URL above
$match = '/(\"[a-z0-9]+\")/i';

//run the match and build the tags array
preg_match_all($match, $read, $tags);
$tags = $tags[1];

//count of the number of elements in the tags array
$tagCount = count($tags, COUNT_RECURSIVE);


echo 'Tag Count is '.$tagCount.'<br />';


foreach ($tags as $tag) {
        print $tag . "<br>";
}

?>
Each tag is enclosed in quotes, is that your intention?
irishmike2004
Forum Contributor
Posts: 119
Joined: Mon Nov 15, 2004 3:54 pm
Location: Lawrence, Kansas

Post by irishmike2004 »

@rturner:

Yeah, we want the tags without the quotes, but each tag is in quotes, so the new regex made more sense to me.

Let me see if your variation works for me and I'll post back.

Thanks for the help.

Mike
irishmike2004
Forum Contributor
Posts: 119
Joined: Mon Nov 15, 2004 3:54 pm
Location: Lawrence, Kansas

Post by irishmike2004 »

@rturner:

Yep, that works. I just need to figure out how to get rid of the quotes now.

Thanks,

Mike
rturner
Forum Newbie
Posts: 24
Joined: Sun Nov 04, 2007 1:39 pm

Post by rturner »

Code: Select all

<?php
//Del.icio.us User Name
$userId = "vekann";

//process the JS feed (TagRoll)
$url = "http://del.icio.us/feeds/js/tags/".$userId;
$read = file_get_contents($url);

//Regular Expression we use against the contents of the URL above
#$match = '/(\"[a-z0-9]+\")/i';
$match = '/"(.*?)"/';
//run the match and build the tags array
preg_match_all($match, $read, $tags);
$tags = $tags[1];

//count of the number of elements in the tags array
$tagCount = count($tags, COUNT_RECURSIVE);


echo 'Tag Count is '.$tagCount.'<br />';


foreach ($tags as $tag) {
        print $tag . "<br>";
}

?>
To remove the quotes, the capture group now sits inside them.
irishmike2004
Forum Contributor
Posts: 119
Joined: Mon Nov 15, 2004 3:54 pm
Location: Lawrence, Kansas

Post by irishmike2004 »

hey rturner:

That gives us more "goo" than we want. The result on my 1 tag is:

$match = '/"(.*?)"/' gives us:

Code: Select all

Tag Count is 9
example
text/css

delicious-tags
delicious-tags-nitromike
delicious-cloud
font-size:15px;line-height:1;
color:rgb('+c[0]+','+c[1]+','+c[2]+')
http://del.icio.us/nitromike/'+encodeURIComponent(t).replace('%2F','/')+'
As opposed to the other regex ($match = '/(\"[a-z0-9]+\")/i'), which yields this:

Code: Select all

Tag Count is 1
"example"
With that one we get the quotes, but not the other "goo".
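A sketch of why the looser pattern picks up the extra strings, using a made-up sample that has one tag followed by an unrelated quoted attribute. Requiring a colon right after the closing quote is one way to keep only the keys of the tag object.

```php
<?php
// Made-up sample: the tag object plus one unrelated quoted string.
$sample = 'var ts={"example":1}; document.write(\'<ul class="delicious-cloud">\');';

// Every double-quoted string in the text, tags or not.
preg_match_all('/"(.*?)"/', $sample, $loose);

// Only quoted strings that are immediately followed by a colon.
preg_match_all('/"(.*?)":/', $sample, $keyed);

print_r($loose[1]);
print_r($keyed[1]);
```

Because .*? is allowed to cross other quote characters while it looks for the ":, a stricter variant such as /"([^"]+)":/ may be safer if the rest of the feed ever contains a quote followed by a colon.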
rturner
Forum Newbie
Posts: 24
Joined: Sun Nov 04, 2007 1:39 pm

Post by rturner »

Had to watch the end of the UFC fight :D

Code: Select all

<?php
//Del.icio.us User Name
$userId = "vekann";

//process the JS feed (TagRoll)
$url = "http://del.icio.us/feeds/js/tags/".$userId;
$read = file_get_contents($url);

//Regular Expression we use against the contents of the URL above
#$match = '/(\"[a-z0-9]+\")/i';
$match = '/"(.*?)":/';
//run the match and build the tags array
preg_match_all($match, $read, $tags);
$tags = $tags[1];

//count of the number of elements in the tags array
$tagCount = count($tags, COUNT_RECURSIVE);


echo 'Tag Count is '.$tagCount.'<br />';


foreach ($tags as $tag) {
        print $tag . "<br>";
}

?>
That might do it !
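Once the tags are in a flat array, the links from the first post could be built along these lines. This is a sketch only: renderTagList() is a hypothetical helper, not part of the feed or of the code above, and rawurlencode() is used as an approximation of the feed's encodeURIComponent(t).replace('%2F','/').

```php
<?php
// Hypothetical helper: turns a flat array of tag names into a linked list.
function renderTagList(array $tags, $userId)
{
    $html = '<ul class="delicious-cloud">';
    foreach ($tags as $tag) {
        // Percent-encode the tag, then restore the first slash only,
        // as the JavaScript replace() with a string argument does.
        $path = rawurlencode($tag);
        $pos = strpos($path, '%2F');
        if ($pos !== false) {
            $path = substr_replace($path, '/', $pos, 3);
        }
        $href = 'http://del.icio.us/' . rawurlencode($userId) . '/' . $path;

        // Escape both the URL and the visible text for HTML output.
        $html .= '<li><a href="' . htmlspecialchars($href) . '">'
               . htmlspecialchars($tag) . '</a></li>';
    }
    return $html . '</ul>';
}

// Example call with made-up tags.
$list = renderTagList(array('example', 'web/design'), 'nitromike');
echo $list;
```

Leaving the inline style attributes out of the markup is what lets your own CSS apply to the list.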