[SOLVED] Tag compatibility issue
Posted: Mon Oct 04, 2004 10:39 am
feyd | Please use [code] tags when posting code. Read: [url=http://forums.devnetwork.net/viewtopic.php?t=21171]Posting Code in the Forums[/url]
This concerns a little script called [b]RSSgenr8php[/b].
It can be found here: http://www.xmlhub.com/rssgenr8.php
This is an HTML-to-RSS scraper.
I'm hoping this is a "generic" question as they have no support forums...
The script works like this:
In the page that you'd like to create an RSS feed from, you enclose each RSS "item" with these tags:
[b]<span class="rss:item"></span>[/b]
Then, when you run the script, it generates an RSS feed on the fly from whatever those tags enclose. The script works great, with one exception.
We have a page that is dynamically generated weekly by a cgi script. In that script, we embedded the above tags.
The cgi script requires the tag to look like this ([b]single[/b] quotes):
<span class='rss:item'>
NOTE: If double quotes are used in the cgi script, a parsing error occurs, so single quotes must be used there.
And the RSS script (RSSgenr8.php) requires the tag to look like this ([b]double[/b] quotes):
<span class="rss:item">
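The parsing error mentioned in the note above is most likely caused by the inner double quote ending the double-quoted string early. Assuming the cgi script's string syntax behaves like PHP's here (an assumption, since only one line of it is shown), escaping the inner quotes with a backslash is one way around it. The values in $row and $title below are made up for illustration, and the href is quoted as well so that a double-quote based link lookup can find it.

```php
<?php
// Hypothetical stand-ins for the values the cgi script reads from its data source.
$row = array('page' => 'news/week40.html');
$title = 'Weekly update';

$o = "";
// The inner double quotes are escaped with a backslash so they do not
// terminate the surrounding double-quoted string.
$o .= " <span class=\"rss:item\"><a href=\"{$row['page']}\">$title</a></span>\n";

print $o;
```

With the tag emitted this way, the double-quote form that RSSgenr8 expects would be produced directly by the cgi script.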
[b]Question[/b]
Is it possible to make the RSSgenr8 script recognize the single quotes? Or, is it possible to use double quotes in the cgi script without getting a parsing error?
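One possible answer to the first half of the question, as a minimal sketch rather than a tested patch: the item pattern in RSSgenr8 matches a literal double quote after rss:item, so replacing that quote with a character class holding both quote characters should let it accept either style. The sample $data string below is made up for illustration. The capture group stays in position 1, so the rest of the script should not need to change.

```php
<?php
// Sketch only: the RSSgenr8 item pattern with ["'] in place of the literal
// double quote, so either quote style closes the class attribute.
$itemregexp = "%rss:item *[\"'] *>(.+?)</span>%is";

// Hypothetical sample page content mixing both quote styles.
$data = "<span class='rss:item'><a href=\"/a.html\">First</a></span>\n"
      . "<span class=\"rss:item\"><a href=\"/b.html\">Second</a></span>\n";

$match_count = preg_match_all($itemregexp, $data, $items);

// $items[1] holds the text between the opening tag and </span>.
print $match_count . "\n";
print strip_tags($items[1][0]) . "\n";
print strip_tags($items[1][1]) . "\n";
```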
Thank you!
The snippet from the cgi script is below (note single quotes on tag):
```````````````````````````````````````````````````````
Code: Select all
$o .= " <span class='rss:item'><a href=$row[page]>$title</a></span>\n";
The RSSgenr8 code is below.
$pageurl is the page URL passed to the script
```````````````````````````````````````````````````````
Code: Select all
<?php
if ($pageurl) {
parse_html($pageurl);
} else {
die ("Query failed...");
}
function parse_html($pageurl){
$itemregexp = "%rss:item *\" *>(.+?)</span>%is";
$allowable_tags = "<A><B><br /><br><BLOCKQUOTE><CENTER><DD><DL><DT><HR><I><IMG><LI> <OL><P><PRE><U><UL>";
$pageurlparts = parse_url($pageurl);
if (empty($pageurlparts['path'])) $pageurl .= "/";
$data = "";
if ($fp = @fopen($pageurl, "r")) {
while (!feof($fp)) {
$data .= fgets($fp, 128);
}
fclose($fp);
}
// print "<pre>";
// print htmlentities($data);
// eregi("<title>(.*)</title>", $data, $title);
// $channel_title = $title[1];
$channel_title = "";
$channel_desc = "";
if (preg_match('/<title>(.+?)<\/title>/i', $data, $regs) > 0) { $channel_title = $regs[1];
}
if (preg_match('/<meta .*description.*"(.+?)"/i', $data, $regs) > 0) { $channel_desc = $regs[1];
}
if ($channel_desc == "") $channel_desc = $pageurl;
$match_count = preg_match_all($itemregexp, $data, $items);
$match_count = ($match_count > 25) ? 25 : $match_count;
header("Content-Type: text/xml");
$output = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\" ?>\n";
$output .= "<!-- generator=\"rssgenr8/0.92\" -->\n";
$output .= "<!DOCTYPE rss PUBLIC \"-//W3C//ENTITIES Latin 1 for XHTML//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent\">\n";
$output .= "<rss version=\"0.92\">\n";
$output .= " <channel>\n";
$output .= " <title>". htmlentities(strip_tags($channel_title)) ."</title>\n";
$output .= " <link>". htmlentities($pageurl) ."</link>\n";
$output .= " <description>". htmlentities($channel_desc) ."</description>\n";
$output .= " <webMaster>". htmlentities("webmaster") ."</webMaster>\n";
$output .= " <generator>". htmlentities("RSSgenr8 from XMLhub.com") ."</generator>\n";
$output .= " <language>en</language>\n";
for ($i=0; $i< $match_count; $i++) {
$desc = $items[1][$i];
$title = wsstrip($desc);
$descout = $desc;
if (preg_match("/(.+?)(?:<\/P|<\/div|<br|<\/h|<\/td)/i", $title, $regs) > 0) {
$title = $regs[1];
if (strlen(wsstrip(trim(strip_tags($title)))) < 100) {
$descout = str_replace($title,"",$descout);
}
}
$title = wsstrip(trim(strip_tags($title)));
if (strlen($title) > 100) {
$title = substr($title,0,100) . " ...";
}
$item_url = get_link($desc, $pageurl);
$descout = wsstrip(strip_tags($descout, $allowable_tags));
$pos = strpos($descout, "<br>");
if (is_int($pos) and ($pos == 0)) {
$descout=substr($descout, 4);
}
$pos = strpos($descout, "<br />");
if (is_int($pos) and ($pos == 0)) {
$descout=substr($descout, 6);
}
$descout = htmlentities(wsstrip($descout));
$output .= " <item>\n";
$output .= " <title>". htmlentities($title) ."</title>\n";
$output .= " <link>". htmlentities($item_url) ."</link>\n";
$output .= " <description>". $descout ."</description>\n";
$output .= " </item>\n";
}
$output .= " </channel>\n";
$output .= "</rss>\n";
print $output;
// print htmlentities($output);
// print "</pre>";
}
function get_link($desc, $pageurl) {
if (stristr($desc, "href")) {
$linkurl = stristr($desc, "href");
$linkurl = substr($linkurl, strpos($linkurl, "\"")+1);
$linkurl = substr($linkurl, 0, strpos($linkurl, "\""));
$linkurl = trim($linkurl);
$pageurlarray = parse_url($linkurl);
if (empty($pageurlarray['host'])) {
$linkurl = make_abs($linkurl, $pageurl);
}
return $linkurl;
} else {
return $pageurl;
}
}
function wsstrip($str)
{
// preg_replace() is used here because ereg_replace() was removed in PHP 7
$str = preg_replace("/[\r\t\n]/", " ", $str);
$str = preg_replace('/ +/', ' ', trim($str));
return $str;
}
function make_abs($rel_uri, $base, $REMOVE_LEADING_DOTS = true) {
preg_match("'^([^:]+://[^/]+)/'", $base, $m);
$base_start = $m[1];
if (preg_match("'^/'", $rel_uri)) {
return $base_start . $rel_uri;
}
$base = preg_replace("{[^/]+$}", '', $base);
$base .= $rel_uri;
$base = preg_replace("{^[^:]+://[^/]+}", '', $base);
$base_array = explode('/', $base);
if (count($base_array) and !strlen($base_array[0]))
array_shift($base_array);
$i = 1;
while ($i < count($base_array)) {
if ($base_array[$i - 1] == ".") {
array_splice($base_array, $i - 1, 1);
if ($i > 1) $i--;
} elseif ($base_array[$i] == ".." and $base_array[$i - 1] != "..") {
array_splice($base_array, $i - 1, 2);
if ($i > 1) {
$i--;
if ($i == count($base_array)) array_push($base_array, "");
}
} else {
$i++;
}
}
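// PHP does not treat -1 as "last element", so compute the last index explicitly
$last = count($base_array) - 1;
if ($last >= 0 and $base_array[$last] == ".")
$base_array[$last] = "";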
if ($REMOVE_LEADING_DOTS) {
while (count($base_array) and preg_match("/^\.\.?$/", $base_array[0])) {
array_shift($base_array);
}
}
return($base_start . '/' . implode("/", $base_array));
}
?>
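A related detail in the listing above: get_link() locates the URL by looking for double quotes after href, while the cgi snippet writes href=$row[page] with no quotes at all. A more tolerant lookup might look like the sketch below. The function name find_href is hypothetical and not part of RSSgenr8; it only illustrates handling double-quoted, single-quoted, and unquoted values.

```php
<?php
// Sketch of a more tolerant href lookup (hypothetical helper, not in RSSgenr8).
// Returns the href value whether it is double-quoted, single-quoted, or bare,
// or null when no href attribute is found.
function find_href($html) {
    $pattern = '/href\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s>]+))/i';
    if (!preg_match($pattern, $html, $m)) {
        return null;
    }
    // Only one alternative can match; return the first non-empty group.
    foreach (array(1, 2, 3) as $g) {
        if (isset($m[$g]) && $m[$g] !== '') {
            return trim($m[$g]);
        }
    }
    return '';
}

print find_href('<a href="a.html">x</a>') . "\n";
print find_href("<a href='b.html'>x</a>") . "\n";
print find_href('<a href=c.html>x</a>') . "\n";
```

The result could then be passed through make_abs() in the same way get_link() does for relative URLs.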