
Using PHP to parse code from between tags

Posted: Sun Jan 22, 2012 12:24 am
by robertjm
Hi all,

I have a podcast XML file which I want to get tag values out of to put in a MySQL database. I've got a working PHP form which will parse some tags, but not all. When I tell it to parse the text between <link> and </link> it returns blank fields instead of the actual link to the mp3 file. (example: http://www.recastweb.com/media/Luke/Luke01.mp3)

Also, I've noticed that if I don't include the itunes part of itunes tags (example: explicit instead of itunes:explicit) it works. When I give it the whole tag name <itunes:explicit> it parses nothing. And strangely, it doesn't see the caps in <pubDate> but it will return data when I use <pubdate>.

Do I have to escape the non-alpha characters to get this to work?

Here is the script I am sending the tag info to:


Code:

<html>
<head>
<title>Tag Parser</title>

<form name="configform" action="parser.php?action=saveconfig" method="POST">
<fieldset style="width: 650px;">
<legend>Results</legend>

<?php

$con = mysql_connect("xxxxxxxx","xxx","xxx");

if (!$con)
  {
  die('Could not connect: ' . mysql_error());
  }
  
mysql_select_db("podadmin", $con);

function getTextBetweenTags($tag, $html, $strict=1)
{
    /*** a new dom object ***/
    $dom = new domDocument;

    /*** load the html into the object ***/
    if($strict==1)
    {
        $dom->loadXML($html);
    }
    else
    {
        $dom->loadHTML($html);
    }

    /*** discard white space ***/
    $dom->preserveWhiteSpace = false;

    /*** the tag by its tag name ***/
    $content = $dom->getElementsByTagname($tag);

    /*** the array to return ***/
    $out = array();
    foreach ($content as $item)
    {
        /*** add node value to the out array ***/
        $out[] = $item->nodeValue;
    }
    /*** return the results ***/
    return $out;
}

$tagtoparse = $_POST['title'];

$xhtml = '<item>
            <title>Habakkuk 3</title>
            <description>Pastor Todd Spitzer teaches from Habakkuk 3</description>
            <link>http://www.recastweb.com/media/Habakkuk/Habakkuk_03.mp3</link>
            <enclosure url="http://www.recastweb.com/media/Habakkuk/Habakkuk_03.mp3" length="46" type="audio/mpeg"  ></enclosure>
            <guid isPermaLink="false">EDB0E7B1-24BE-11DD-98A0-000A9592B578-250-0000004D07B366CF-FFA</guid>
            <pubDate>Sun, 18 May 2008 13:01:35 +0300</pubDate>
            <itunes:subtitle>Habakkuk 3</itunes:subtitle>
            <itunes:summary>Pastor Todd Spitzer teaches from Habakkuk 3</itunes:summary>
            <itunes:duration>45:54</itunes:duration>
            <itunes:keywords>Regeneration, Todd Spitzer, Bible, Habakkuk</itunes:keywords>
            <itunes:author>Todd Spitzer @ Regeneration</itunes:author>
            <itunes:explicit>no</itunes:explicit>
        </item>
        <item>
            <title>Habakkuk 2:4</title>
            <description>Pastor Todd Spitzer teaches from Habakkuk 2:4</description>
            <link>http://www.recastweb.com/media/Habakkuk/Habakkuk_02_04.mp3</link>
            <enclosure url="http://www.recastweb.com/media/Habakkuk/Habakkuk_02_04.mp3" length="49" type="audio/mpeg"  ></enclosure>
            <guid isPermaLink="false">BDF78E36-24BE-11DD-98A0-000A9592B578-250-0000004C68A32FC6-FFA</guid>
            <pubDate>Sun, 18 May 2008 13:01:29 +0300</pubDate>
            <itunes:subtitle>Habakkuk 2:4</itunes:subtitle>
            <itunes:summary>Pastor Todd Spitzer teaches from Habakkuk 2:4</itunes:summary>
            <itunes:duration>48:54</itunes:duration>
            <itunes:keywords>Regeneration, Todd Spitzer, Bible, Habakkuk</itunes:keywords>
            <itunes:author>Todd Spitzer @ Regeneration</itunes:author>
            <itunes:explicit>no</itunes:explicit>
        </item>'
;

$content2 = getTextBetweenTags($tagtoparse, $xhtml, 5);

foreach ($content2 as $item) {
    echo "$item<br />";

$sql="INSERT INTO test (test)
VALUES ('$item')";

if (!mysql_query($sql,$con))
  {
  die('Error: ' . mysql_error());
  }

}
mysql_close($con)
?>

</html>

Re: Using PHP to parse code from between tags

Posted: Sun Jan 22, 2012 3:31 pm
by twinedev
Just something quick that gets the items from the XML: Description, Link, Publish Date (in MySQL datetime field format), and an array of the itunes: items. You should be able to take this and use the data as you want.

Code:

$strCode = ''; // Put your XML code in this string....

if (preg_match_all('%<item>.*?</item>%si',$strCode,$regs)) {
	$aryItems = $regs[0];
	foreach($aryItems as $strItem) {
		$strDescription = '';
		$strLink = '';
		$strPubDate = '';
		$aryItuneData = array();

		if (preg_match('%<description>(.*?)</description>%si',$strItem,$regs)) {
			$strDescription = $regs[1];
		}
		if (preg_match('%<link>(.*?)</link>%si',$strItem,$regs)) {
			$strLink = $regs[1];
		}
		if (preg_match('%<pubDate>(.*?)</pubDate>%si',$strItem,$regs)) {
			$strPubDate = date('Y-m-d H:i:s',strtotime($regs[1]));
		}
		if (preg_match_all('%<itunes:([^>]+)>(.*?)</itunes:\1>%si',$strItem,$regs)) {
			foreach($regs[2] as $index=>$val) {
				$aryItuneData[$regs[1][$index]] = $val;
			}
		}

		echo "\n=============== ITEM DATA ==============\n";
		echo "Description: $strDescription\n";
		echo "Link: $strLink\n";
		echo "PubDate: $strPubDate\n";
		echo "iTunes Data:\n";
		foreach($aryItuneData as $key=>$val) {
			echo "\t$key: $val\n";
		}
	}
}
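
For comparison, here is a rough sketch of the DOM route from the first post. That script calls getTextBetweenTags() with 5 as the third argument, so the $strict==1 test fails and loadHTML() runs instead of loadXML(). An HTML parser lowercases tag names and treats <link> as an empty element, which would likely explain the pubdate and blank-link symptoms. The sketch below assumes the items are wrapped in a root element that declares the itunes prefix; the wrapper and the trimmed-down item are made up for illustration.

```php
<?php
// Trimmed-down item taken from the sample data in the first post.
$strItems = '<item>
    <title>Habakkuk 3</title>
    <link>http://www.recastweb.com/media/Habakkuk/Habakkuk_03.mp3</link>
    <pubDate>Sun, 18 May 2008 13:01:35 +0300</pubDate>
    <itunes:explicit>no</itunes:explicit>
</item>';

// Assumption: the fragment needs a single root element and a declaration
// for the itunes prefix before an XML parser will accept it cleanly.
$strItunesNs = 'http://www.itunes.com/dtds/podcast-1.0.dtd';
$strXml = '<rss xmlns:itunes="' . $strItunesNs . '"><channel>' . $strItems . '</channel></rss>';

$dom = new DOMDocument();
$dom->loadXML($strXml); // XML mode: tag names stay case-sensitive

$strLink     = $dom->getElementsByTagName('link')->item(0)->nodeValue;
$strPubDate  = $dom->getElementsByTagName('pubDate')->item(0)->nodeValue;
// Namespaced tags are looked up by namespace URI plus local name.
$strExplicit = $dom->getElementsByTagNameNS($strItunesNs, 'explicit')->item(0)->nodeValue;

echo "Link: $strLink\n";
echo "PubDate: $strPubDate\n";
echo "Explicit: $strExplicit\n";
```

Either approach gets the data out; the regex version is more forgiving of a feed that is not well-formed, while the DOM version does not depend on how the tags are laid out.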

Re: Using PHP to parse code from between tags

Posted: Tue Jan 24, 2012 1:48 am
by robertjm
Thanks for the help!! Working very nicely and I was able to follow your code easily to add pretty much everything I needed to, except when dealing with the <enclosure...> tag. There are three different items within there, and I'm trying to decide whether I can substitute other string data instead of trying to parse that out of that hodgepodge of a tag.

Also, I have a $64,000 question. My XML file is fairly large, so it would be a pain to paste the whole thing into the .php script, not to mention it's over 255 characters long, which I seem to remember was the maximum length of a variable... Right? Is there a way of having the .php page crawl through a very large .xml file to get tag info from 200+ episodes, or am I just going to have to slice it up, little by little, and import the data that way?

Thanks again for your help,

Robert

Re: Using PHP to parse code from between tags

Posted: Tue Jan 24, 2012 10:05 am
by twinedev
To get the <enclosure> data:

Code:

if (preg_match('%<enclosure url="([^"]*?)" length="([^"]*?)" type="([^"]*?)".*?</enclosure>%si',$strItem,$regs)) {
    $strURL = $regs[1];
    $intLength = (int)$regs[2];
    $strType = $regs[3];
}
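
The pattern above expects the attributes in the order url, length, type. If a feed lists them differently, one alternative is to read them by name through the DOM. This is only a sketch: the sample item is made up, with the attributes deliberately shuffled, and a real item containing itunes: tags would also need the itunes prefix declared on a wrapper element before loadXML() accepts it without warnings.

```php
<?php
// Made-up item with the enclosure attributes in a different order.
$strItem = '<item>
    <enclosure type="audio/mpeg" url="http://www.recastweb.com/media/Habakkuk/Habakkuk_03.mp3" length="46"></enclosure>
</item>';

$dom = new DOMDocument();
$dom->loadXML($strItem);

$strURL = '';
$intLength = 0;
$strType = '';

// item(0) is null when the item has no <enclosure> at all.
$enclosure = $dom->getElementsByTagName('enclosure')->item(0);
if ($enclosure !== null) {
    // getAttribute() returns an empty string for a missing attribute.
    $strURL    = $enclosure->getAttribute('url');
    $intLength = (int)$enclosure->getAttribute('length');
    $strType   = $enclosure->getAttribute('type');
}

echo "URL: $strURL\n";
echo "Length: $intLength\n";
echo "Type: $strType\n";
```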

You are mistaken about the 255 limit on a variable (your own sample code assigns 1608 characters to $xhtml). PHP can handle a lot of data in variables, so the question would be what do you consider "large"? You can try using file_get_contents() to read the whole file into a string.

-Greg
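
A minimal sketch of the file_get_contents() suggestion. The temporary file and its two-item content are stand-ins made up so the example runs on its own; in real use $strPath would simply point at the actual .xml file.

```php
<?php
// Stand-in feed written to a temp file (hypothetical data, not the real feed).
$strPath = tempnam(sys_get_temp_dir(), 'feed');
file_put_contents(
    $strPath,
    "<item>\n<title>Habakkuk 3</title>\n</item>\n<item>\n<title>Habakkuk 2:4</title>\n</item>"
);

// Read the whole file into one string; file_get_contents() returns false on failure.
$strCode = file_get_contents($strPath);
if ($strCode === false) {
    die("Could not read $strPath");
}

// Same item-by-item loop as before, now fed from the file.
$aryTitles = array();
if (preg_match_all('%<item>.*?</item>%si', $strCode, $regs)) {
    foreach ($regs[0] as $strItem) {
        if (preg_match('%<title>(.*?)</title>%si', $strItem, $m)) {
            $aryTitles[] = $m[1];
        }
    }
}

echo count($aryTitles) . " items found\n";
unlink($strPath); // clean up the stand-in file
```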

Re: Using PHP to parse code from between tags

Posted: Wed Feb 01, 2012 12:57 am
by robertjm
Just seeing your latest post tonight as the notification didn't come. Thanks again for making it simple to resolve my issue! For some reason I thought there was a max length, but perhaps that was in another code language somewhere along the way. I actually tried referencing an external file which had nearly 200 podcast episodes in it, and it flew through that like nothing, so that actually took care of that question.

One thing I was chewing on earlier was how to get the file size (i.e., length) by looking at the remote file instead of directly from the enclosure data. Unfortunately, the XML file I have has a whole bunch of episodes that have a two-character length, which is basically gibberish. I was hoping to define a variable after using a function to check the remote file without having to download a local copy. Found a whole bunch of options online people had said worked, but when I tried them, all I got back was a white page. No error, nothing. I made sure to add an echo line to return a value to the web page. However, that returned nothing.

Later,

Robert

(OT: Not related to the subject, but my Sharks sure beat up on the Blue Jackets tonight!!! :lol: )
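
On the remote file size question above, one possible approach is to send a HEAD request and read the Content-Length header, so nothing gets downloaded. This is a sketch under assumptions, not a tested answer for this feed: the function names contentLengthFromHeaders() and remoteFileSize() are made up here, the $context argument to get_headers() needs PHP 7.1 or later, and some servers omit Content-Length or reject HEAD requests. The sample header lines are invented for the offline check.

```php
<?php
// Pull the last Content-Length value out of a list of raw header lines
// (the format get_headers() returns by default). Returns null if absent.
function contentLengthFromHeaders(array $aryHeaders)
{
    $intLength = null;
    foreach ($aryHeaders as $strHeader) {
        if (preg_match('/^Content-Length:\s*(\d+)/i', $strHeader, $m)) {
            $intLength = (int)$m[1]; // later matches win, e.g. after a redirect
        }
    }
    return $intLength;
}

// Hypothetical wrapper: ask for headers only, via a HEAD request.
function remoteFileSize($strURL)
{
    $context = stream_context_create(array('http' => array('method' => 'HEAD')));
    $aryHeaders = @get_headers($strURL, false, $context);
    return $aryHeaders === false ? null : contentLengthFromHeaders($aryHeaders);
}

// Offline check with made-up header lines (no network needed):
$arySample = array(
    'HTTP/1.1 200 OK',
    'Content-Type: audio/mpeg',
    'Content-Length: 22034512',
);
echo contentLengthFromHeaders($arySample), "\n";
```

In the import loop this would be called as $intLength = remoteFileSize($strURL); with a null check before storing the value. As for the white page, that is often a fatal error with error display switched off; putting error_reporting(E_ALL); and ini_set('display_errors', '1'); at the top of the script while debugging should show what went wrong.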