Using PHP to parse code from between tags

robertjm
Forum Newbie
Posts: 3
Joined: Sun Jan 22, 2012 12:14 am

Using PHP to parse code from between tags

Post by robertjm »

Hi all,

I have a podcast XML file and I want to pull tag values out of it to put in a MySQL database. I've got a working PHP form which will parse some tags, but not all. When I tell it to parse the text between <link> and </link> it returns blank fields instead of the actual link to the mp3 file. (example: http://www.recastweb.com/media/Luke/Luke01.mp3)

Also, I've noticed that if I leave the itunes prefix off the itunes tags (example: explicit instead of itunes:explicit) it works. When I give it the whole tag name <itunes:explicit> it parses nothing. And strangely, it doesn't see the caps in <pubDate>, but it will return data when I use <pubdate>.

Do I have to escape the non-alpha characters to get this to work?

Here is the script I am sending the tag info to:

Code:

<html>
<head>
<title>Tag Parser</title>
</head>

<form name="configform" action="parser.php?action=saveconfig" method="POST">
<fieldset style="width: 650px;">
<legend>Results</legend>

<?php

$con = mysql_connect("xxxxxxxx","xxx","xxx");

if (!$con)
  {
  die('Could not connect: ' . mysql_error());
  }
  
mysql_select_db("podadmin", $con);

function getTextBetweenTags($tag, $html, $strict=1)
{
    /*** a new dom object ***/
    $dom = new domDocument;

    /*** load the html into the object ***/
    if($strict==1)
    {
        $dom->loadXML($html);
    }
    else
    {
        $dom->loadHTML($html);
    }

    /*** discard white space ***/
    $dom->preserveWhiteSpace = false;

    /*** the tag by its tag name ***/
    $content = $dom->getElementsByTagname($tag);

    /*** the array to return ***/
    $out = array();
    foreach ($content as $item)
    {
        /*** add node value to the out array ***/
        $out[] = $item->nodeValue;
    }
    /*** return the results ***/
    return $out;
}

$tagtoparse = $_POST['title'];

$xhtml = '<item>
            <title>Habakkuk 3</title>
            <description>Pastor Todd Spitzer teaches from Habakkuk 3</description>
            <link>http://www.recastweb.com/media/Habakkuk/Habakkuk_03.mp3</link>
            <enclosure url="http://www.recastweb.com/media/Habakkuk/Habakkuk_03.mp3" length="46" type="audio/mpeg"  ></enclosure>
            <guid isPermaLink="false">EDB0E7B1-24BE-11DD-98A0-000A9592B578-250-0000004D07B366CF-FFA</guid>
            <pubDate>Sun, 18 May 2008 13:01:35 +0300</pubDate>
            <itunes:subtitle>Habakkuk 3</itunes:subtitle>
            <itunes:summary>Pastor Todd Spitzer teaches from Habakkuk 3</itunes:summary>
            <itunes:duration>45:54</itunes:duration>
            <itunes:keywords>Regeneration, Todd Spitzer, Bible, Habakkuk</itunes:keywords>
            <itunes:author>Todd Spitzer @ Regeneration</itunes:author>
            <itunes:explicit>no</itunes:explicit>
        </item>
        <item>
            <title>Habakkuk 2:4</title>
            <description>Pastor Todd Spitzer teaches from Habakkuk 2:4</description>
            <link>http://www.recastweb.com/media/Habakkuk/Habakkuk_02_04.mp3</link>
            <enclosure url="http://www.recastweb.com/media/Habakkuk/Habakkuk_02_04.mp3" length="49" type="audio/mpeg"  ></enclosure>
            <guid isPermaLink="false">BDF78E36-24BE-11DD-98A0-000A9592B578-250-0000004C68A32FC6-FFA</guid>
            <pubDate>Sun, 18 May 2008 13:01:29 +0300</pubDate>
            <itunes:subtitle>Habakkuk 2:4</itunes:subtitle>
            <itunes:summary>Pastor Todd Spitzer teaches from Habakkuk 2:4</itunes:summary>
            <itunes:duration>48:54</itunes:duration>
            <itunes:keywords>Regeneration, Todd Spitzer, Bible, Habakkuk</itunes:keywords>
            <itunes:author>Todd Spitzer @ Regeneration</itunes:author>
            <itunes:explicit>no</itunes:explicit>
        </item>'
;

$content2 = getTextBetweenTags($tagtoparse, $xhtml, 5);

foreach ($content2 as $item) {
    echo "$item<br />";

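// Escape the value so an apostrophe in a tag's text cannot break the query
$sql="INSERT INTO test (test)
VALUES ('" . mysql_real_escape_string($item, $con) . "')";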

if (!mysql_query($sql,$con))
  {
  die('Error: ' . mysql_error());
  }

}
mysql_close($con);
?>

</fieldset>
</form>
</html>
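
A likely explanation for the blank <link>, the lowercase <pubdate> and the itunes: tags, offered as a sketch rather than a confirmed diagnosis: the call passes 5 as the third argument, so the function takes the loadHTML() branch. An HTML parser lowercases element names and treats <link> as an empty element, which would match all three symptoms. The function below is hypothetical (getItemValues and ITUNES_NS are names made up for this example). It parses the same kind of fragment as XML, wrapping it in an assumed <rss> root that declares the itunes prefix, and looks up prefixed tags by namespace URI and local name.

```php
<?php
// Namespace URI used by the itunes: tags in podcast feeds.
const ITUNES_NS = 'http://www.itunes.com/dtds/podcast-1.0.dtd';

// Hypothetical helper: return the text of every element with the given
// local name, optionally restricted to a namespace URI.
function getItemValues(string $itemsXml, string $localName, ?string $nsUri = null): array
{
    // Assumption: the input is a bare run of <item> elements, so wrap it in
    // one root element that declares the itunes prefix. Without this,
    // loadXML() rejects the fragment.
    $xml = '<rss xmlns:itunes="' . ITUNES_NS . '">' . $itemsXml . '</rss>';

    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = false; // set before loading so it takes effect
    if (!$dom->loadXML($xml)) {
        return array();
    }

    $nodes = ($nsUri === null)
        ? $dom->getElementsByTagName($localName)
        : $dom->getElementsByTagNameNS($nsUri, $localName);

    $out = array();
    foreach ($nodes as $node) {
        $out[] = $node->nodeValue;
    }
    return $out;
}
```

Usage would look like getItemValues($xhtml, 'pubDate') for a plain tag and getItemValues($xhtml, 'explicit', ITUNES_NS) for <itunes:explicit>.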
twinedev
Forum Regular
Posts: 984
Joined: Tue Sep 28, 2010 11:41 am
Location: Columbus, Ohio

Re: Using PHP to parse code from between tags

Post by twinedev »

Just something quick that gets the items from the XML: Description, Link, Publish Date (in MySQL datetime field format), and an array of the itunes: items. You should be able to take this and use the data as you want.

Code:

$strCode = ''; // Put your XML code here....

if (preg_match_all('%<item>.*?</item>%si',$strCode,$regs)) {
	$aryItems = $regs[0];
	foreach($aryItems as $strItem) {
		$strDescription = '';
		$strLink = '';
		$strPubDate = '';
		$aryItuneData = array();

		if (preg_match('%<description>(.*?)</description>%si',$strItem,$regs)) {
			$strDescription = $regs[1];
		}
		if (preg_match('%<link>(.*?)</link>%si',$strItem,$regs)) {
			$strLink = $regs[1];
		}
		if (preg_match('%<pubDate>(.*?)</pubDate>%si',$strItem,$regs)) {
			$strPubDate = date('Y-m-d H:i:s',strtotime($regs[1]));
		}
		if (preg_match_all('%<itunes:([^>]+)>(.*?)</itunes:\1>%si',$strItem,$regs)) {
			foreach($regs[2] as $index=>$val) {
				$aryItuneData[$regs[1][$index]] = $val;
			}
		}

		echo "\n=============== ITEM DATA ==============\n";
		echo "Description: $strDescription\n";
		echo "Link: $strLink\n";
		echo "PubDate: $strPubDate\n";
		echo "iTunes Data:\n";
		foreach($aryItuneData as $key=>$val) {
			echo "\t$key: $val\n";
		}
	}
}
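
One thing to keep in mind with the date() / strtotime() line above: date() formats the timestamp in the server's default timezone, so the stored value depends on how the server is configured. A minimal sketch of an alternative, assuming you want to store UTC (pubDateToMysqlUtc is a hypothetical name, not something from the code above):

```php
<?php
// Hypothetical helper: convert an RSS pubDate string to a MySQL DATETIME
// string in UTC, regardless of the server's timezone setting.
function pubDateToMysqlUtc(string $pubDate): ?string
{
    try {
        // The +0300 style offset in the feed is honoured by the parser.
        $dt = new DateTimeImmutable($pubDate);
    } catch (Exception $e) {
        return null; // the date could not be parsed
    }
    return $dt->setTimezone(new DateTimeZone('UTC'))->format('Y-m-d H:i:s');
}

echo pubDateToMysqlUtc('Sun, 18 May 2008 13:01:35 +0300'), "\n"; // 2008-05-18 10:01:35
```

If you would rather keep local time, the original date() line is fine as long as the server timezone is what you expect.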
robertjm
Forum Newbie
Posts: 3
Joined: Sun Jan 22, 2012 12:14 am

Re: Using PHP to parse code from between tags

Post by robertjm »

Thanks for the help!! It's working very nicely, and I was able to follow your code easily to add pretty much everything I needed to, except when dealing with the <enclosure...> tag. There are three different items within there, and I'm trying to decide whether I can substitute other string data instead of trying to parse it out of that hodgepodge of a tag.

Also, I had a $64,000 question. I have an xml file that is fairly large, so it would be a pain to paste the whole thing in the .php script, not to mention it's over 255 characters long, which I seem to remember was the length of a variable...Right? Is there a way of having the .php page crawl through a very large .xml file to get tag info from 200+ episodes, or am I just going to have to slice it up, little by little, and import the data that way?

Thanks again for your help,

Robert
twinedev
Forum Regular
Posts: 984
Joined: Tue Sep 28, 2010 11:41 am
Location: Columbus, Ohio

Re: Using PHP to parse code from between tags

Post by twinedev »

To get the <enclosure> data:

Code:

if (preg_match('%<enclosure url="([^"]*?)" length="([^"]*?)" type="([^"]*?)".*?</enclosure>%si',$strItem,$regs)) {
    $strURL = $regs[1];
    $intLength = (int)$regs[2];
    $strType = $regs[3];
}

You are mistaken about the 255 limit on a variable (your own sample code assigns 1608 characters to $xhtml). PHP can handle a lot of data in variables, so the question would be what you consider "large". You can try using file_get_contents() to read the whole file into a string and run the same code on it.

-Greg
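
A minimal sketch of the file_get_contents() idea, assuming the feed is saved as a local file (the function name loadFeedItems and the file name podcast.xml are made up for the example). It reads the whole file into one string and reuses the <item> pattern from the earlier code:

```php
<?php
// Hypothetical helper: read a feed file and return each <item>...</item>
// block as a string, ready for the per-item preg_match() calls.
function loadFeedItems(string $path): array
{
    if (!is_readable($path)) {
        return array(); // missing or unreadable file
    }

    // The whole file goes into one string; there is no 255 character limit.
    $strCode = file_get_contents($path);
    if ($strCode === false) {
        return array();
    }

    if (!preg_match_all('%<item>.*?</item>%si', $strCode, $regs)) {
        return array();
    }
    return $regs[0];
}

// Example call with an assumed file name:
// foreach (loadFeedItems('podcast.xml') as $strItem) { ... }
```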
robertjm
Forum Newbie
Posts: 3
Joined: Sun Jan 22, 2012 12:14 am

Re: Using PHP to parse code from between tags

Post by robertjm »

Just seeing your latest post tonight as the notification didn't come. Thanks again for making it simple to resolve my issue! For some reason I thought there was a max length, but perhaps that was in another code language somewhere along the way. I actually tried referencing an external file which had nearly 200 podcast episodes in it, and it flew through that like nothing, so that actually took care of that question.

One thing I was chewing on earlier was how to get the file size (i.e. the length) by looking at the remote file instead of taking it directly from the enclosure data. Unfortunately, the XML file I have has a whole bunch of episodes with a two-character length, which is basically gibberish. I was hoping to define a variable after using a function to check the remote file without having to download a local copy. I found a whole bunch of options online that people said worked, but when I tried them, all I got back was a white page. No error, nothing. I made sure to add an echo line to return a value to the web page. However, that returned nothing.

Later,

Robert

(OT: Not related to the subject, but my Sharks sure beat up on the Blue Jackets tonight!!! :lol: )
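
On the remote file size question, one option is to ask the server for the response headers only and read Content-Length, which avoids downloading the mp3. This is a sketch under assumptions: both function names are hypothetical, get_headers() needs allow_url_fopen enabled, and not every server sends Content-Length. The two lines at the top turn on error display, which may help with the blank white page.

```php
<?php
// Show errors instead of a blank page while debugging.
error_reporting(E_ALL);
ini_set('display_errors', '1');

// Hypothetical helper: pick the Content-Length out of a header array such
// as the one get_headers($url, true) returns. Returns null if absent.
function contentLengthFromHeaders(array $headers): ?int
{
    // Header names are not case sensitive, so normalise the keys first.
    $headers = array_change_key_case($headers, CASE_LOWER);
    if (!isset($headers['content-length'])) {
        return null;
    }
    $value = $headers['content-length'];
    if (is_array($value)) {
        $value = end($value); // after redirects, use the last response
    }
    $value = trim((string) $value);
    return ctype_digit($value) ? (int) $value : null;
}

// Hypothetical helper: send a HEAD request and return the size in bytes.
function remoteFileSize(string $url): ?int
{
    $context = stream_context_create(array('http' => array('method' => 'HEAD')));
    $headers = get_headers($url, true, $context);
    return ($headers === false) ? null : contentLengthFromHeaders($headers);
}
```

A call such as remoteFileSize($strURL) would then give a value for the length column, or null when the server does not report one.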