Find duplications in XML using DOM

Posted: Wed Sep 27, 2006 3:58 am
by yarons
Hi all,
I am parsing an XML document using the DOMDocument class.
What I need is a piece of code that will find duplicate entries in the XML.
A duplicate entry is defined as an entry which shares the same values in 3 specific elements in the XML entry with another entry.
I can think of code I could write from scratch, but is there something built into PHP to help me with that?

Many thanks

Posted: Wed Sep 27, 2006 5:02 am
by volka
yarons wrote:A duplicate entry is defined as an entry which shares the same values in 3 specific elements in the XML entry with another entry.
Are those entries identical in their string representation (i.e. all text nodes, including those of descendants, concatenated)?
It can probably be done via XPath.

Posted: Wed Sep 27, 2006 5:58 am
by yarons
volka wrote:
yarons wrote:A duplicate entry is defined as an entry which shares the same values in 3 specific elements in the XML entry with another entry.
Are those entries identical in their string representation (i.e. all text nodes, including those of descendants, concatenated)?
It can probably be done via XPath.
Yes, they are. XPath?

Posted: Wed Sep 27, 2006 6:35 am
by Weirdan
Xpath?
Xpath.

Posted: Wed Sep 27, 2006 6:38 am
by volka
http://www.w3.org/TR/xpath wrote:XPath is a language for addressing parts of an XML document, designed to be used by both XSLT and XPointer.
For example: selecting all b elements

Code:

<?php
$xml = '<entries>
		<entry>
			<a>abc</a>
			<b>def</b>
			<c>ghi</c>
		</entry>
		<entry>
			<a>zyx</a>
			<b>wvu</b>
			<c>tsr</c>
		</entry>
		<entry>
			<a>abc</a>
			<b>def</b>
			<c>ghi</c>
		</entry>
		<entry>
			<a>1</a>
			<b>2</b>
			<c>3</c>
		</entry>
	</entries>';
	

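// Parse the XML string into a DOM tree.
$dom = new DOMDocument('1.0', 'iso-8859-1');
$dom->loadXML($xml);

// '//b' selects every b element, wherever it sits in the document.
$xpath = new DOMXPath($dom);
$nodelist = $xpath->query('//b');

// Print the text of each match: def, wvu, def, 2.
foreach($nodelist as $node) {
	echo $node->textContent, "<br />\n";
}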
?>
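
The same query mechanism can also pick out repeated entries directly. This is only a minimal sketch, assuming that "duplicate" means the whole string value of an entry equals that of an earlier entry; the compact sample data is made up for illustration. Because string values are plain concatenations of the text nodes, entries such as ab/c and a/bc would also compare equal, and differing whitespace between elements would make otherwise equal entries differ.

```php
<?php
// Hypothetical compact sample: the first and third entries are identical.
$xml = '<entries>'
     . '<entry><a>abc</a><b>def</b><c>ghi</c></entry>'
     . '<entry><a>zyx</a><b>wvu</b><c>tsr</c></entry>'
     . '<entry><a>abc</a><b>def</b><c>ghi</c></entry>'
     . '</entries>';

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// An entry matches when its string value equals that of any earlier entry.
$repeats = $xpath->query('//entry[. = preceding::entry]');

echo $repeats->length, " repeated entry(ies)\n";
foreach ($repeats as $entry) {
	echo $entry->textContent, "\n";
}
?>
```

With this sample only the third entry is selected, since it is the only one that has an identical entry before it.
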
If explaining XPath is beyond the scope of a forum post like this, XSL(T) certainly is; it's a vast topic.
So I can only provide you with an example here:

Code:

<?php
$data = '<entries>
		<entry>
			<a>abc</a>
			<b>def</b>
			<c>ghi</c>
		</entry>
		<entry>
			<a>zyx</a>
			<b>wvu</b>
			<c>tsr</c>
		</entry>
		<entry>
			<a>abc</a>
			<b>def</b>
			<c>ghi</c>
		</entry>
		<entry>
			<a>1</a>
			<b>2</b>
			<c>3</c>
		</entry>
		<entry>
			<a>abc</a>
			<b>def</b>
			<c>ghi</c>
		</entry>
		<entry>
			<a>1</a>
			<b>2</b>
			<c>3</c>
		</entry>
		<entry>
			<a>abc</a>
			<b>def</b>
			<c>ghi</c>
		</entry>
	</entries>';
	
$style = '<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
		<xsl:output method="xml" encoding="iso-8859-1" indent="no"/>
		<xsl:template match="entry">
			<xsl:if test="not( .=preceding::entry )">
				<xsl:copy>
					<xsl:apply-templates select="@*|node()"/>
				</xsl:copy>
			</xsl:if>
		</xsl:template>
		
		<xsl:template match="@*|node()">
			<xsl:copy>
				<xsl:apply-templates select="@*|node()"/>
			</xsl:copy>
		</xsl:template>
	</xsl:stylesheet>';


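// Load the stylesheet and hand it to the XSLT processor.
$dom = new DOMDocument('1.0', 'iso-8859-1');
$dom->loadXML($style);
$xsl = new XSLTProcessor();
$xsl->importStylesheet($dom);

// Load the data and transform it; entries equal to an earlier entry are dropped.
$dom = new DOMDocument('1.0', 'iso-8859-1');
$dom->loadXML($data);
echo $xsl->transformToXML($dom); // use transformToDoc() to get a new DOMDocument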
?>
You need the DOM and the XSL extension for this to work.
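
The stylesheet above compares whole entries. If only three specific elements decide what counts as a duplicate, a plain PHP loop over the DOM may be enough. This is a rough sketch, not a built-in feature: it assumes the three deciding elements are the children a, b and c, and it adds a hypothetical note child that is ignored in the comparison. All names and sample data are made up for illustration.

```php
<?php
// Hypothetical sample: entries 1 and 3 agree in a, b and c but differ in note.
$xml = '<entries>
	<entry><a>abc</a><b>def</b><c>ghi</c><note>first</note></entry>
	<entry><a>zyx</a><b>wvu</b><c>tsr</c><note>second</note></entry>
	<entry><a>abc</a><b>def</b><c>ghi</c><note>third</note></entry>
</entries>';

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

$keyElements = array('a', 'b', 'c'); // assumed names of the three deciding elements
$seen = array();                     // keys of entries encountered so far
$duplicates = array();               // entries whose key was already seen

foreach ($xpath->query('/entries/entry') as $entry) {
	$parts = array();
	foreach ($keyElements as $name) {
		// string(name) gives the text of the first such child, or '' if it is missing.
		$parts[] = $xpath->evaluate('string(' . $name . ')', $entry);
	}
	// XML text cannot contain a NUL character, so it is a safe separator.
	$key = implode("\0", $parts);

	if (isset($seen[$key])) {
		$duplicates[] = $entry;
	} else {
		$seen[$key] = true;
	}
}

foreach ($duplicates as $entry) {
	echo $xpath->evaluate('string(note)', $entry), "\n";
}
?>
```

Here only the entry with the note "third" is reported. The first occurrence of each key is kept as the original, which mirrors what the stylesheet does with preceding::entry.
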

http://www.w3.org/ and Google can tell you much more about XML, XPath and XSL.