Find duplications in XML using DOM
Posted: Wed Sep 27, 2006 3:58 am
by yarons
Hi all,
I am parsing an XML document using the DOMDocument class.
What I need is a piece of code that will find duplicate entries in the XML.
A duplicate entry is defined as an entry which shares the same values in 3 specific elements in the XML entry with another entry.
I could write this from scratch, but is there something built into PHP that would help me with that?
Many thanks
Posted: Wed Sep 27, 2006 5:02 am
by volka
yarons wrote:A duplicate entry is defined as an entry which shares the same values in 3 specific elements in the XML entry with another entry.
Are those entries identical in their string representation (i.e. all text nodes, including those of descendants, concatenated)?
It can probably be done via XPath.
Posted: Wed Sep 27, 2006 5:58 am
by yarons
volka wrote:yarons wrote:A duplicate entry is defined as an entry which shares the same values in 3 specific elements in the XML entry with another entry.
Are those entries identical in their string representation (i.e. all text nodes, including those of descendants, concatenated)?
It can probably be done via XPath.
Yes, they are. XPath?
Posted: Wed Sep 27, 2006 6:35 am
by Weirdan
Posted: Wed Sep 27, 2006 6:38 am
by volka
http://www.w3.org/TR/xpath wrote:XPath is a language for addressing parts of an XML document, designed to be used by both XSLT and XPointer.
For example, selecting all <b> elements:
Code:
<?php
$xml = '<entries>
<entry>
<a>abc</a>
<b>def</b>
<c>ghi</c>
</entry>
<entry>
<a>zyx</a>
<b>wvu</b>
<c>tsr</c>
</entry>
<entry>
<a>abc</a>
<b>def</b>
<c>ghi</c>
</entry>
<entry>
<a>1</a>
<b>2</b>
<c>3</c>
</entry>
</entries>';
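// Load the XML string into a DOM tree
$dom = new DOMDocument('1.0', 'iso-8859-1');
$dom->loadXML($xml);
// '//b' matches every <b> element anywhere in the document
$xpath = new DOMXPath($dom);
$nodelist = $xpath->query('//b');
foreach ($nodelist as $node) {
    echo $node->textContent, "<br />\n";
}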
?>
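The same query() call can also select the duplicate entries themselves. What follows is only a minimal sketch, assuming (as confirmed earlier in the thread) that duplicates have identical string representations. The sample data and variable names are made up for illustration.

```php
<?php
// Illustrative data: the third <entry> repeats the first one.
$xml = '<entries>'
     . '<entry><a>abc</a><b>def</b><c>ghi</c></entry>'
     . '<entry><a>zyx</a><b>wvu</b><c>tsr</c></entry>'
     . '<entry><a>abc</a><b>def</b><c>ghi</c></entry>'
     . '</entries>';

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// Select every <entry> whose string value equals that of an earlier <entry>.
$duplicates = $xpath->query('//entry[. = preceding::entry]');

foreach ($duplicates as $node) {
    echo $dom->saveXML($node), "\n";
}
```

With this data only the third entry should be printed. Note that the string value concatenates all descendant text, so the boundaries between elements are lost: <a>ab</a><b>c</b> and <a>a</a><b>bc</b> would compare as equal.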
Explaining XPath is already beyond the scope of a forum post like this, and XSL(T) certainly is. It's a vast topic.
So I can only give you an example here:
Code:
<?php
$data = '<entries>
<entry>
<a>abc</a>
<b>def</b>
<c>ghi</c>
</entry>
<entry>
<a>zyx</a>
<b>wvu</b>
<c>tsr</c>
</entry>
<entry>
<a>abc</a>
<b>def</b>
<c>ghi</c>
</entry>
<entry>
<a>1</a>
<b>2</b>
<c>3</c>
</entry>
<entry>
<a>abc</a>
<b>def</b>
<c>ghi</c>
</entry>
<entry>
<a>1</a>
<b>2</b>
<c>3</c>
</entry>
<entry>
<a>abc</a>
<b>def</b>
<c>ghi</c>
</entry>
</entries>';
$style = '<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="iso-8859-1" indent="no"/>
  <xsl:template match="entry">
    <xsl:if test="not( .=preceding::entry )">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>';
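// Load the stylesheet and hand it to the XSLT processor
$styleDoc = new DOMDocument('1.0', 'iso-8859-1');
$styleDoc->loadXML($style);
$xsl = new XSLTProcessor();
$xsl->importStylesheet($styleDoc);
// Load the data and transform it; entries equal to an earlier entry are dropped
$dataDoc = new DOMDocument('1.0', 'iso-8859-1');
$dataDoc->loadXML($data);
echo $xsl->transformToXML($dataDoc); // use transformToDoc() to get a new DOMDocument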
?>
You need the DOM and the XSL extension for this to work.
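If you are not sure whether both extensions are enabled, a quick check along these lines may help. It is only a sketch: missingExtensions() is a made-up helper name, while extension_loaded() is the built-in PHP function it relies on.

```php
<?php
// Hypothetical helper: lists which of the given extensions are not loaded.
function missingExtensions(array $required): array
{
    return array_values(array_filter(
        $required,
        function ($name) {
            return !extension_loaded($name);
        }
    ));
}

$missing = missingExtensions(['dom', 'xsl']);
if ($missing) {
    echo 'Missing extension(s): ', implode(', ', $missing), "\n";
} else {
    echo "DOM and XSL are both available.\n";
}
```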
http://www.w3.org/ and Google can tell you much more about XML, XPath and XSL.
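The original question asked about entries that share the same values in 3 specific elements, which is narrower than comparing whole entries. A rough sketch of that variant in plain PHP follows. It assumes the three elements are the a, b and c children from the examples above, and findDuplicateEntries() is a made-up helper, not something built into PHP. The element names are pasted straight into an XPath expression, so they are assumed to be simple names.

```php
<?php
// Hypothetical helper (not part of PHP): returns every <entry> whose key
// elements all match those of an earlier <entry>.
function findDuplicateEntries(DOMDocument $dom, array $keyElements): array
{
    $xpath = new DOMXPath($dom);
    $seen = [];
    $duplicates = [];
    foreach ($xpath->query('//entry') as $entry) {
        $values = [];
        foreach ($keyElements as $name) {
            // string(name) is the text of the first such child, '' if missing.
            $values[] = $xpath->evaluate('string(' . $name . ')', $entry);
        }
        // Serialize the values so element boundaries stay part of the key.
        $key = serialize($values);
        if (isset($seen[$key])) {
            $duplicates[] = $entry;
        } else {
            $seen[$key] = true;
        }
    }
    return $duplicates;
}

// Illustrative data: entries 1 and 3 agree on a, b and c but differ in d.
$dom = new DOMDocument();
$dom->loadXML(
    '<entries>'
    . '<entry><a>abc</a><b>def</b><c>ghi</c><d>first</d></entry>'
    . '<entry><a>zyx</a><b>wvu</b><c>tsr</c><d>other</d></entry>'
    . '<entry><a>abc</a><b>def</b><c>ghi</c><d>second</d></entry>'
    . '</entries>'
);

$duplicates = findDuplicateEntries($dom, ['a', 'b', 'c']);
echo count($duplicates), " duplicate(s) found\n";
```

Here the third entry is reported even though its d element differs, because only the listed elements take part in the comparison. The first occurrence is kept as the original and only later matches count as duplicates.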