XML database experiences

Not for 'how-to' coding questions but PHP theory instead, this forum is here for those of us who wish to learn about design aspects of programming with PHP.

Moderator: General Moderators

User avatar
greyhoundcode
Forum Regular
Posts: 613
Joined: Mon Feb 11, 2008 4:22 am

XML database experiences

Post by greyhoundcode »

Has anyone used an XML database as part of a PHP application, whether XML-enabled or native XML? If so, were there any problems or even advantages beyond your expectations that you would highlight?
User avatar
Technical
Forum Commoner
Posts: 81
Joined: Thu Dec 02, 2010 5:30 am

Re: XML database experiences

Post by Technical »

I guess it can be quite slow. And of course SQL is much more flexible.
User avatar
greyhoundcode
Forum Regular
Posts: 613
Joined: Mon Feb 11, 2008 4:22 am

Re: XML database experiences

Post by greyhoundcode »

Just to clarify I am not talking about keeping data in "plain vanilla" XML files as such, but in a specialized XML database (see here or here).
User avatar
Benjamin
Site Administrator
Posts: 6935
Joined: Sun May 19, 2002 10:24 pm

Re: XML database experiences

Post by Benjamin »

I can't imagine why anyone in their right mind would want to do this. Per the wikipedia page:
O'Connell (2005, 9.2) gives one reason for the use of XML in databases: the increasingly common use of XML for data transport, which has meant that "data is extracted from databases and put into XML documents and vice-versa". It may prove more efficient (in terms of conversion costs) and easier to store the data in XML format.
This is not a good reason. XML is a PITA to use as a data transport as compared to JSON or serialized arrays. If that's the only reason, there is no reason.

Besides that, it's not scalable.
User avatar
Technical
Forum Commoner
Posts: 81
Joined: Thu Dec 02, 2010 5:30 am

Re: XML database experiences

Post by Technical »

Benjamin wrote:I can't imagine why anyone in their right mind would want to do this. Per the wikipedia page:
O'Connell (2005, 9.2) gives one reason for the use of XML in databases: the increasingly common use of XML for data transport, which has meant that "data is extracted from databases and put into XML documents and vice-versa". It may prove more efficient (in terms of conversion costs) and easier to store the data in XML format.
This is not a good reason. XML is a PITA to use as a data transport as compared to JSON or serialized arrays. If that's the only reason, there is no reason.

Besides that, it's not scalable.
Agree, XML is way too overrated and overhyped.
Last edited by Technical on Fri Jan 14, 2011 9:18 am, edited 1 time in total.
User avatar
greyhoundcode
Forum Regular
Posts: 613
Joined: Mon Feb 11, 2008 4:22 am

Re: XML database experiences

Post by greyhoundcode »

Benjamin wrote:Besides that, it's not scalable.
Wouldn't that depend on how the data is actually stored, and therefore vary by implementation?
User avatar
Benjamin
Site Administrator
Posts: 6935
Joined: Sun May 19, 2002 10:24 pm

Re: XML database experiences

Post by Benjamin »

greyhoundcode wrote:Wouldn't that depend on how the data is actually stored, and therefore vary by implementation?
It's stored as XML, so no. XML has considerable overhead, therefore reading through it in order to find "records" would involve significant disk IO, even with relatively small datasets.
User avatar
superdezign
DevNet Master
Posts: 4135
Joined: Sat Jan 20, 2007 11:06 pm

Re: XML database experiences

Post by superdezign »

I doubt you could find a means of making it more efficient than it's regular counterparts. Also, XML takes longer to parse than, say, a JSON object, and results in larger requests due to the amount of syntax required by the standards. The only time I find myself using XML anymore is with RSS feeds and Flash applications.
User avatar
greyhoundcode
Forum Regular
Posts: 613
Joined: Mon Feb 11, 2008 4:22 am

Re: XML database experiences

Post by greyhoundcode »

Quite possibly you are correct, but my understanding was that many XML database products (like DB2) do not actually store the XML as XML. If you forget it is XML we are talking about for a moment and just imagine it to be hierarchical data then I'm sure you could envisage how that might be stored in a relational database:

Code: Select all

ID   | PARENT | DATA
-----+--------+---------------------------
 001  | 000    | Root
 002  | 001    | Child of root
 003  | 001    | Another child of root
 004  | 002    | Child of root's daughter
I'm not suggesting the above is exactly how DB2 or any related software works under the hood, but I wouldn't think an XML database engine would have much to bring to the table if it didn't offer superior performance*, considering that most languages, PHP included, have the ability to load, manipulate and query XML without such a tool.

* That is, superior performance over PHP's bundled XML tools rather than superior performance over a traditional RDBM.

A benefit I see to XML databases is flexibility. The number of children within an element can change according to arising needs (possibly unforeseen initially) without changes to table structures - as there are no tables - and even on a case by case basis, rather than affecting every row (if it were a traditional table based structure).
josh
DevNet Master
Posts: 4872
Joined: Wed Feb 11, 2004 3:23 pm
Location: Palm beach, Florida

Re: XML database experiences

Post by josh »

Benjamin wrote:It's stored as XML, so no. XML has considerable overhead, therefore reading through it in order to find "records" would involve significant disk IO, even with relatively small datasets.
The same is true of looping over rows, but that's not how either mysql or XML databases work. (unless you trigger table scans by writing poorly optimized queries). The wikipedia link clearly states an XML database can store XML either as compressed files, relational, object oriented database, etc. I also found this link suggesting mysql is only about 2x as fast as an XML database. http://bytes.com/topic/db2/answers/1806 ... rge-tables That was with 170 Million rows. Looks like it scales pretty damn good to me. Someone in that thread conjectured DB2 may have a chance of beating out mysql under more realistic load. For example assembling & reading the XML into & out of a mysql table is a slow operation, so if you gave it a fair comparison there's a chance they are no different in terms of performance. Its not far-fetched to assume that mysql is fastest at tabular data, and DB2 is fastest with hierarchal data. After all 1 XML document could easily take thousands of rows to represent in a table.

If you are just loading & unloading XML it seems like an excellent choice, I haven't used any databases other than mysql personally though.

As greyhoundcode also points out, to add a new field in an XML database is instance. On that 170 million row mysql table it'd take a few hours, just for the DDL to run.
User avatar
Benjamin
Site Administrator
Posts: 6935
Joined: Sun May 19, 2002 10:24 pm

Re: XML database experiences

Post by Benjamin »

You actually went out and tried to find reasons why this is good? I stand by my original comments. If you think it's scalable, import the 27GB wikipedia XML dump, run some queries on it and tell me how that works out for you.

If your concerned with adding/removing fields then just use the EAV model, even though it really doesn't take as long as you mentioned anyway.
User avatar
greyhoundcode
Forum Regular
Posts: 613
Joined: Mon Feb 11, 2008 4:22 am

Re: XML database experiences

Post by greyhoundcode »

josh wrote:If you are just loading & unloading XML it seems like an excellent choice, I haven't used any databases other than mysql personally though.
Same, largely MySQL is what I use, but for a current project it doesn't feel like a comfortable fit. I was casting about for options and came across the concept of XML databases, which in truth I had not been aware of previously.
Benjamin wrote:If you think it's scalable, import the 27GB wikipedia XML dump, run some queries on it and tell me how that works out for you.
What would you compare the results against?
josh
DevNet Master
Posts: 4872
Joined: Wed Feb 11, 2004 3:23 pm
Location: Palm beach, Florida

Re: XML database experiences

Post by josh »

" the question “What is faster, XML or Relational?” is somewhat like asking “What is faster, a truck or a ship?”, because XML and relational are meant for different purposes, and either one can outperform the other depending on which use case you look at."

http://nativexmldatabase.com/2010/08/22 ... rformance/

Seriously, if you aren't experienced in XML databases how you can knock them? AND criticize me for doing research on them? EAV isn't exactly known for good performance either, for example Magento optimized their slow EAV architecture code by caching the data in a non EAV storage [php's serialize function, combined with a TON of flat files] (pretty much analogous to using an XML database, except an XML database can be queried in a relational manner)

The benefit of an XML database is self evident. Converting associative PHP arrays into a hierarchically structured XML document takes more PHP processing time than simply saying "select *" in an XML database and outputting the raw results. It also takes a lot more programming time. Sure, in mysql you could have 2 tables, one structured and one XML .. but that consumes even more resources keeping the tables synced, whereas DB2 lets you have tabular columns, and XML columns within the same table, and query them with a consistent interface

Seems pretty win to me. I'm actually going to research it further as its an interesting alternative to clumsy, hard to program, hated by the users EAV systems. EAV is for extremely sparse data, not for variable schemas (although its misappropriated for the latter) If you have 1 million records, and half use one set of fields, and half use another set of fields, that doesn't constitute "extremely sparse" and is not a best fit for EAV. EAV also causes an explosion of boiler plate "metadata" tables. Ewww.

For example a patient at a doctor's will have typically a few out of possibly millions of potential diseases. Highly sparse

An e-commerce store will have a million products, but only a hand full 10-20 different "product types". .... Not highly sparse.

An XML database can index "somewhat sparse data" more effectively than an EAV system, as per my understanding (although I have not tried XML databases, I have tried EAV and I can confidently argue my point. EAV is simply not suited to executing queries against.)

Here is a discussion comparing "alter table", eav, and xml databases. http://dbaspot.com/forums/sqlserver-pro ... s-eav.html

It should be pointed out part of the responsibilities of "google big table" are dealing with sparse data as well, as does couchDB. Both are extremely well looked upon in the community when compared to EAV.
User avatar
greyhoundcode
Forum Regular
Posts: 613
Joined: Mon Feb 11, 2008 4:22 am

Re: XML database experiences

Post by greyhoundcode »

For those who are interested the community edition of DB2 is available at no cost.
User avatar
greyhoundcode
Forum Regular
Posts: 613
Joined: Mon Feb 11, 2008 4:22 am

Re: XML database experiences

Post by greyhoundcode »

josh wrote:DB2 lets you have tabular columns, and XML columns within the same table, and query them with a consistent interface
Exactly, you need not restrict yourself to either approach.

I don't think anyone would suggest XML databases as being suitable in every domain, but as ever it comes down to finding a good fit (perhaps even a more natural fit one might say) to a specific problem and this looks like a good fit to a set of problems I have been confronted with.
Post Reply