Page 1 of 2
XML database experiences
Posted: Fri Jan 14, 2011 6:58 am
by greyhoundcode
Has anyone used an XML database as part of a PHP application, whether XML-enabled or native XML? If so, were there any problems or even advantages beyond your expectations that you would highlight?
Re: XML database experiences
Posted: Fri Jan 14, 2011 8:02 am
by Technical
I guess it can be quite slow. And of course SQL is much more flexible.
Re: XML database experiences
Posted: Fri Jan 14, 2011 8:10 am
by greyhoundcode
Just to clarify I am not talking about keeping data in "plain vanilla" XML files as such, but in a specialized XML database (see
here or
here).
Re: XML database experiences
Posted: Fri Jan 14, 2011 8:46 am
by Benjamin
I can't imagine why anyone in their right mind would want to do this. Per the wikipedia page:
O'Connell (2005, 9.2) gives one reason for the use of XML in databases: the increasingly common use of XML for data transport, which has meant that "data is extracted from databases and put into XML documents and vice-versa". It may prove more efficient (in terms of conversion costs) and easier to store the data in XML format.
This is not a good reason. XML is a PITA to use as a data transport as compared to JSON or serialized arrays. If that's the only reason, there is no reason.
Besides that, it's not scalable.
Re: XML database experiences
Posted: Fri Jan 14, 2011 8:50 am
by Technical
Benjamin wrote:I can't imagine why anyone in their right mind would want to do this. Per the wikipedia page:
O'Connell (2005, 9.2) gives one reason for the use of XML in databases: the increasingly common use of XML for data transport, which has meant that "data is extracted from databases and put into XML documents and vice-versa". It may prove more efficient (in terms of conversion costs) and easier to store the data in XML format.
This is not a good reason. XML is a PITA to use as a data transport as compared to JSON or serialized arrays. If that's the only reason, there is no reason.
Besides that, it's not scalable.
Agree, XML is way too overrated and overhyped.
Re: XML database experiences
Posted: Fri Jan 14, 2011 9:00 am
by greyhoundcode
Benjamin wrote:Besides that, it's not scalable.
Wouldn't that depend on how the data is actually stored, and therefore vary by implementation?
Re: XML database experiences
Posted: Fri Jan 14, 2011 9:12 am
by Benjamin
greyhoundcode wrote:Wouldn't that depend on how the data is actually stored, and therefore vary by implementation?
It's stored as XML, so no. XML has considerable overhead, therefore reading through it in order to find "records" would involve significant disk IO, even with relatively small datasets.
Re: XML database experiences
Posted: Fri Jan 14, 2011 9:17 am
by superdezign
I doubt you could find a means of making it more efficient than it's regular counterparts. Also, XML takes longer to parse than, say, a JSON object, and results in larger requests due to the amount of syntax required by the standards. The only time I find myself using XML anymore is with RSS feeds and Flash applications.
Re: XML database experiences
Posted: Fri Jan 14, 2011 2:02 pm
by greyhoundcode
Quite possibly you are correct, but my understanding was that many XML database products (like DB2) do not actually store the XML as XML. If you forget it is XML we are talking about for a moment and just imagine it to be hierarchical data then I'm sure you could envisage how that might be stored in a relational database:
Code: Select all
ID | PARENT | DATA
-----+--------+---------------------------
001 | 000 | Root
002 | 001 | Child of root
003 | 001 | Another child of root
004 | 002 | Child of root's daughter
I'm not suggesting the above is exactly how DB2 or any related software works under the hood, but I wouldn't think an XML database engine would have much to bring to the table if it didn't offer superior performance*, considering that most languages, PHP included, have the ability to load, manipulate and query XML without such a tool.
* That is, superior performance over PHP's bundled XML tools rather than superior performance over a traditional RDBM.
A benefit I see to XML databases is flexibility. The number of children within an element can change according to arising needs (possibly unforeseen initially) without changes to table structures - as there are no tables - and even on a case by case basis, rather than affecting every row (if it were a traditional table based structure).
Re: XML database experiences
Posted: Fri Jan 14, 2011 9:33 pm
by josh
Benjamin wrote:It's stored as XML, so no. XML has considerable overhead, therefore reading through it in order to find "records" would involve significant disk IO, even with relatively small datasets.
The same is true of looping over rows, but that's not how either mysql or XML databases work. (unless you trigger table scans by writing poorly optimized queries). The wikipedia link clearly states an XML database can store XML either as compressed files, relational, object oriented database, etc. I also found this link suggesting mysql is only about 2x as fast as an XML database.
http://bytes.com/topic/db2/answers/1806 ... rge-tables That was with 170 Million rows. Looks like it scales pretty damn good to me. Someone in that thread conjectured DB2 may have a chance of beating out mysql under more realistic load. For example assembling & reading the XML into & out of a mysql table is a slow operation, so if you gave it a fair comparison there's a chance they are no different in terms of performance. Its not far-fetched to assume that mysql is fastest at tabular data, and DB2 is fastest with hierarchal data. After all 1 XML document could easily take thousands of rows to represent in a table.
If you are just loading & unloading XML it seems like an excellent choice, I haven't used any databases other than mysql personally though.
As greyhoundcode also points out, to add a new field in an XML database is instance. On that 170 million row mysql table it'd take a few hours, just for the DDL to run.
Re: XML database experiences
Posted: Sat Jan 15, 2011 1:00 am
by Benjamin
You actually went out and tried to find reasons why this is good? I stand by my original comments. If you think it's scalable, import the 27GB wikipedia XML dump, run some queries on it and tell me how that works out for you.
If your concerned with adding/removing fields then just use the EAV model, even though it really doesn't take as long as you mentioned anyway.
Re: XML database experiences
Posted: Sat Jan 15, 2011 2:33 am
by greyhoundcode
josh wrote:If you are just loading & unloading XML it seems like an excellent choice, I haven't used any databases other than mysql personally though.
Same, largely MySQL is what I use, but for a current project it doesn't feel like a comfortable fit. I was casting about for options and came across the concept of XML databases, which in truth I had not been aware of previously.
Benjamin wrote:If you think it's scalable, import the 27GB wikipedia XML dump, run some queries on it and tell me how that works out for you.
What would you compare the results against?
Re: XML database experiences
Posted: Sat Jan 15, 2011 4:07 am
by josh
" the question “What is faster, XML or Relational?” is somewhat like asking “What is faster, a truck or a ship?”, because XML and relational are meant for different purposes, and either one can outperform the other depending on which use case you look at."
http://nativexmldatabase.com/2010/08/22 ... rformance/
Seriously, if you aren't experienced in XML databases how you can knock them? AND criticize me for doing research on them? EAV isn't exactly known for good performance either, for example Magento optimized their slow EAV architecture code by caching the data in a non EAV storage [php's serialize function, combined with a TON of flat files] (pretty much analogous to using an XML database, except an XML database can be queried in a relational manner)
The benefit of an XML database is self evident. Converting associative PHP arrays into a hierarchically structured XML document takes more PHP processing time than simply saying "select *" in an XML database and outputting the raw results. It also takes a lot more programming time. Sure, in mysql you could have 2 tables, one structured and one XML .. but that consumes even more resources keeping the tables synced, whereas DB2 lets you have tabular columns, and XML columns within the same table, and query them with a consistent interface
Seems pretty win to me. I'm actually going to research it further as its an interesting alternative to clumsy, hard to program, hated by the users EAV systems. EAV is for extremely sparse data, not for variable schemas (although its misappropriated for the latter) If you have 1 million records, and half use one set of fields, and half use another set of fields, that doesn't constitute "extremely sparse" and is not a best fit for EAV. EAV also causes an explosion of boiler plate "metadata" tables. Ewww.
For example a patient at a doctor's will have typically a few out of possibly millions of potential diseases. Highly sparse
An e-commerce store will have a million products, but only a hand full 10-20 different "product types". .... Not highly sparse.
An XML database can index "somewhat sparse data" more effectively than an EAV system, as per my understanding (although I have not tried XML databases, I have tried EAV and I can confidently argue my point. EAV is simply not suited to executing queries against.)
Here is a discussion comparing "alter table", eav, and xml databases.
http://dbaspot.com/forums/sqlserver-pro ... s-eav.html
It should be pointed out part of the responsibilities of "google big table" are dealing with sparse data as well, as does couchDB. Both are extremely well looked upon in the community when compared to EAV.
Re: XML database experiences
Posted: Sat Jan 15, 2011 6:41 am
by greyhoundcode
For those who are interested the
community edition of DB2 is available at no cost.
Re: XML database experiences
Posted: Sat Jan 15, 2011 6:54 am
by greyhoundcode
josh wrote:DB2 lets you have tabular columns, and XML columns within the same table, and query them with a consistent interface
Exactly, you need not restrict yourself to either approach.
I don't think anyone would suggest XML databases as being suitable in every domain, but as ever it comes down to finding a good fit (perhaps even a more
natural fit one might say) to a specific problem and this looks like a good fit to a set of problems I have been confronted with.