XML is not a database
Posted : Thursday May 1st, 2008
If there's a surer sign of portending doom than a person blathering on about an "XML Database", I couldn't tell you what it would be. I don't know how many times I have seen discussion threads on forums or in newsgroups on database access topics that start with a post including the phrase "XML database" and then go into a death spiral from there.
These discussions often have the following flow to them:
Some person: Help I am having [problem] with my XML database
Me: Well your first problem is that XML is not a database
Some other person: A text file can be a database so an XML file can be a database
Now don't misunderstand me, XML is great when used as intended. And largely what XML was intended for was communication between disparate applications, for example a web service. But using XML as a long-term storage engine is begging for trouble, and abusing it as just another database is just plain stupid.
Which brings me to the point of this entry. Why XML is not in fact a database.
First of all we have to define the term database in a useful fashion. Yes, you can define a database as any "collection of data", and that would include a text file or an XML file. However, that's not a very useful definition when one begins to talk about programming with a database, or using a database to store data for an application, or really any use of a database at all. So what's a useful definition? A database is a structured data storage engine.
So what do these structured data storage engines have that XML does not?
Really, in so many ways the debate ends right here. Databases are designed to provide efficient access to possibly very large sets of data. XML is not designed for efficient access; it is designed so that multiple systems can read it directly.
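To make the access difference concrete, here is a minimal sketch using Python's standard library, with SQLite standing in for "a database". The customer records and the helper names (find_in_xml, find_in_db) are invented for illustration and are not from any real application.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical data: 1000 customer records, invented for this sketch.
records = [(i, f"customer{i}") for i in range(1000)]

# XML version: build one document holding every record.
root = ET.Element("customers")
for cid, name in records:
    ET.SubElement(root, "customer", id=str(cid)).text = name
xml_text = ET.tostring(root, encoding="unicode")

def find_in_xml(text, cid):
    tree = ET.fromstring(text)          # parse the entire document first
    for node in tree.iter("customer"):  # then walk the elements until a match
        if node.get("id") == str(cid):
            return node.text
    return None

# Database version: the primary key gives the engine an index to search.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", records)

def find_in_db(cid):
    row = conn.execute(
        "SELECT name FROM customer WHERE id = ?", (cid,)
    ).fetchone()
    return row[0] if row else None

# Ask SQLite how it intends to run the lookup.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM customer WHERE id = ?", (999,)
).fetchall()
```

Both lookups return the same name, but the query plan reports a keyed search rather than a scan of every row, while the XML lookup has to parse the whole document before it can even start looking.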
Databases are designed with tools and functions that help you ensure the integrity of your data. In XML some integrity can be gained by using an appropriate schema, but that level of integrity is not the full coverage supplied by a database. For example, in a relational database referential integrity can be set up to ensure that values in data set A have a related value in data set B. There is no truly comparable level of integrity available in XML.
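As a rough illustration of referential integrity, the sketch below uses SQLite through Python's sqlite3 module. The customer and purchase tables and the try_insert_orphan helper are hypothetical, made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys once this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Data set A (customer) and data set B (purchase), both hypothetical.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE purchase (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        item TEXT NOT NULL
    )
""")

conn.execute("INSERT INTO customer VALUES (1, 'Ann')")
conn.execute("INSERT INTO purchase VALUES (1, 1, 'book')")  # customer 1 exists

def try_insert_orphan():
    """Try to record a purchase for a customer that does not exist."""
    try:
        conn.execute("INSERT INTO purchase VALUES (2, 99, 'lamp')")
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

result = try_insert_orphan()
purchase_count = conn.execute("SELECT COUNT(*) FROM purchase").fetchone()[0]
```

The engine itself refuses the orphaned row, so the rule holds no matter which application or person is doing the writing.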
What happens if the power goes out in the middle of a database update? What happens if the power goes out in the middle of writing an XML file? For databases, ensuring that records are at least not left in a damaged state is part of their job. There is no comparable solution for this in XML. So when the former happens your database recovers as necessary and you move on. When the latter happens you recover an old file from a backup, or clean up the file manually, or throw the whole thing out and start over.
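The sketch below imitates an interrupted update. It is only an approximation: a raised exception stands in for the power failure, SQLite stands in for "a database", and the account table and helper names are invented for the example. A real crash would exercise the engine's on-disk recovery, which an in-memory database does not demonstrate.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

def transfer_with_failure():
    """Debit one account, then fail before the matching credit happens."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - 50 WHERE name = 'a'"
            )
            raise RuntimeError("simulated failure between the two updates")
    except RuntimeError:
        pass

transfer_with_failure()
balances = dict(conn.execute("SELECT name, balance FROM account"))

# The same data as an XML file, cut off halfway through being written.
full_xml = (
    "<accounts><account name='a'>50</account>"
    "<account name='b'>50</account></accounts>"
)
half_written = full_xml[: len(full_xml) // 2]

def parses(text):
    """Report whether the text is still a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False
```

After the failed transfer the balances are back where they started, because the half-finished update was rolled back. The half-written XML text no longer parses at all, which is the point where the backup comes out.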
Features, Tools and Security
Finally there is a whole series of features that separate established database products from XML: things like automated backups and authenticated user security, and often important tools like server clustering, database migration and replication, triggers and stored procedures. All these tools make databases what they are, and their importance can't be discounted. It's interesting to note that outside of performance questions arising from a lack of replication tools and/or triggers.
If you love XML that's great. If you want to interact with your database using XML that's great. But please don't ever think that an XML file "is" a database.
Silky - May 1st 2008 6:22 PM
I like the word portending. One reason I've found myself using XML as a 'database' is for read-only 'data'. I certainly wouldn't be writing to it, but as a quick storage-place for read-only information, it's nice.
Perhaps it can be called a 'datastore' instead of a 'database', but that word seems a little pretentious.
Max - May 2nd 2008 5:42 AM
There is nothing wrong with a properties file. ;)
The problem of course is that for your own use, putting things into a properties file, be it XML or otherwise, works out fine. The trouble comes if you do this in your workplace. And you leave. And someone else comes along who doesn't really know what's what and starts putting more data in there, or reading AND writing.
To be honest the problem is with text files of any kind. I think XML is the main raging abuse leader because it is "structured" which some people think is enough.
Peter - May 12th 2008 6:29 PM
I'm reminded of the first legacy app I maintained in a professional capacity. It used a massive XML file to store game state, and did cross-references all over the place using ids. Absolute nightmare.
Max - May 15th 2008 9:04 PM
Yes it's hard to find a story about using XML as a data storage mechanism that doesn't end with a phrase like "Absolute nightmare".
J - Jan 8th 2009 7:30 PM
I printed your article and used it as ammunition in various meetings at work, following the utter slowdown of servers running safety-critical applications. Slowdown reason: excessive disk IO. The problem was a postgre db of several hundred gigabytes (yikes!) containing... you guessed it... XML. Properly stored, the same db would've been around 2-4 GB. I also did the actual work on a subset of the data, downsizing 8GB of the XML to 210MB of regular tables. I even implemented the necessary software changes and proved that everything from vacuums to regular inserts (which happen dozens of times per second) happen hundreds of times faster.
They decided not to go for my solution. "It's not proven that this would reduce the disk IO", they said. I gave them my numbers, to which the people in charge said "That still doesn't prove much".
Last I checked, they now run the db on a new, separate machine. Nobody cares when that machine is bogged down, and when it is they just don't care if they lose inserts. Thus they did what every company offering enterprise-level solutions does: solve the problem with more hardware, and reduce functionality.
I have since switched to a less idiotic department.
Max - Feb 18th 2009 4:01 PM
Hey J, thanks for the feedback.
Sorry it didn't work out but I'm glad that someone found this almost helpful. ;)
Storing data in XML in the database is almost another nightmare in itself. I've seen this come up on thedailywtf.com a few times as well.