The Data Studio

Loading XML Data to a Database

W3C describes XML as:

Extensible Markup Language (XML) is a simple, very flexible text format derived from SGML (ISO 8879). Originally designed to meet the challenges of large-scale electronic publishing, XML is also playing an increasingly important role in the exchange of a wide variety of data on the Web and elsewhere.

Since XML carries its metadata with it, it is very useful for documents which have some structure and for short variable messages. As XHTML, XML is the backbone of the world-wide web.

But I cannot understand why many people seem to think that XML is the ideal format to store all kinds of data. I have seen many cases where millions of records, all in exactly the same format, are stored in XML, and this is not a good use of XML.

So how does this affect the database? Most modern relational databases have been persuaded to support XML, so that you can store a whole XML file in one cell in the database. The vendors then supply functions which let you parse the XML data in the database. This is tedious to write and slow to run. The XML hides its metadata inside the XML file so the database cannot join from one element in an XML column to some other data in the database except by applying the XML parsing function to extract the element to join on. It is possible but it is horribly inefficient.

Data in every cell (row/column intersection) should be atomic, otherwise we are wasting the power of the database. An XML file is not atomic.

Here are some better options: