What are the best practices for working with Sql server Xml columns to ensure quick performance and also ease of reporting?
How do you set up the column?
do you leave it as untyped?
or associate it with a schema?
Does associating the xml column with a schema improve query performance?
Our use of xml columns is as follows:
A.> On a PER customer basis we can define flexible storage of their data without overhauling our db.
B.> We need to build reporting views for each customer which returns their data as if it was a simple table (for crystal reports or Sql Server Reporting Services).
The syntax we currently use to query is as follows:
SELECT
Id,
doc.value('#associatedId','nvarchar(40)') as AssocId,
doc.value('#name1', 'nvarchar(255)') as Name1,
doc.value('#name2', 'nvarchar(255)') as Name2,
doc.value('#name3', 'nvarchar(255)') as Name3,
doc.value('#number', 'nvarchar(255)') as Number
From OrderDetails
CROSS APPLY OrderDetails.XmlData.nodes('//root/reviewers/reviewer') as XmlTable(doc)
Is there a quicker way to do this? this query runs slowly for us in a table with 1million records, but only 800 currently have xml data!
Thanks
Pete
From XML Best Practices for Microsoft SQL Server 2005:
Use a typed or untyped XML?
Use untyped XML data type under the
following conditions:
You do not have a schema for your XML data.
You have schemas but you do not want the server to validate the data.
This is sometimes the case when an
application performs client-side
validation before storing the data at
the server, or temporarily stores XML
data invalid according to the schema,
or uses XML schema features not
supported at the server (for example,
key/keyref).
Use typed XML data type under the
following conditions:
You have schemas for your XML data and you want the server to
validate your XML data according on
the XML schemas.
You want to take advantage of storage and query optimizations based
on type information.
You want to take better advantage of type information during
compilation of your queries such as
static type errors.
Typed XML columns, parameters and
variables can store XML documents or
content, which you have to specify as
a flag (DOCUMENT or CONTENT,
respectively) at the time of
declaration. Furthermore, you have to
provide one or more XML schemas.
Specify DOCUMENT if each XML instance
has exactly one top-level element;
otherwise, use CONTENT. The query
compiler uses DOCUMENT flag in type
checks during query compilation to
infer singleton top-level elements.
Does associating the xml column with a schema improve query performance? See above point: use typed XML if you want to take advantage of query optimizations based on type information.
There is also a lengthy discussion over the benefits of XML indexes:
Your application may benefit from an XML index under the following conditions:
Queries on XML columns are common in your workload. XML index maintenance cost during data modification must be taken into account.
Your XML values are relatively large and the retrieved parts are relatively small. Building the index avoids parsing the whole data at runtime and benefits index lookups for efficient query processing.
And most importantly, the appropriate type of secondary XML index for your usage:
If your workload uses path expressions heavily on XML columns, the PATH secondary XML index is likely to speed up your workload. The most common case is the use of exist() method on XML columns in WHERE clause of Transact-SQL.
If your workload retrieves multiple values from individual XML instances using path expressions, clustering paths within each XML instance in the PROPERTY index may be helpful. This scenario typically occurs in a property bag scenario when properties of an object are fetched and its relational primary key value is known.
If your workload involves querying for values within XML instances without knowing the element or attribute names that contain those values, you may want to create the VALUE index. This typically occurs with descendant axes lookups, such as //author[last-name="Howard"], where <author> elements can occur at any level of the hierarchy and the search value ("Howard") is more selective than the path. It also occurs in "wildcard" queries, such as /book [#* = "novel"], where the query looks for <book> elements with some attribute having the value "novel".
If as in the above example you are using the XML to store various string columns, I don't think you would really benefit from typed XML unless you have a need for the server to validate the data. Performance-wise, I suspect it would be faster untyped.
For these kinds of queries you absolutely need to have XML indexes in place, they are essential for good performance of XML queries. Without indexes, XML columns are stored as blobs so in order to query them, SQL needs to shred the blob into XML first, then do whatever operations you are requesting. A primary XML index stores the shredded XML in the database so it doesn't need to be done on the fly. You need to create a primary XML index first, then secondary XML indexes can be created to support your queries.
There are 3 types of secondary XML indexes: PATH, VALUE and PROPERTY. Which secondary indexes you need depends on the type of queries you're going to be doing, so I would encourage you to review the Secondary XML Indexes topic in Books Online to decide which one(s) would be useful to you:
http://msdn.microsoft.com/en-us/library/bb522562(SQL.100).aspx
Related
EDIT The XML value is saved in a XML column in SQL server with the entire transaction
I have a general question I suppose regarding the integrity of XML values stored in a SQL Server database.
We are working with very imnportant data elements in regards to healthcare. We currently utilize a BizTalk server that parses very complex looped and segmented files for eligibility and BizTalk parses the file, pushes out an XML "value" does some validation and then pushes it to the data tables.
I have a request from a Director of mine to create a report off of those XML values.
So I have trouble doing this for a couple reasons:
1) I would like to understand what exactly the XML has, does this data retain it's integrity regardless of whether we store the value in a table or store it in the XML?
2) Consistency - Will this data be consistent? Or does the fact that we are looking at XML values over and over using XML values to join the existing table to the XML "table" make the consistency an issue?
3) Accuracy - I would like this data to be accurate and consistent. I guess I'm having a hard time trusting that this data is available in the same form the data in a table is...
Am I being too overcautious here? Or are there valid reasons why this would not be a good idea to create reports for external clients?
Let me know if I can provide anything else, I'm looking for high-level comments, code should be somewhat irrelevant other than we have to use a value in the XML to render other values in the XML for linking purposes.
Off the bat I can think that this may not be consistent in that it's not set up like a DB table. No Primary Key, No Duplicate checks, No Indexing, etc...Is this true also?
Thanks in advance!
I think this article will answer your concerns: http://msdn.microsoft.com/en-us/library/hh403385.aspx
If you are treating a row with an xml column as your grain, the database will keep it transactionally consistent. With the XML type, you can use XML indexes to speed up your queries, which would be an advantage over storing this as varchar(max). Does this answer your question?
I have written an interface to interact with the CubeIQ BlackBox (www.magiclogic.com) and I have an SQL Server 2005 database table where I store some regular text fields, dates, and two pieces of XML in columns that have an XML data type.
There are a few values I want to pick out of the XML, and I cannot figure out how to query the XML, or the attributes of nodes within that XML.
Here is the type of query I want to do:
SELECT
[regular_table_field1],
[regular_table_field2],
[opt_resposne_xml].rootnode.wrapper.child#attr('someattribute') AS 'someattribute',
[opt_resposne_xml].rootnode.wrapper.child#attr('someattribute2') AS 'someattribute2'
FROM [optimizations] WHERE [opt_concept_id] = '1234'
Not knowing what your XML looks like, here's a general purpose piece of code to give you an idea of what it would look like:
SELECT
[regular_table_field1],
[regular_table_field2],
[opt_resposne_xml].value('(/rootnode/wrapper/child/#someattribute)[1]', 'int') AS 'someattribute',
[opt_resposne_xml].value('(/rootnode/wrapper/child/#someattribute2)[1]', 'varchar(50)') AS 'someattribute2'
FROM
[optimizations]
WHERE
[opt_concept_id] = '1234'
If you need to "reach into" the XML column and grab out a single value (from an XML element or attribute), you basically need to define the XPath to that location of where the information you're interested in is located, and you need to convert that to a given SQL Server datatype.
If you need to have a single relational row cross joined against a number of elements from a single XML field, you might need other approaches (CROSS APPLY).
Also: you might need to pay attention to XML namespaces that are involved - a common pitfall when trying to parse XML inside SQL Server.
While creating a table that has an XML type column, I am referring to a complex XML Schema Collection. When I specify the XML Schema, I have the option of mentioning either CONTENT or DOCUMENT keyword. The latter will ensure that the XML data is stored as a document in a single column.
According to a video tutorial the CONTENT will store the XML data in fragments.
Besides the above statement I don't find reference anywhere else regarding the usage of the CONTENT keyword and it's implication on schema & data.
I would like to know how the fragments are created and managed and whether and how they can be queried individually. Further, how the fragments are correlated. Next, when I amend the XML Schema Collection, what is the impact.
actually i think SQLServer 2005 XML is quite good documented.
CONTENT is the default and allows any valid XML. DOCUMENT is more specific and means that the XML-Data you can to store is only allowed to have a single Top-Level node.
Create:
CREATE TABLE XmlCatalog (
ID INT PRIMARY KEY,
Document XML(CONTENT myCollection))
Insert:
INSERT INTO XmlCatalog VALUES (2,
'<doc id="123">
<sections>
<section num="1"><title>XML Schema</title></section>
<section num="3"><title>Benefits</title></section>
<section num="4"><title>Features</title></section>
</sections>
</doc>')
Select:
SELECT xCol.query('/doc[#id = 123]//section')
FROM XmlCatalog
WHERE xCol.exist ('/doc[#id = 123]') = 1
...and so on. The query language exceeds more or less in a subset of xpath 1.0.
If you amend an XSD it is checked on Inserts and Updates and stored within the xml of each element. As far as i understand the doc it is also allowed to add multiple schemas for one column so that entries can reference to different schemas.
EDIT:
Ok, after reading the specific parts of the documentation i think i understand what your problem is. The reference isn't very clear on that point but as far as i understand it only Entries with one top level node can to be bound to XSD schemas.
Due to the fact that XSD-Schemas require a single top level node defining the used XSD file it won't be possible to validate fragments containing more than one top level element. I haven't tried but i think it can't be done.
However it seems to be valid to define a CONTENT column, amend an XSD and store both, XML with one top level node referencing the XSD as well as XML-fragments which will only checked for wellformedness. The fragments can be accessed using the XPath query language show in the select statement above.
I can't tell you much about performance implications. The reference mentions that XSDs are stored inline so this will need some extra space within the db. The XPath queries need to be executed too. Despite the fact that xpath usually is quite fast i guess it could decrease performace cause it needs to be performed on each row to get the result. To be sure i think you have to check the execution plan for your specific query depending on size and complexity of the stored xml as well as the xpath expression.
Has any one used SQL Server 2008 as an xml document database? what are you thoughts on doing so. Is the indexing and querying of the XML data type sufficent to support this type of role? Is the query performance of XML acceptable?
I don't know what exactly your requirements are, and how many documents and what sizes we're talking about here.
SQL Server 2005 does allow you to specify XML schemas so you can definitely get some validation into the equation which is certainly beneficial.
As for XML indexing - you can index on three different strategies once you've created a basic primary XML index.
the first index type more for optimizing the XPath to a single XML node when you do lots of XPath-based queries for nodes (CREATE XML INDEX ..... FOR PATH)
the second index type more for optimizing access to values inside your XML nodes, when you search more based on values in the XML document (CREATE XML INDEX ..... FOR VALUE)
the third is somewhat of a hybrid of the two above (which I never quite groked myself, to be honest; CREATE XML INDEX ..... FOR PROPERTY)
The XML indexes worked quite well in our samples, but the main drawback in our case was the sheer size of the indexes on disk. Our 1.3 GB database grew to over 11 GB just by adding a PRIMARY XML and a XML FOR PATH index to roughly 45'000 entries with an XML field. Due to disk constraints, we ended up having to take down those indices :-(
This is really not all that surprising considering how the XML index will be built-up with entries for each XML node, attribute and so forth - it's just lots of data.
What we've done in the end is create a number of stored functions that reach into the XML from our Entry table, and we extract those bits and pieces that we need most often. Those are now stored on the Entrytable as computed, persisted properties. This is as fast as "proper" fields on the Entry table, it's always up to date and gets set automatically when new data is inserted, and we hardly ever need to really use any significant XQuery requests anymore.
What I can say from personal experience is that the XML support in SQL Server 2005 is really quite profound and well thought out, in my opinion. So all in all, I would say - go give it a try! You won't really be able to tell whether it works and scales nicely enough in your specific case until you've given it a try.
Marc
I have not tried it, but XML seems a little too verbose to me. I figure I can generate xml from my data later, why worry about storing it in XML.
There is a field in my company's "Contacts" table. In that table, there is an XML type column. The column holds misc data about a particular contact. EG.
<contact>
<refno>123456</refno>
<special>a piece of custom data</special>
</contact>
The tags below contact can be different for each contact, and I must query these fragments
alongside the relational data columns in the same table.
I have used constructions like:
SELECT c.id AS ContactID,c.ContactName as ForeName,
c.xmlvaluesn.value('(contact/Ref)[1]', 'VARCHAR(40)') as ref,
INNER JOIN ParticipantContactMap pcm ON c.id=pcm.contactid
AND pcm.participantid=2140
WHERE xmlvaluesn.exist('/contact[Ref = "118985"]') = 1
This method works ok but, it takes a while for the Server to respond.
I have also investigated using the nodes() function to parse the XML nodes and exist() to test if a nodes holds the value I'm searching for.
Does anyone know a better way to query XML columns??
If you are doing one write and a lot of reads, take the parsing hit at write time, and get that data into some format that is more query-able. A first suggestion would be to parse them into a related but separate table, with name/value/contactID columns.
I've found the msdn xml best practices helpful for working with xml blob columns, might provide some inspiration...
http://msdn.microsoft.com/en-us/library/ms345115.aspx#sql25xmlbp_topic4
In addition to the page mentioned by #pauljette, this page has good performance optimization advice:
http://msdn.microsoft.com/en-us/library/ms345118.aspx
There's a lot you can do to speed up the performance of XML queries, but it will never be as good as properly indexed relational data. If you are selecting one document and then querying inside just that one, you can do pretty well, but when your query needs to scan through a bunch of similar documents looking for something, it's sort of like a key lookup in a relational query plan (that is, slow).
If you have a XSD for your Xml then you can import that into your database and you can then build indexes for your Xml data.
Try this
SELECT * FROM conversionupdatelog WHERE
convert(XML,colName).value('(/leads/lead/#LeadID=''xyz#airproducts.com'')[1]', 'varchar(max)')='true'