Using CONTENT keyword while creating a table with an XML column from XML Schema Collection - sql-server

While creating a table that has an XML type column, I am referencing a complex XML Schema Collection. When I specify the XML Schema, I have the option of specifying either the CONTENT or the DOCUMENT keyword. The latter ensures that the XML data is stored as a document in a single column.
According to a video tutorial, CONTENT stores the XML data in fragments.
Apart from that statement, I can't find any other reference to the usage of the CONTENT keyword and its implications for the schema and the data.
I would like to know how the fragments are created and managed, and whether and how they can be queried individually. Further, how are the fragments correlated? And when I amend the XML Schema Collection, what is the impact?

Actually, I think XML support in SQL Server 2005 is quite well documented.
CONTENT is the default and allows any well-formed XML content, including fragments with more than one top-level element. DOCUMENT is more specific: each XML value you store is only allowed to have a single top-level element.
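The difference between the two flags is about top-level structure, so it can be illustrated outside SQL Server. The following is a minimal Python sketch, not how SQL Server implements the check. It wraps a value in a synthetic root so that fragments still parse, then tests the DOCUMENT rule of exactly one top-level element. It assumes the input has no XML declaration and performs no schema validation.

```python
import xml.etree.ElementTree as ET


def _wrap(xml_text: str) -> ET.Element:
    # Wrap the value in a synthetic root so CONTENT-style fragments with
    # several top-level elements (or top-level text) still parse.
    # Assumption: the input has no <?xml ...?> declaration.
    return ET.fromstring(f"<wrapper>{xml_text}</wrapper>")


def top_level_elements(xml_text: str) -> list:
    """Tag names of the top-level elements in an XML value."""
    return [child.tag for child in _wrap(xml_text)]


def fits_document(xml_text: str) -> bool:
    """Rough analogue of the DOCUMENT rule: exactly one top-level element
    and no non-whitespace text outside it. No schema validation is done."""
    wrapper = _wrap(xml_text)
    children = list(wrapper)
    if len(children) != 1:
        return False
    stray_text = (wrapper.text or "") + (children[0].tail or "")
    return stray_text.strip() == ""


if __name__ == "__main__":
    print(fits_document('<doc id="123"><sections/></doc>'))  # True
    print(fits_document("<a/><b/>"))  # False: two top-level elements
    print(top_level_elements("<a/><b/>"))  # ['a', 'b']
```

A value that fails `fits_document` could still be stored in a CONTENT column, but not in a DOCUMENT column.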
Create:
CREATE TABLE XmlCatalog (
    ID INT PRIMARY KEY,
    -- myCollection must be an existing XML SCHEMA COLLECTION
    xCol XML(CONTENT myCollection))
Insert:
INSERT INTO XmlCatalog VALUES (2,
'<doc id="123">
<sections>
<section num="1"><title>XML Schema</title></section>
<section num="3"><title>Benefits</title></section>
<section num="4"><title>Features</title></section>
</sections>
</doc>')
Select:
SELECT xCol.query('/doc[@id = 123]//section')
FROM XmlCatalog
WHERE xCol.exist('/doc[@id = 123]') = 1
...and so on. The query language is, more or less, a subset of XQuery, which builds on XPath.
If you attach an XSD to the column, values are checked against it on inserts and updates, and the type information is stored within the XML of each element. As far as I understand the documentation, a column can also be bound to a collection of several schemas, so that entries can reference different schemas.
EDIT:
OK, after reading the relevant parts of the documentation, I think I understand what your problem is. The reference isn't very clear on that point, but as far as I understand it, only entries with one top-level node can be bound to XSD schemas.
Because XSD validation requires a single top-level node that identifies the XSD in use, it won't be possible to validate fragments containing more than one top-level element. I haven't tried it, but I think it can't be done.
However, it seems to be valid to define a CONTENT column, attach an XSD and store both XML with one top-level node referencing the XSD and XML fragments, which will only be checked for well-formedness. The fragments can be accessed using the XPath-style queries shown in the SELECT statement above.
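To make "querying the fragments individually" concrete, here is a hedged Python analogue using hypothetical sample data. A CONTENT-style value holding several top-level `<doc>` elements is parsed under a synthetic root. Each fragment is then filtered on its own `id` attribute, and its `<section>` descendants are returned. Note that the comparison here is between strings, whereas the T-SQL query above compares `@id` numerically.

```python
import xml.etree.ElementTree as ET

# Hypothetical CONTENT value: two top-level <doc> fragments in one column value.
CONTENT_VALUE = """
<doc id="123"><sections>
  <section num="1"><title>XML Schema</title></section>
  <section num="3"><title>Benefits</title></section>
</sections></doc>
<doc id="456"><sections>
  <section num="1"><title>Other</title></section>
</sections></doc>
"""


def fragments(xml_text: str) -> list:
    """Split a CONTENT-style value into its top-level elements."""
    return list(ET.fromstring(f"<wrapper>{xml_text}</wrapper>"))


def section_titles(xml_text: str, doc_id: str) -> list:
    """Loosely mirrors xCol.query('/doc[@id = ...]//section'), returning
    (num, title) pairs. The id comparison is a string comparison."""
    results = []
    for frag in fragments(xml_text):
        if frag.tag == "doc" and frag.get("id") == doc_id:
            for section in frag.iter("section"):  # descendant search, like //section
                results.append((section.get("num"), section.findtext("title")))
    return results


if __name__ == "__main__":
    print(len(fragments(CONTENT_VALUE)))  # 2
    print(section_titles(CONTENT_VALUE, "123"))  # [('1', 'XML Schema'), ('3', 'Benefits')]
```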
I can't tell you much about performance implications. The reference mentions that XSDs are stored inline, so this will need some extra space within the database. The XPath queries need to be executed too. Although XPath is usually quite fast, I guess it could decrease performance because it has to be evaluated on each row to get the result. To be sure, check the execution plan for your specific query; it will depend on the size and complexity of the stored XML as well as on the XPath expression.

Related

Sql Server Xml Column Best Practices

What are the best practices for working with SQL Server XML columns to ensure quick performance and ease of reporting?
How do you set up the column?
Do you leave it untyped?
Or associate it with a schema?
Does associating the XML column with a schema improve query performance?
Our use of XML columns is as follows:
A.> On a per-customer basis we can define flexible storage of their data without overhauling our DB.
B.> We need to build reporting views for each customer that return their data as if it were a simple table (for Crystal Reports or SQL Server Reporting Services).
The syntax we currently use to query is as follows:
SELECT
Id,
doc.value('@associatedId', 'nvarchar(40)') AS AssocId,
doc.value('@name1', 'nvarchar(255)') AS Name1,
doc.value('@name2', 'nvarchar(255)') AS Name2,
doc.value('@name3', 'nvarchar(255)') AS Name3,
doc.value('@number', 'nvarchar(255)') AS Number
From OrderDetails
CROSS APPLY OrderDetails.XmlData.nodes('//root/reviewers/reviewer') as XmlTable(doc)
Is there a quicker way to do this? This query runs slowly for us on a table with 1 million records, even though only 800 of them currently have XML data!
Thanks
Pete
From XML Best Practices for Microsoft SQL Server 2005:
Use a typed or untyped XML?
Use untyped XML data type under the following conditions:
You do not have a schema for your XML data.
You have schemas but you do not want the server to validate the data. This is sometimes the case when an application performs client-side validation before storing the data at the server, or temporarily stores XML data invalid according to the schema, or uses XML schema features not supported at the server (for example, key/keyref).
Use typed XML data type under the following conditions:
You have schemas for your XML data and you want the server to validate your XML data according to the XML schemas.
You want to take advantage of storage and query optimizations based on type information.
You want to take better advantage of type information during compilation of your queries, such as static type errors.
Typed XML columns, parameters and variables can store XML documents or content, which you have to specify as a flag (DOCUMENT or CONTENT, respectively) at the time of declaration. Furthermore, you have to provide one or more XML schemas. Specify DOCUMENT if each XML instance has exactly one top-level element; otherwise, use CONTENT. The query compiler uses DOCUMENT flag in type checks during query compilation to infer singleton top-level elements.
Does associating the XML column with a schema improve query performance? See the point above: use typed XML if you want to take advantage of query optimizations based on type information.
There is also a lengthy discussion of the benefits of XML indexes:
Your application may benefit from an XML index under the following conditions:
Queries on XML columns are common in your workload. XML index maintenance cost during data modification must be taken into account.
Your XML values are relatively large and the retrieved parts are relatively small. Building the index avoids parsing the whole data at runtime and benefits index lookups for efficient query processing.
And most importantly, the appropriate type of secondary XML index for your usage:
If your workload uses path expressions heavily on XML columns, the PATH secondary XML index is likely to speed up your workload. The most common case is the use of exist() method on XML columns in WHERE clause of Transact-SQL.
If your workload retrieves multiple values from individual XML instances using path expressions, clustering paths within each XML instance in the PROPERTY index may be helpful. This scenario typically occurs in a property bag scenario when properties of an object are fetched and its relational primary key value is known.
If your workload involves querying for values within XML instances without knowing the element or attribute names that contain those values, you may want to create the VALUE index. This typically occurs with descendant axes lookups, such as //author[last-name="Howard"], where <author> elements can occur at any level of the hierarchy and the search value ("Howard") is more selective than the path. It also occurs in "wildcard" queries, such as /book[@* = "novel"], where the query looks for <book> elements with some attribute having the value "novel".
If, as in the example above, you are using the XML to store various string columns, I don't think you would really benefit from typed XML unless you need the server to validate the data. Performance-wise, I suspect untyped would be faster.
For these kinds of queries you absolutely need XML indexes in place; they are essential for good performance of XML queries. Without indexes, XML columns are stored as blobs, so to query them SQL Server must first shred the blob into XML and then perform whatever operations you request. A primary XML index stores the shredded XML in the database so this doesn't need to be done on the fly. You need to create a primary XML index first; then secondary XML indexes can be created to support your queries.
There are 3 types of secondary XML indexes: PATH, VALUE and PROPERTY. Which secondary indexes you need depends on the type of queries you're going to be doing, so I would encourage you to review the Secondary XML Indexes topic in Books Online to decide which one(s) would be useful to you:
http://msdn.microsoft.com/en-us/library/bb522562(SQL.100).aspx
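As a conceptual model only, the sketch below illustrates the primary and secondary indexes. SQL Server's real node table has more columns and a binary path encoding. The sketch shreds each row once into (pk, path, value) tuples, which is the role the primary XML index plays. It then builds three lookups keyed the way the documentation describes the secondary indexes: PATH on (path, value), VALUE on (value, path) and PROPERTY on (pk, path, value). Python dictionaries stand in for B+ trees, and the sample rows are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical base table: primary key -> untyped XML value.
ROWS = {
    1: '<book genre="novel"><author><last-name>Howard</last-name></author></book>',
    2: '<book genre="poetry"><author><last-name>Smith</last-name></author></book>',
}


def shred(pk, xml_text):
    """Yield (pk, path, value) for every attribute and non-empty element text.

    Parsing once and keeping these tuples is the role of the primary XML index.
    """
    def walk(elem, parent_path):
        path = f"{parent_path}/{elem.tag}"
        for name, value in elem.attrib.items():
            yield pk, f"{path}/@{name}", value
        text = (elem.text or "").strip()
        if text:
            yield pk, path, text
        for child in elem:
            yield from walk(child, path)

    yield from walk(ET.fromstring(xml_text), "")


node_table = [node for pk, xml_text in ROWS.items() for node in shred(pk, xml_text)]

# "Secondary indexes": the same node tuples, keyed in different orders.
path_index = defaultdict(set)       # PATH:     (path, value) -> {pk}
value_index = defaultdict(set)      # VALUE:    value -> {(path, pk)}
property_index = defaultdict(list)  # PROPERTY: pk -> [(path, value)]

for pk, path, value in node_table:
    path_index[(path, value)].add(pk)
    value_index[value].add((path, pk))
    property_index[pk].append((path, value))

if __name__ == "__main__":
    # Path and value both known, e.g. exist('/book/author[last-name="Howard"]')
    print(path_index[("/book/author/last-name", "Howard")])
    # Only the value known, e.g. /book[@* = "novel"]
    print(value_index["novel"])
    # Primary key known, several properties fetched at once (property bag)
    print(property_index[2])
```

The point of the model is that all three lookups avoid re-parsing the XML. They differ only in which part of the tuple the lookup is keyed on.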

SQL Server 2008 as a xml document database

Has anyone used SQL Server 2008 as an XML document database? What are your thoughts on doing so? Is the indexing and querying of the XML data type sufficient to support this type of role? Is the query performance of XML acceptable?
I don't know exactly what your requirements are, or how many documents and what sizes we're talking about here.
SQL Server 2005 does allow you to specify XML schemas so you can definitely get some validation into the equation which is certainly beneficial.
As for XML indexing, you can index on three different strategies once you've created a basic primary XML index:
the first index type is mainly for optimizing the XPath to a single XML node when you do lots of XPath-based queries for nodes (CREATE XML INDEX ..... FOR PATH)
the second index type is mainly for optimizing access to values inside your XML nodes, when you search more based on values in the XML document (CREATE XML INDEX ..... FOR VALUE)
the third is somewhat of a hybrid of the two above (which I never quite grokked myself, to be honest; CREATE XML INDEX ..... FOR PROPERTY)
The XML indexes worked quite well in our samples, but the main drawback in our case was the sheer size of the indexes on disk. Our 1.3 GB database grew to over 11 GB just by adding a PRIMARY XML and an XML FOR PATH index to roughly 45,000 entries with an XML field. Due to disk constraints, we ended up having to take down those indices :-(
This is really not all that surprising considering how the XML index will be built-up with entries for each XML node, attribute and so forth - it's just lots of data.
What we've done in the end is create a number of stored functions that reach into the XML from our Entry table and extract the bits and pieces we need most often. Those are now stored on the Entry table as computed, persisted columns. This is as fast as "proper" fields on the Entry table, it's always up to date, it gets set automatically when new data is inserted, and we hardly ever need to run any significant XQuery requests anymore.
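The idea of paying the extraction cost once, at write time, can be sketched in Python. A hypothetical `Entry` record parses its XML when it is stored and keeps the most frequently queried values as ordinary attributes. Inside the database, a persisted computed column plays this role. The element names `title` and `author` are invented for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entry:
    """Hypothetical Entry row: the raw XML plus values promoted out of it.

    set_xml() re-extracts the promoted fields every time the XML changes,
    much like a persisted computed column is maintained on insert/update.
    """
    entry_id: int
    xml_data: str
    title: Optional[str] = field(init=False, default=None)
    author: Optional[str] = field(init=False, default=None)

    def __post_init__(self) -> None:
        self.set_xml(self.xml_data)

    def set_xml(self, xml_text: str) -> None:
        root = ET.fromstring(xml_text)
        self.xml_data = xml_text
        # findtext() returns None for a missing element, like a NULL column.
        self.title = root.findtext("title")
        self.author = root.findtext("author")


if __name__ == "__main__":
    entries = [
        Entry(1, "<entry><title>XML in SQL</title><author>Jones</author></entry>"),
        Entry(2, "<entry><title>Indexes</title></entry>"),
    ]
    # The filter reads the promoted attribute; no XML is parsed at query time.
    print([e.entry_id for e in entries if e.author == "Jones"])  # [1]
```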
What I can say from personal experience is that the XML support in SQL Server 2005 is really quite profound and well thought out, in my opinion. So all in all, I would say - go give it a try! You won't really be able to tell whether it works and scales nicely enough in your specific case until you've given it a try.
Marc
I have not tried it, but XML seems a little too verbose to me. I figure I can generate xml from my data later, why worry about storing it in XML.

Designing a Stored procedure to create XML tree

I need to write a stored procedure in SQL Server whose returned data will be used to generate an XML file.
My XML file needs to have the following structure:
<root>
<ANode></ANode>
<BNode></BNode>
<CNode>
<C1Node>
<C11Node></C11Node>
<C12Node></C12Node>
</C1Node>
<C2Node>
<C21Node></C21Node>
<C22Node></C22Node>
</C2Node>
<C3Node>
<C31Node></C31Node>
<C32Node></C32Node>
</C3Node>
</CNode>
</root>
My question is: in the stored procedure we can select the values for ANode and BNode with a simple SELECT statement like
Select ANodeVal,BNodeVal from Table
But how do I design the stored procedure to return records for CNode, which is a subtree containing three or more (a dynamic number of) separate nodes for each record, in addition to the normal ANode and BNode?
See
Nesting XML-returning scalar valued functions
Once you get the hang of the nesting, and are willing to write the number of scalar-valued functions needed to construct the node segments from the bottom up (I wouldn't want lots of these lying around), it's not so hard.
I wouldn't recommend doing this in a stored procedure. Building it in a language such as C#, Python or Java makes the code unit-testable and more maintainable.
If you are able to modify the database design, consider keeping each node as a record, instead of as a column (as the sample select statement would indicate).
For example, each row might include the following fields:
RowId
ParentRowId
Name
RowData
I'm assuming that you are passing the data to an application because you indicated that the returned data will be used to generate the XML. In that case the stored procedure would simply be a SELECT statement, leaving the formatting to the application.
Most implementations of XML engines should allow you to add child nodes to existing parent nodes. The XML is built in memory and then "exported" by whatever method necessary to get the desired final result.
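A minimal Python sketch of that approach follows, assuming the hypothetical RowId/ParentRowId/Name/RowData layout suggested above. The stored procedure only returns flat rows, and the application attaches each node to its parent in memory before serializing. Creating all elements in a first pass means the rows do not need to arrive in parent-before-child order. A `ParentRowId` of `None` marks the root.

```python
import xml.etree.ElementTree as ET

# Hypothetical result set from the stored procedure:
# (RowId, ParentRowId, Name, RowData); ParentRowId None marks the root.
ROWS = [
    (1, None, "root", None),
    (2, 1, "ANode", "a"),
    (3, 1, "BNode", "b"),
    (4, 1, "CNode", None),
    (5, 4, "C1Node", None),
    (6, 5, "C11Node", "x"),
    (7, 5, "C12Node", "y"),
    (8, 4, "C2Node", None),
    (9, 8, "C21Node", "z"),
]


def build_tree(rows):
    """Turn flat parent/child rows into a single ElementTree element."""
    elements = {}
    for row_id, _parent_id, name, data in rows:  # pass 1: create every node
        elem = ET.Element(name)
        elem.text = data
        elements[row_id] = elem

    root = None
    for row_id, parent_id, _name, _data in rows:  # pass 2: attach to parents
        if parent_id is None:
            root = elements[row_id]
        else:
            # Siblings keep the order in which their rows were returned.
            elements[parent_id].append(elements[row_id])
    return root


if __name__ == "__main__":
    print(ET.tostring(build_tree(ROWS), encoding="unicode"))
```

Because CNode's children are just more rows, a dynamic number of C*Node subtrees needs no change to the code.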

How can I automate exporting of tables into proper XML files from MSSQL or Access?

We have a customer requesting data in XML format. Normally this is not required as we usually just hand off an Access database or csv files and that is sufficient. However in this case I need to automate the exporting of proper XML from a dozen tables.
If I can do it out of SQL Server 2005, that would be preferred. However I can't for the life of me find a way to do this. I can dump out raw xml data but this is just a tag per row with attribute values. We need something that represents the structure of the tables. Access has an export in xml format that meets our needs. However I'm not sure how this can be automated. It doesn't appear to be available in any way through SQL so I'm trying to track down the necessary code to export the XML through a macro or vbscript.
Any suggestions?
Look into using FOR XML AUTO. Depending on your requirements, you might need to use EXPLICIT.
As a quick example:
SELECT
*
FROM
Customers
INNER JOIN Orders ON Orders.CustID = Customers.CustID
FOR XML AUTO
This will generate a nested XML document with the orders inside the customers. You could then use SSIS to export that out into a file pretty easily I would think. I haven't tried it myself though.
If you want a document instead of a fragment, you'll probably need a two-part solution. However, both parts could be done in SQL Server.
It looks from the comments on Tom's entry like you found the ELEMENTS argument, so you're getting the fields as child elements rather than attributes. You'll still end up with a fragment, though, because you won't get a root node.
There are different ways you could handle this. SQL Server provides a method for using XSLT to transform XML documents, so you could create an XSL stylesheet to wrap the result of your query in a root element. You could also add anything else the customer's schema requires (assuming they have one).
If you wanted to leave some fields as attributes and make others elements, you could also use XSLT to move those fields, so you might end up with something like this:
<customer id="204">
<firstname>John</firstname>
<lastname>Public</lastname>
</customer>
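The answer above suggests XSLT for this post-processing step. As a swapped-in alternative that uses Python's ElementTree instead of XSLT, here is a minimal sketch. It wraps a root-less FOR XML AUTO fragment in a single root element and turns every attribute except `id` into a child element, producing the shape shown above. The input fragment, the root name `customers` and the keep-only-`id` rule are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical root-less output of a FOR XML AUTO query (one element per row).
FRAGMENT = (
    '<customer id="204" firstname="John" lastname="Public"/>'
    '<customer id="205" firstname="Jane" lastname="Doe"/>'
)


def to_document(fragment: str, root_name: str = "customers",
                keep_as_attributes=("id",)) -> str:
    """Wrap a fragment in one root element and move attributes into child elements."""
    root = ET.fromstring(f"<{root_name}>{fragment}</{root_name}>")
    for row in root:
        for name in list(row.attrib):  # copy the keys: attrib is mutated below
            if name not in keep_as_attributes:
                child = ET.SubElement(row, name)
                child.text = row.attrib.pop(name)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_document(FRAGMENT))
```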
There's an outline here of a macro used to export data from an access db to an xml file, which may be of some use to you.
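Const acExportTable = 0
Set objAccess = CreateObject("Access.Application")
objAccess.OpenCurrentDatabase "C:\Scripts\Test.mdb"
' Export the table "Inventory" to test.xml
objAccess.ExportXML acExportTable, "Inventory", "C:\Scripts\test.xml"
' Close the database and shut down the Access instance started above
objAccess.CloseCurrentDatabase
objAccess.Quit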
The easiest way I can think of would be to create a small app to do it for you. You could build it as a basic WinForms app and use a LINQ to SQL .dbml class to represent your database. Most of the time you can just serialize those objects using the XmlSerializer class (System.Xml.Serialization namespace). Occasionally it is more difficult than that, depending on the complexity of your database. Check out this post for some detailed info on LINQ to SQL and XML serialization:
http://www.west-wind.com/Weblog/posts/147218.aspx
Hope that helps.
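The same small-app idea can be sketched without LINQ to SQL. Here is a hedged Python version that takes any DB-API cursor and reads the column names from `cursor.description`. It writes one element per row, with one child element per column, wrapped in a single root so the result is a document rather than a fragment. The `sqlite3` database and the `Inventory` sample rows are stand-ins, and the element naming scheme is an assumption, not what Access or SQL Server would produce.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(cursor, table: str, row_tag: str = "row") -> ET.Element:
    """Export every row of `table` as <table><row><col>value</col>...</row></table>.

    The table name is interpolated into SQL, so only pass trusted names.
    """
    cursor.execute(f"SELECT * FROM {table}")
    columns = [col[0] for col in cursor.description]
    root = ET.Element(table)
    for values in cursor.fetchall():
        row = ET.SubElement(root, row_tag)
        for name, value in zip(columns, values):
            # NULLs become empty elements; everything else is stringified.
            ET.SubElement(row, name).text = None if value is None else str(value)
    return root


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the real database
    conn.execute("CREATE TABLE Inventory (ItemID INTEGER, Name TEXT)")
    conn.executemany("INSERT INTO Inventory VALUES (?, ?)", [(1, "Widget"), (2, None)])
    print(ET.tostring(table_to_xml(conn.cursor(), "Inventory"), encoding="unicode"))
```

To write a file instead of printing, `ET.ElementTree(root).write("Inventory.xml", encoding="utf-8", xml_declaration=True)` can be used. Calling `table_to_xml` once per table and appending the results under a common root would cover the dozen-table case.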

Querying XML columns in SQLServer 2005

My company's "Contacts" table has an XML type column that holds miscellaneous data about a particular contact, e.g.:
<contact>
<refno>123456</refno>
<special>a piece of custom data</special>
</contact>
The tags below contact can be different for each contact, and I must query these fragments alongside the relational data columns in the same table.
I have used constructions like:
SELECT c.id AS ContactID, c.ContactName AS ForeName,
c.xmlvaluesn.value('(contact/Ref)[1]', 'VARCHAR(40)') AS ref
FROM Contacts c
INNER JOIN ParticipantContactMap pcm ON c.id = pcm.contactid
AND pcm.participantid = 2140
WHERE c.xmlvaluesn.exist('/contact[Ref = "118985"]') = 1
This method works OK, but it takes a while for the server to respond.
I have also investigated using the nodes() function to parse the XML nodes and exist() to test whether a node holds the value I'm searching for.
Does anyone know a better way to query XML columns?
If you are doing one write and a lot of reads, take the parsing hit at write time, and get that data into some format that is more query-able. A first suggestion would be to parse them into a related but separate table, with name/value/contactID columns.
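Here is a minimal sketch of that write-time shredding in Python, using the sample `<contact>` document from the question. Each child element of `<contact>` becomes a (ContactID, Name, Value) row in a hypothetical side table indexed on (Name, Value). The `sqlite3` module is used only to keep the example runnable and stands in for the SQL Server side table. The table, index and sample values are illustrative.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred_contact(contact_id: int, xml_text: str) -> list:
    """One (ContactID, Name, Value) row per child element of <contact>."""
    root = ET.fromstring(xml_text)
    return [(contact_id, child.tag, (child.text or "").strip()) for child in root]


conn = sqlite3.connect(":memory:")  # stand-in for the real database
conn.execute("CREATE TABLE ContactValue (ContactID INTEGER, Name TEXT, Value TEXT)")
conn.execute("CREATE INDEX IX_ContactValue_Name_Value ON ContactValue (Name, Value)")

# Parse once at write time...
xml_docs = {
    1: "<contact><refno>123456</refno><special>a piece of custom data</special></contact>",
    2: "<contact><refno>118985</refno></contact>",
}
for cid, doc in xml_docs.items():
    conn.executemany("INSERT INTO ContactValue VALUES (?, ?, ?)", shred_contact(cid, doc))

# ...so reads become ordinary indexed lookups instead of exist() calls.
matches = conn.execute(
    "SELECT ContactID FROM ContactValue WHERE Name = ? AND Value = ?", ("refno", "118985")
).fetchall()
print(matches)  # [(2,)]
```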
I've found the MSDN XML best practices article helpful for working with XML blob columns; it might provide some inspiration...
http://msdn.microsoft.com/en-us/library/ms345115.aspx#sql25xmlbp_topic4
In addition to the page mentioned by @pauljette, this page has good performance optimization advice:
http://msdn.microsoft.com/en-us/library/ms345118.aspx
There's a lot you can do to speed up the performance of XML queries, but it will never be as good as properly indexed relational data. If you are selecting one document and then querying inside just that one, you can do pretty well, but when your query needs to scan through a bunch of similar documents looking for something, it's sort of like a key lookup in a relational query plan (that is, slow).
If you have an XSD for your XML, you can import it into your database and then build indexes for your XML data.
Try this:
SELECT * FROM conversionupdatelog WHERE
CONVERT(XML, colName).value('(/leads/lead/@LeadID=''xyz@airproducts.com'')[1]', 'varchar(max)') = 'true'
