Has anyone used SQL Server 2008 as an XML document database? What are your thoughts on doing so? Is the indexing and querying of the XML data type sufficient to support this type of role? Is the query performance of XML acceptable?
I don't know what exactly your requirements are, and how many documents and what sizes we're talking about here.
SQL Server 2005 does allow you to specify XML schemas so you can definitely get some validation into the equation which is certainly beneficial.
As for XML indexing - you can index on three different strategies once you've created a basic primary XML index.
The first index type is geared toward optimizing the XPath to a single XML node, for when you run lots of XPath-based queries for nodes (CREATE XML INDEX ..... FOR PATH).
The second index type is geared toward optimizing access to values inside your XML nodes, for when you search mostly on values in the XML document (CREATE XML INDEX ..... FOR VALUE).
The third is somewhat of a hybrid of the two above (which I never quite grokked myself, to be honest; CREATE XML INDEX ..... FOR PROPERTY).
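As a rough sketch of what the full statements look like: the table dbo.Doc, its column XmlData, and all index names below are hypothetical, chosen only for illustration. Note that each secondary index names the primary XML index it builds on through a USING XML INDEX clause.

```sql
-- Hypothetical table; a clustered primary key is required
-- before a primary XML index can be created.
CREATE TABLE dbo.Doc
(
    DocId   INT IDENTITY(1,1) NOT NULL
            CONSTRAINT PK_Doc PRIMARY KEY CLUSTERED,
    XmlData XML NULL
);
GO

-- The primary XML index has to exist first.
CREATE PRIMARY XML INDEX PXML_Doc_XmlData
    ON dbo.Doc (XmlData);
GO

-- Each secondary index is built on top of the primary one.
CREATE XML INDEX IXML_Doc_XmlData_Path
    ON dbo.Doc (XmlData)
    USING XML INDEX PXML_Doc_XmlData FOR PATH;

CREATE XML INDEX IXML_Doc_XmlData_Value
    ON dbo.Doc (XmlData)
    USING XML INDEX PXML_Doc_XmlData FOR VALUE;

CREATE XML INDEX IXML_Doc_XmlData_Property
    ON dbo.Doc (XmlData)
    USING XML INDEX PXML_Doc_XmlData FOR PROPERTY;
GO
```

You would normally create only the secondary index types that match your queries, since each one adds to the disk footprint.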
The XML indexes worked quite well in our samples, but the main drawback in our case was the sheer size of the indexes on disk. Our 1.3 GB database grew to over 11 GB just by adding a PRIMARY XML index and an XML FOR PATH index to roughly 45'000 entries with an XML field. Due to disk constraints, we ended up having to take down those indices :-(
This is really not all that surprising considering how the XML index will be built-up with entries for each XML node, attribute and so forth - it's just lots of data.
What we've done in the end is create a number of stored functions that reach into the XML from our Entry table, and we extract those bits and pieces that we need most often. Those are now stored on the Entry table as computed, persisted columns. This is as fast as "proper" fields on the Entry table, it's always up to date and gets set automatically when new data is inserted, and we hardly ever need to use any significant XQuery requests anymore.
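A minimal sketch of that approach, assuming a hypothetical document shape of <entry><title>...</title></entry>; the column names, the function name and the XPath are illustrative and not taken from our actual schema.

```sql
-- Hypothetical Entry table with an XML column.
CREATE TABLE dbo.Entry
(
    EntryId INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    XmlData XML NULL
);
GO

-- XML data type methods cannot be used directly in a computed
-- column definition, so the extraction is wrapped in a
-- schema-bound scalar function.
CREATE FUNCTION dbo.GetEntryTitle (@data XML)
RETURNS NVARCHAR(200)
WITH SCHEMABINDING
AS
BEGIN
    RETURN @data.value('(/entry/title)[1]', 'NVARCHAR(200)');
END;
GO

-- PERSISTED stores the extracted value in the row; it is
-- recomputed whenever XmlData is inserted or updated.
ALTER TABLE dbo.Entry
    ADD Title AS dbo.GetEntryTitle(XmlData) PERSISTED;
GO

-- The promoted column can be indexed like any other column.
CREATE INDEX IX_Entry_Title ON dbo.Entry (Title);
GO
```

Queries can then filter on Title with a plain WHERE clause instead of an XQuery expression.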
What I can say from personal experience is that the XML support in SQL Server 2005 is really quite profound and well thought out, in my opinion. So all in all, I would say - go give it a try! You won't really be able to tell whether it works and scales nicely enough in your specific case until you've given it a try.
Marc
I have not tried it, but XML seems a little too verbose to me. I figure I can generate xml from my data later, why worry about storing it in XML.
Related
In my stored procedure, I am creating an XML file which has the potential to be very large, > 1GB in size. The data needs to be inserted into a varbinary column and I was wondering what the most efficient method of doing this is in SQL Server 2014?
I was storing it in an xml column but have been asked to move it to this new column as a result of a decision outside of my control
If you have the slightest chance to speak with these persons, you should do this!
You must be aware that XML is not stored as the string representation you see, but as a hierarchically organized tree. Reading this data or manipulating it is astonishingly fast! If you store the XML as a BLOB, you will keep it in its string format (hopefully this is unicode/UCS-2!). Reading this data will need a cast to NVARCHAR(MAX) and then to XML, which means a full parse of the whole document to get the hierarchy tree. Only when this is done can you use XML data type methods like .value() or .nodes(). You will need this very expensive process over and over and over and ...
Especially in cases of huge XMLs (or - even worse - many of them) this is a really bad decision!! Why should one do this??? It will take roughly the same amount of storage space.
The only thing you will get is bad performance! And you will be the one who has to repair this later...
VARBINARY is the appropriate type for data, where you do not care what's inside (e.g. pictures). If these XMLs are just plain archive data and you do not want to read or manipulate them, this can be a choice. But there is no advantage at all!
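To illustrate the round trip described above, here is a small sketch using variables instead of real columns. The sample document and variable names are made up, and the blob is assumed to hold the UTF-16 string form of the XML.

```sql
DECLARE @doc XML = N'<order id="42"><item>Widget</item></order>';

-- Simulate the content of a VARBINARY(MAX) column:
-- the document serialized as a UTF-16 string.
DECLARE @blob VARBINARY(MAX) =
    CAST(CAST(@doc AS NVARCHAR(MAX)) AS VARBINARY(MAX));

-- Native XML: the value is already held in parsed form.
SELECT @doc.value('(/order/@id)[1]', 'INT') AS OrderIdFromXml;

-- VARBINARY: every read has to cast back to a string and
-- re-parse the whole document before any XML method can run.
SELECT CAST(CAST(@blob AS NVARCHAR(MAX)) AS XML)
           .value('(/order/@id)[1]', 'INT') AS OrderIdFromBlob;
```

With a tiny document the difference is invisible; the cost of the second form grows with the size of each document and the number of rows read.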
I would look into using a File Table: https://learn.microsoft.com/en-us/sql/relational-databases/blob/filetables-sql-server
Also check out this question on inserting blobs: How to insert a blob into a database using sql server management studio
EDIT: The XML value is saved in an XML column in SQL Server with the entire transaction.
I have a general question I suppose regarding the integrity of XML values stored in a SQL Server database.
We are working with very important healthcare data elements. We currently use a BizTalk server that parses very complex looped and segmented eligibility files, pushes out an XML "value", does some validation, and then pushes the data to the data tables.
I have a request from a Director of mine to create a report off of those XML values.
So I have trouble doing this for a couple reasons:
1) I would like to understand what exactly the XML has. Does this data retain its integrity regardless of whether we store the value in a table or store it in the XML?
2) Consistency - Will this data be consistent? Or does the fact that we are looking at XML values over and over using XML values to join the existing table to the XML "table" make the consistency an issue?
3) Accuracy - I would like this data to be accurate and consistent. I guess I'm having a hard time trusting that this data is available in the same form the data in a table is...
Am I being overcautious here? Or are there valid reasons why it would not be a good idea to create reports for external clients this way?
Let me know if I can provide anything else, I'm looking for high-level comments, code should be somewhat irrelevant other than we have to use a value in the XML to render other values in the XML for linking purposes.
Off the bat I can think that this may not be consistent in that it's not set up like a DB table. No Primary Key, No Duplicate checks, No Indexing, etc...Is this true also?
Thanks in advance!
I think this article will answer your concerns: http://msdn.microsoft.com/en-us/library/hh403385.aspx
If you are treating a row with an xml column as your grain, the database will keep it transactionally consistent. With the XML type, you can use XML indexes to speed up your queries, which would be an advantage over storing this as varchar(max). Does this answer your question?
We have got a .Net Client that calls a Webservice. We want to store the result in a SQL Server database.
I think we have two options here how to store the data, and I am a bit undecided as I can't see the pros and cons clearly: One would be to map the results into database fields. That would require us to have database fields corresponding to each possible result type, e.g. for each "normal" result type as well as those for faults.
On the other hand, we could store the resulting XML and query that via the SQL Server built in XML functions.
Personally, I am comfortable with dealing with both SQL and XML, so both look fine to me.
Are there any big pros and cons and what would I need to consider in terms of database design when trying to store the resulting XML for quite a few different possible Webservice operations? I was thinking about a result table for each operation that we call with different entries for the different possible outcomes / types and then store the XML in the right field, e.g. a fault in the fault field, a "normal" return type in the appropriate field etc.
We use a combination of both. XML for reference and detailed data, and text columns for fields you might search on. Searchable columns include order number, customer reference, ticket number. We just add them when we need them since you can extract them from the XML column.
I wouldn't recommend just the XML. If you store 10.000 messages a day, a query like:
select * from XmlLogging with (nolock) where Response like '%Order12%'
can become slow and interfere with other queries. You also can't display the logging in a GUI because retrieval is too slow.
I wouldn't recommend just the text columns either. If the XML format changes, you'd get an empty column. That's hard to troubleshoot without the XML message. In addition, if you need to "replay" the message stream, that's a lot easier with the XML messages. Few requirements demand replay, but it's really helpful when repairing the fallout of production problems.
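A sketch of that combined layout, under stated assumptions: the table name XmlLoggingDemo, the message shape and the element name orderNumber are hypothetical, and Response is declared as an XML column here.

```sql
-- Full message kept as XML, plus one promoted, indexed search column.
CREATE TABLE dbo.XmlLoggingDemo
(
    LogId       INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    OrderNumber NVARCHAR(40) NULL,
    Response    XML NOT NULL
);
CREATE INDEX IX_XmlLoggingDemo_OrderNumber
    ON dbo.XmlLoggingDemo (OrderNumber);
GO

DECLARE @response XML =
    N'<response><orderNumber>Order12</orderNumber><status>OK</status></response>';

-- Extract the searchable value once, at write time.
INSERT INTO dbo.XmlLoggingDemo (OrderNumber, Response)
SELECT @response.value('(/response/orderNumber/text())[1]', 'NVARCHAR(40)'),
       @response;

-- Searches hit the indexed column instead of scanning the XML text.
SELECT LogId, Response
FROM dbo.XmlLoggingDemo
WHERE OrderNumber = N'Order12';
```

If the XML format changes and the extraction starts returning NULL, the original message is still there to troubleshoot or re-extract from.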
What are the best practices for working with Sql server Xml columns to ensure quick performance and also ease of reporting?
How do you set up the column?
do you leave it as untyped?
or associate it with a schema?
Does associating the xml column with a schema improve query performance?
Our use of xml columns is as follows:
A.> On a PER customer basis we can define flexible storage of their data without overhauling our db.
B.> We need to build reporting views for each customer which returns their data as if it was a simple table (for crystal reports or Sql Server Reporting Services).
The syntax we currently use to query is as follows:
SELECT
Id,
doc.value('@associatedId','nvarchar(40)') as AssocId,
doc.value('@name1', 'nvarchar(255)') as Name1,
doc.value('@name2', 'nvarchar(255)') as Name2,
doc.value('@name3', 'nvarchar(255)') as Name3,
doc.value('@number', 'nvarchar(255)') as Number
From OrderDetails
CROSS APPLY OrderDetails.XmlData.nodes('//root/reviewers/reviewer') as XmlTable(doc)
Is there a quicker way to do this? This query runs slowly for us in a table with 1 million records, even though only 800 of them currently have XML data!
Thanks
Pete
From XML Best Practices for Microsoft SQL Server 2005:
Use a typed or untyped XML?
Use untyped XML data type under the following conditions:
You do not have a schema for your XML data.
You have schemas but you do not want the server to validate the data. This is sometimes the case when an application performs client-side validation before storing the data at the server, or temporarily stores XML data invalid according to the schema, or uses XML schema features not supported at the server (for example, key/keyref).
Use typed XML data type under the following conditions:
You have schemas for your XML data and you want the server to validate your XML data according to the XML schemas.
You want to take advantage of storage and query optimizations based on type information.
You want to take better advantage of type information during compilation of your queries such as static type errors.
Typed XML columns, parameters and variables can store XML documents or content, which you have to specify as a flag (DOCUMENT or CONTENT, respectively) at the time of declaration. Furthermore, you have to provide one or more XML schemas. Specify DOCUMENT if each XML instance has exactly one top-level element; otherwise, use CONTENT. The query compiler uses DOCUMENT flag in type checks during query compilation to infer singleton top-level elements.
Does associating the xml column with a schema improve query performance? See above point: use typed XML if you want to take advantage of query optimizations based on type information.
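For reference, a small sketch of how typed XML is declared. The schema collection name ReviewerSchema and the element and attribute names are hypothetical, loosely modelled on the reviewer attributes in the question.

```sql
-- Register a (hypothetical) XSD as a schema collection.
CREATE XML SCHEMA COLLECTION dbo.ReviewerSchema AS
N'<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="reviewer">
      <xs:complexType>
        <xs:attribute name="name1" type="xs:string" />
        <xs:attribute name="number" type="xs:int" />
      </xs:complexType>
    </xs:element>
  </xs:schema>';
GO

-- Typed XML: DOCUMENT means exactly one top-level element is allowed.
DECLARE @typed XML(DOCUMENT dbo.ReviewerSchema);

-- Accepted: matches the schema.
SET @typed = N'<reviewer name1="Ann" number="7" />';

-- This assignment would be rejected by the server, because
-- "seven" is not a valid xs:int:
-- SET @typed = N'<reviewer name1="Ann" number="seven" />';

-- Untyped XML, by comparison, takes any well-formed content.
DECLARE @untyped XML = N'<reviewer name1="Ann" number="seven" />';
```

A table column is declared the same way, for example XmlData XML(DOCUMENT dbo.ReviewerSchema) NULL.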
There is also a lengthy discussion over the benefits of XML indexes:
Your application may benefit from an XML index under the following conditions:
Queries on XML columns are common in your workload. XML index maintenance cost during data modification must be taken into account.
Your XML values are relatively large and the retrieved parts are relatively small. Building the index avoids parsing the whole data at runtime and benefits index lookups for efficient query processing.
And most importantly, the appropriate type of secondary XML index for your usage:
If your workload uses path expressions heavily on XML columns, the PATH secondary XML index is likely to speed up your workload. The most common case is the use of exist() method on XML columns in WHERE clause of Transact-SQL.
If your workload retrieves multiple values from individual XML instances using path expressions, clustering paths within each XML instance in the PROPERTY index may be helpful. This scenario typically occurs in a property bag scenario when properties of an object are fetched and its relational primary key value is known.
If your workload involves querying for values within XML instances without knowing the element or attribute names that contain those values, you may want to create the VALUE index. This typically occurs with descendant axes lookups, such as //author[last-name="Howard"], where <author> elements can occur at any level of the hierarchy and the search value ("Howard") is more selective than the path. It also occurs in "wildcard" queries, such as /book[@* = "novel"], where the query looks for <book> elements with some attribute having the value "novel".
If as in the above example you are using the XML to store various string columns, I don't think you would really benefit from typed XML unless you have a need for the server to validate the data. Performance-wise, I suspect it would be faster untyped.
For these kinds of queries you absolutely need to have XML indexes in place, they are essential for good performance of XML queries. Without indexes, XML columns are stored as blobs so in order to query them, SQL needs to shred the blob into XML first, then do whatever operations you are requesting. A primary XML index stores the shredded XML in the database so it doesn't need to be done on the fly. You need to create a primary XML index first, then secondary XML indexes can be created to support your queries.
There are 3 types of secondary XML indexes: PATH, VALUE and PROPERTY. Which secondary indexes you need depends on the type of queries you're going to be doing, so I would encourage you to review the Secondary XML Indexes topic in Books Online to decide which one(s) would be useful to you:
http://msdn.microsoft.com/en-us/library/bb522562(SQL.100).aspx
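Applied to the query in the question, this could look roughly like the sketch below. The table definition is an assumed shape (a clustered primary key on Id is required for the primary XML index), the index names are made up, and whether the PATH index is the right secondary index for your workload is something to measure rather than take as given.

```sql
-- Assumed shape of the table; skip this if it already exists.
CREATE TABLE dbo.OrderDetails
(
    Id      INT NOT NULL PRIMARY KEY CLUSTERED,
    XmlData XML NULL
);
GO

CREATE PRIMARY XML INDEX PXML_OrderDetails_XmlData
    ON dbo.OrderDetails (XmlData);

CREATE XML INDEX IXML_OrderDetails_XmlData_Path
    ON dbo.OrderDetails (XmlData)
    USING XML INDEX PXML_OrderDetails_XmlData FOR PATH;
GO

-- Same report as in the question, but with an exact path
-- (/root/...) instead of the descendant axis (//root/...),
-- so the server does not have to search every level.
SELECT od.Id,
       r.doc.value('@associatedId', 'nvarchar(40)') AS AssocId,
       r.doc.value('@name1', 'nvarchar(255)')       AS Name1,
       r.doc.value('@name2', 'nvarchar(255)')       AS Name2
FROM dbo.OrderDetails AS od
CROSS APPLY od.XmlData.nodes('/root/reviewers/reviewer') AS r(doc);
```

Compare the execution plans before and after creating the indexes to see whether they are actually used.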
My company's "Contacts" table has an XML type column that holds miscellaneous data about a particular contact, e.g.:
<contact>
<refno>123456</refno>
<special>a piece of custom data</special>
</contact>
The tags below contact can be different for each contact, and I must query these fragments
alongside the relational data columns in the same table.
I have used constructions like:
SELECT c.id AS ContactID,c.ContactName as ForeName,
c.xmlvaluesn.value('(contact/Ref)[1]', 'VARCHAR(40)') as ref
FROM Contacts c
INNER JOIN ParticipantContactMap pcm ON c.id=pcm.contactid
AND pcm.participantid=2140
WHERE xmlvaluesn.exist('/contact[Ref = "118985"]') = 1
This method works ok but, it takes a while for the Server to respond.
I have also investigated using the nodes() function to parse the XML nodes and exist() to test if a node holds the value I'm searching for.
Does anyone know a better way to query XML columns??
If you are doing one write and a lot of reads, take the parsing hit at write time, and get that data into some format that is more query-able. A first suggestion would be to parse them into a related but separate table, with name/value/contactID columns.
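A sketch of that shredding step, using the sample document from the question. The table name ContactAttribute, its column sizes and the index are assumptions made for illustration.

```sql
-- Side table holding one row per child element of <contact>.
CREATE TABLE dbo.ContactAttribute
(
    ContactID INT           NOT NULL,
    Name      NVARCHAR(100) NOT NULL,
    Value     NVARCHAR(300) NULL
);
CREATE INDEX IX_ContactAttribute_Name_Value
    ON dbo.ContactAttribute (Name, Value) INCLUDE (ContactID);
GO

DECLARE @ContactID INT = 1;
DECLARE @xml XML =
    N'<contact><refno>123456</refno><special>a piece of custom data</special></contact>';

-- Shred once, at write time: element name and text content.
INSERT INTO dbo.ContactAttribute (ContactID, Name, Value)
SELECT @ContactID,
       n.value('local-name(.)', 'NVARCHAR(100)'),
       n.value('(./text())[1]', 'NVARCHAR(300)')
FROM @xml.nodes('/contact/*') AS t(n);

-- Reads become ordinary indexed lookups.
SELECT ContactID
FROM dbo.ContactAttribute
WHERE Name = N'refno' AND Value = N'123456';
```

In practice the INSERT would sit in whatever code path writes the contact (a stored procedure or trigger), so the side table stays in sync with the XML column.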
I've found the msdn xml best practices helpful for working with xml blob columns, might provide some inspiration...
http://msdn.microsoft.com/en-us/library/ms345115.aspx#sql25xmlbp_topic4
In addition to the page mentioned by @pauljette, this page has good performance optimization advice:
http://msdn.microsoft.com/en-us/library/ms345118.aspx
There's a lot you can do to speed up the performance of XML queries, but it will never be as good as properly indexed relational data. If you are selecting one document and then querying inside just that one, you can do pretty well, but when your query needs to scan through a bunch of similar documents looking for something, it's sort of like a key lookup in a relational query plan (that is, slow).
If you have an XSD for your XML then you can import that into your database and you can then build indexes for your XML data.
Try this
SELECT * FROM conversionupdatelog WHERE
convert(XML,colName).value('(/leads/lead/@LeadID=''xyz@airproducts.com'')[1]', 'varchar(max)')='true'
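An alternative sketch of the same filter, using exist() with a predicate and sql:variable() so the search value is passed as a parameter rather than embedded in the XQuery string. A table variable with made-up rows stands in for conversionupdatelog here, and colName is assumed to be a string column holding XML text.

```sql
-- Stand-in for conversionupdatelog, with hypothetical sample rows.
DECLARE @log TABLE (Id INT, colName NVARCHAR(MAX));
INSERT INTO @log (Id, colName)
VALUES (1, N'<leads><lead LeadID="xyz@airproducts.com" /></leads>'),
       (2, N'<leads><lead LeadID="someone@example.com" /></leads>');

DECLARE @leadId NVARCHAR(200) = N'xyz@airproducts.com';

-- exist() returns 1 when at least one <lead> matches the predicate.
SELECT l.Id
FROM @log AS l
WHERE CONVERT(XML, l.colName)
          .exist('/leads/lead[@LeadID = sql:variable("@leadId")]') = 1;
```

Because the column still has to be converted to XML for every row, this changes how the condition is expressed, not the cost of parsing.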