Converting nText columns which contained XML to the XML data type has resulted in worse performance in SQL Server.
I am currently working on a project where nText columns have been used to store valid XML. I have successfully migrated these columns to the XML data type. However, according to SQL Profiler, the performance of the XML data type is worse than using nText or nvarchar(max) to store the XML. Everything I have read implies that this should not be the case.
To verify this, I created two tables with the same indexes etc.
Table Name Order1
[id] [int] IDENTITY(1,1) NOT NULL,
[uid] [varchar](36) NOT NULL,
[AffiliateId] [varchar](36) NOT NULL,
[Address] [ntext] NOT NULL,
[CustomProperties] [ntext] NOT NULL,
[OrderNumber] [nvarchar](50) NOT NULL,
...
Table Name Order2
[id] [int] IDENTITY(1,1) NOT NULL,
[uid] [varchar](36) NOT NULL,
[AffiliateId] [varchar](36) NOT NULL,
[Address] [xml] NOT NULL,
[CustomProperties] [xml] NOT NULL,
[OrderNumber] [nvarchar](50) NOT NULL,
...
I then copied the data using a SELECT/INSERT statement and rebuilt the indexes on both tables. After that I created a script with the following SQL.
DBCC DROPCLEANBUFFERS
GO
--Part1
Select id, uid, AffiliateId, Address, CustomProperties, OrderNumber from [dbo].[Order1] where uid = 'F96045F8-A2BD-4C02-BECB-6EF22C9E473F'
Select id, uid, AffiliateId, Address, CustomProperties, OrderNumber from [dbo].[Order1] where uid = 'A3B71348-EB68-4600-9550-EC2CF75698F4'
Select id, uid, AffiliateId, Address, CustomProperties, OrderNumber from [dbo].[Order1] where uid = 'CB114D91-F000-4553-8AFE-FC20CF6AD8C0'
Select id, uid, AffiliateId, Address, CustomProperties, OrderNumber from [dbo].[Order1] where uid = '06274E4F-E233-4594-B505-D4BAA3770F0A'
DBCC DROPCLEANBUFFERS
GO
--Part2
Select id, uid, AffiliateId, Address, OrderNumber,
CAST(CustomProperties AS xml).query('CustomProperty/Key[text()="AgreedToTerms"]/../Value/text()') as "TermsAgreed"
from Order1
DBCC DROPCLEANBUFFERS
GO
--Part3
Insert Into Order1 (uid, AffiliateId, Address, CustomProperties, OrderNumber)
Select NewId(), AffiliateId, Address, CustomProperties, OrderNumber + 'X' from [dbo].[Order1] where uid = 'F96045F8-A2BD-4C02-BECB-6EF22C9E473F'
Insert Into Order1 (uid, AffiliateId, Address, CustomProperties, OrderNumber)
Select NewId(), AffiliateId, Address, CustomProperties, OrderNumber + 'X' from [dbo].[Order1] where uid = 'A3B71348-EB68-4600-9550-EC2CF75698F4'
Insert Into Order1 (uid, AffiliateId, Address, CustomProperties, OrderNumber)
Select NewId(), AffiliateId, Address, CustomProperties, OrderNumber + 'X' from [dbo].[Order1] where uid = 'CB114D91-F000-4553-8AFE-FC20CF6AD8C0'
Insert Into Order1 (uid, AffiliateId, Address, CustomProperties, OrderNumber)
Select NewId(), AffiliateId, Address, CustomProperties, OrderNumber + 'X' from [dbo].[Order1] where uid = '06274E4F-E233-4594-B505-D4BAA3770F0A'
DBCC DROPCLEANBUFFERS
GO
-- Part4: This updates a 0.5M row table.
Update [dbo].[Order1] Set CustomProperties = Cast(CustomProperties as NVARCHAR(MAX)) + CAST('' as NVARCHAR(MAX)), Address = Cast(Address as NVARCHAR(MAX)) + CAST('' as NVARCHAR(MAX))
The average results from SQL Profiler are as follows:
NTEXT
+-------+-------------+-------------+-------------+-------------+
| Test | CPU | Reads | Writes | Duration |
+-------+-------------+-------------+-------------+-------------+
| Part1 | 281.3333333 | 129.3333333 | 0 | 933 |
| Part2 | 78421.66667 | 5374306 | 10.66666667 | 47493.66667 |
| Part3 | 281.6666667 | 616 | 27.66666667 | 374.6666667 |
| Part4 | 40312.33333 | 15311252.67 | 320662 | 67010 |
| Total | | | | 115811.3333 |
+-------+-------------+-------------+-------------+-------------+
XML
+-------+-------------+-------------+-------------+-------------+
| Test | CPU | Reads | Writes | Duration |
+-------+-------------+-------------+-------------+-------------+
| Part1 | 282 | 58.33333333 | 0 | 949.3333333 |
| Part2 | 21129.66667 | 180143.3333 | 0 | 76048.66667 |
| Part3 | 297 | 370.3333333 | 14.66666667 | 378 |
| Part4 | 112578.3333 | 8908940.667 | 145703.6667 | 114684.3333 |
| Total | | | | 192060.3333 |
+-------+-------------+-------------+-------------+-------------+
Is the test script flawed? Or is there some other optimisation that needs to be carried out for XML data type columns, outside of what is described at https://learn.microsoft.com/en-us/previous-versions/sql/sql-server-2005/administrator/ms345115(v=sql.90)?
I would expect the XML column type to outperform ntext.
So this might not be an answer, at least not a solution, but it will hopefully help you understand what's going on...
The most expensive part of working with XML is the initial parsing; put another way: the transformation between the textual representation and the internal storage.
Important to know: native XML is not stored as the text you see, but as a hierarchy table. This requires very heavy processing when you pass textual XML into SQL Server. Returning this XML to a human reader requires the opposite process. Storing this string in a string column (be aware that NTEXT has been deprecated for ages) is faster than storing it as native XML, but you lose many advantages.
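You can see this re-serialization with a tiny check; what comes back is SQL Server's normalized form, not the text you passed in:

SELECT CAST('<?xml version="1.0"?><a  b = "1" ><c/></a>' AS XML);
-- Returns <a b="1"><c/></a>: the declaration and the extra whitespace are gone,
-- because the text was parsed into the internal format and re-serialized for display.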
So to your script:
I assume that you ran the same script and just changed Order1 to Order2. Is this correct?
Part 1 measures a simple SELECT.
In order to offer a readable representation, SQL Server (or rather SSMS) will transform any value into some kind of text. If your tables include INTs, GUIDs or a DateTime, you would not see the actual bit pattern, would you? SSMS performs quite expensive actions to create something readable for you. The expensive part is the transformation. Strings do not need this, so NTEXT will be faster.
Part 2 measures the .query() method (also in terms of "how to present the result").
Did you use the CAST( AS XML) with Order2 too? However, for this kind of query XML should be faster, because NTEXT has to do the heavy parsing over and over, while XML is stored in a queryable format already... But your XQuery is rather sub-optimal (due to the backward navigation ../Value). Try this:
.query('/CustomProperty[Key[text()="AgreedToTerms"]]/Value/text()')
This will look for a <CustomProperty> where there is a <Key> with the given content and will read the <Value> below <CustomProperty> without the need for ../.
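A quick way to check both expressions side by side (with a made-up one-row sample):

DECLARE @x XML = '<CustomProperty><Key>AgreedToTerms</Key><Value>true</Value></CustomProperty>';
SELECT @x.query('CustomProperty/Key[text()="AgreedToTerms"]/../Value/text()');  -- backward navigation
SELECT @x.query('/CustomProperty[Key[text()="AgreedToTerms"]]/Value/text()');   -- forward-only predicate
-- Both return "true", but the second lets the engine evaluate the predicate without stepping back up the tree.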
I'd surely expect XML to outperform NTEXT with a CAST here... The very first call against completely new tables and indexes might return biased results...
Part 3 measures inserts
Here I would expect roughly the same performance... If you move a string value into another string column, it is simple copying. Moving native XML into another XML column is simple copying too.
Part 4 measures updates
This looks rather weird... What are you trying to achieve? The code needs to transform your native XMLs to strings and re-parse them to be stored as XML. Doing the same with NTEXT does not need these expensive actions at all...
Some general thoughts
If you get some XML from outside, or read it from a file, and you need to query it just once, string methods on string types can be faster. But if you want to store XML permanently in order to use and manipulate its values more often, the native XML type will be much better.
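For instance, native XML can be changed in place with .modify(), while a string column would force a full round trip through a parser (table and path here are hypothetical):

UPDATE dbo.SomeTable
SET SomeXML.modify('replace value of (/root/row[1]/SomeTest/text())[1] with "NewValue"')
WHERE ID = 1;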
In many cases performance measurements do not measure what you think they do...
Try to create your tests in such a way that the presentation of the results is not part of the test (e.g. do an INSERT against a temp table, stop the clock, and then push the output from the temp table).
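A minimal sketch of that pattern (names are just examples):

DECLARE @d DATETIME2 = SYSUTCDATETIME();
SELECT id, uid, Address, CustomProperties
INTO #measured                         -- materialize without sending rows to the client
FROM dbo.Order2;
PRINT DATEDIFF(millisecond, @d, SYSUTCDATETIME());  -- stop the clock before presentation
SELECT * FROM #measured;               -- rendering happens outside the measured window
DROP TABLE #measured;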
UPDATE Another test for "Part 2"
Try this test script:
USE master;
GO
CREATE DATABASE testShnugo;
GO
USE testShnugo;
GO
CREATE TABLE dbo.WithString(ID INT,SomeXML NTEXT);
CREATE TABLE dbo.WithXML(ID INT,SomeXML XML);
GO
--insert 100,000 rows into both tables
WITH Tally(Nmbr) AS (SELECT TOP 100000 ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values v1 CROSS JOIN master..spt_values v2)
INSERT INTO dbo.WithXML(ID,SomeXML)
SELECT Nmbr,(SELECT Nmbr AS [@nmbr],CONCAT('hallo',Nmbr) AS [SomeTest/@FindMe],CONCAT('SomeTestValue',Nmbr) As [SomeTest] FOR XML PATH('row'),ROOT('root'),TYPE)
FROM Tally
--copy everything to the second table
INSERT INTO dbo.WithString(ID,SomeXML) SELECT ID,CAST(SomeXML AS NVARCHAR(MAX)) FROM dbo.WithXML;
GO
--check the actual content
SELECT * FROM dbo.WithString;
SELECT * FROM dbo.WithXML;
GO
DECLARE @d DATETIME2=SYSUTCDATETIME();
SELECT * FROM dbo.WithString WHERE SomeXML LIKE '%FindMe="hallo333"%'
PRINT 'String-Method LIKE '
PRINT DATEDIFF(millisecond,@d,SYSUTCDATETIME());
SET @d=SYSUTCDATETIME();
SELECT * FROM dbo.WithString WHERE CAST(SomeXML AS xml).exist('/root/row[SomeTest[@FindMe="hallo333"]]')=1
PRINT 'CAST NTEXT to XML and .exist()'
PRINT DATEDIFF(millisecond,@d,SYSUTCDATETIME());
SET @d=SYSUTCDATETIME();
SELECT * FROM dbo.WithXML WHERE CAST(SomeXML AS nvarchar(MAX)) LIKE '%FindMe="hallo333"%'
PRINT 'String-Method LIKE after CAST XML to NVARCHAR(MAX)'
PRINT DATEDIFF(millisecond,@d,SYSUTCDATETIME());
SET @d=SYSUTCDATETIME();
SELECT * FROM dbo.WithXML WHERE SomeXML.exist('/root/row[SomeTest[@FindMe="hallo333"]]')=1
PRINT 'native XML with .exist()'
PRINT DATEDIFF(millisecond,@d,SYSUTCDATETIME());
GO
USE master;
GO
DROP DATABASE testShnugo;
First I create tables and fill them with 100,000 XMLs like this:
<root>
<row nmbr="1">
<SomeTest FindMe="hallo1">SomeTestValue1</SomeTest>
</row>
</root>
My results
String-Method LIKE
836
CAST NTEXT to XML and .exist()
1962
String-Method LIKE after CAST XML to NVARCHAR(MAX)
1079
native XML with .exist()
911
As expected, the fastest approach is a string method against a string type on very tiny strings. But - of course - this will not be as powerful as an elaborate XQuery and will not be able to deal with namespaces, multiple occurrences and so on.
The slowest is the cast of NTEXT to XML with .exist()
A string method against the native XML after a cast to string is not that bad actually, but this depends on the XML's size. This one was very tiny...
And 100,000 non-trivial XQuery calls against 100,000 different XMLs are almost as fast as the pure string approach.
UPDATE 2: larger XMLs
I repeated the test with larger XMLs just by changing the code above in one line
SELECT Nmbr,(SELECT TOP 100 Nmbr AS [@nmbr],CONCAT('hallo',x.Nmbr) AS [SomeTest/@FindMe],CONCAT('SomeTestValue',x.Nmbr) As [SomeTest] FROM Tally x FOR XML PATH('row'),ROOT('root'),TYPE)
Now each XML consists of 100 <row> elements.
<root>
<row nmbr="1">
<SomeTest FindMe="hallo1">SomeTestValue1</SomeTest>
</row>
<row nmbr="2">
<SomeTest FindMe="hallo2">SomeTestValue2</SomeTest>
</row>
<row nmbr="3">
<SomeTest FindMe="hallo3">SomeTestValue3</SomeTest>
</row>
...more of them
With a search for FindMe="hallo333" this won't return anything, but the time needed to find that there is nothing to return is enough for us:
String-Method LIKE
71959
CAST NTEXT to XML and .exist()
74773
String-Method LIKE after CAST XML to NVARCHAR(MAX)
104380
native XML with .exist()
16374
The fastest - by far! - is now the native XML. The string approaches fall behind due to the sheer size of the strings.
Please let me know your result too.
I have a table I would like to split up and email to the corresponding staff member of each department. I have two tables: Table 1 contains all the transaction data against each department and is live; Table 2 is static and essentially lists the staff member who is responsible for each department.
I need to split up Table 1 by Department, then look up the email for the corresponding staff member from Table 2 and send the split table.
Table 1:
| Customer | ? | Department
| Customer | ? | Department1
| Customer | ? | Department2
Table 2:
| Department | Staff | Email
| Department1 | Staff1 | Email
| Department2 | Staff2 | Email
I was wondering, would it be possible to create a stored procedure to do this or would I have to create a subscription in SSRS for each individual staff member?
Thanks,
Neil
I would thoroughly recommend building a simple SSRS report and distributing it via a Data Driven Subscription. The queries below will get you started on your data extracts, and you can follow a guide here on how to set up an SSRS Data Driven Subscription.
They are very simple to create: you only need one subscription to send an email to every Department, and they are very easy to maintain, even by someone else with no idea what it does.
declare @t1 table(Cust nvarchar(100)
,Cols nvarchar(100)
,Dept nvarchar(100)
)
declare @t2 table(Dept nvarchar(100)
,Staff nvarchar(100)
,Email nvarchar(100)
)
insert into @t1 Values
('Customer','?','Department1')
,('Customer','?','Department2')
,('Customer','?','Department3')
insert into @t2 Values
('Department1','Staff1','Email1')
,('Department2','Staff2','Email2')
,('Department3','Staff3','Email3')
-- Use this query in your Data Driven Subscription to generate the list of Departments and their respective Emails:
select distinct t1.Dept
,t2.Email
from @t1 t1
left join @t2 t2
on(t1.Dept = t2.Dept)
-- Then use this query in your report to list out the contents of Table 1, matching the @SSRSDeptParameter value in the Data Driven Subscription options.
select t1.Cust
,t1.Cols
,t1.Dept
,t2.Email
from @t1 t1
left join @t2 t2
on(t1.Dept = t2.Dept)
where t1.Dept = @SSRSDeptParameter
I have a table with the following content
Id | Guid | XmlDefinitionId
1 | 5a0bfc84-13ec-4497-93e0-655e57d4b482 | 1
2 | e28e786b-0856-40b6-8189-0fbd68aa3e45 | 1
And in another table the following XML structure stored:
<ActionActivity DisplayName="DisplayName 1" IsSkipped="False" Id="5a0bfc84-13ec-4497-93e0-655e57d4b482">...</ActionActivity>
<p:Sequence DisplayName="Prerequisites">
<ActionActivity DisplayName="Inner DisplayName 1" IsSkipped="False" Id="e28e786b-0856-40b6-8189-0fbd68aa3e45">...</ActionActivity>
</p:Sequence>
<ActionActivity DisplayName="DisplayName 2" IsSkipped="False" Id="dcc936dd-73c9-43cc-beb4-c636647d4851">...</ActionActivity>
The table containing the XML has the following structure:
Id | XML
1 | (XML Structure defined above here)
Based on the Guid I want to show the DisplayName. At the moment I have the following query, which returns null. Later I want to show the DisplayName for every Guid from the first table.
SELECT
Workflow
,CAST(Workflow as XML).value('data(//ActionActivity[@Id="73c9-43cc-beb4-c636647d4851"])[1]', 'nvarchar(50)') as displayname
FROM SerializedData
Any ideas on how to show the DisplayName with a SQL query?
Assuming that the XML is stored in an XML-typed column, you can do it this way (otherwise you'll need to CAST the column to XML):
SELECT
g.guid, x.display_name
FROM GuidTable g
INNER JOIN
(
SELECT
t.id as 'xml_id'
, c.value('@Id', 'varchar(max)') as 'guid'
, c.value('@DisplayName', 'varchar(max)') as 'display_name'
FROM XmlTable t
CROSS APPLY t.xml.nodes('//ActionActivity') as aa(c)
) x on x.guid = g.guid and x.xml_id = g.xmldefinitionid
Basically, the above query shreds the XML at the ActionActivity node, then joins the shredded data with GuidTable on the guid and xmldefinitionid columns.
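If the column is not XML-typed, a variant of the same shred (assuming the same table and column names) can cast first and then call .nodes() on the result:

SELECT
    t.id as 'xml_id'
    , c.value('@Id', 'varchar(max)') as 'guid'
    , c.value('@DisplayName', 'varchar(max)') as 'display_name'
FROM XmlTable t
CROSS APPLY (SELECT CAST(t.xml AS XML) AS x) ca   -- one cast per row
CROSS APPLY ca.x.nodes('//ActionActivity') as aa(c)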
I have T-SQL code like this:
DECLARE @xml XML = (SELECT CONVERT(xml, BulkColumn, 2) FROM OPENROWSET(Bulk 'C:\test.xml', SINGLE_BLOB) [blah])
-- Data for Table 1
SELECT
ES.value('id-number[1]', 'VARCHAR(8)') IDNumber,
ES.value('name[1]', 'VARCHAR(8)') Name,
ES.value('date[1]', 'VARCHAR(8)') Date,
ES.value('test[1]', 'VARCHAR(3)') Test,
ES.value('testing[1]', 'VARCHAR(511)') Testing,
ES.value('testingest[1]', 'VARCHAR(5)') Testingest
FROM @xml.nodes('xmlnodes/path') AS EfficiencyStatement(ES)
-- Data for Table 2
SELECT
U.value('fork[1]', 'VARCHAR(8)') Fork,
U.value('spoon[1]', 'VARCHAR(3)') Spoon,
U.value('spork[1]', 'VARCHAR(3)') Spork
FROM @xml.nodes('xmlnodes/path/nextpath') AS Utensils(U)
Now, I've tried what I normally use, and other variants, such as:
AS XML ON xml.[id-number] = [table1].[id-number]
For the record, id-number is unique across the entire document. It can never occur again.
This is good for grabbing the data from my XML file, but there's zero referential integrity. How do I make sure that Table 2 (and up) maintains referential integrity when inserting?
This should be a much better explanation:
I want to load XML values from a file. For INSERT, I have no trouble using OPENXML and binding it based on the id-number using AS XML ON xml.[id-number] = [table1].[id-number] at the end.
I want to update the database record (with all linked tables and their columns) using UPDATE, MERGE, or something -- anything! To do this, I believe I need to find a way to maintain referential integrity based on the Foreign_ID value present in each table. There are dozens of tables which are all linked via Foreign_ID, so how do I update all of these?
Table Example
Table #1
+-------------+-----------+-----------+------------+---------+-----------+------------+
| Primary_Key | ID_Number | Name | Date | Test | Testing | Testingest |
+-------------+-----------+-----------+------------+---------+-----------+------------+
| 70001 | 12345 | Tom | 01/21/14 | Hi | Yep | Of course! |
| 70002 | 12346 | Dick | 02/22/14 | Bye | No | Never! |
| 70003 | 12347 | Harry | 03/23/14 | Sup | Dunno | Same. |
+----^--------+-----------+-----------+------------+---------+-----------+------------+
|
|-----------------|
|
Table #2 | Linked to primary key in the first table.
+-------------+--------v--------+-------------+-------------+------------+
| Primary_Key | Foreign_ID | Fork | Spoon | Spork |
+-------------+-----------------+-------------+-------------+------------+
| 0001 | 70001 | Yes | No | No |
| 0002 | 70002 | No | Yes | No |
| 0003 | 70003 | No | No | Yes |
+-------------+-----------------+-------------+-------------+------------+
After that is inserted, I need to be able to UPDATE the tables and columns from the XML files. After much research, I can't figure out how to update the values of every table linked by Foreign_ID while maintaining referential integrity. This means I am inserting the wrong data in the other tables.
I want the correct data updated. To update it correctly, I need to ensure that XQuery is matching the right data. Some tables have multiple fields for one particular Foreign_ID.
Here's the code I'm using:
DECLARE @xml XML = (SELECT CONVERT(xml, BulkColumn, 2) FROM OPENROWSET(Bulk 'C:\test.xml', SINGLE_BLOB) [blah])
-- Data for Table 1
SELECT
ES.value('id-number[1]', 'VARCHAR(8)') IDNumber,
ES.value('name[1]', 'VARCHAR(8)') Name,
ES.value('date[1]', 'VARCHAR(8)') Date,
ES.value('test[1]', 'VARCHAR(3)') Test,
ES.value('testing[1]', 'VARCHAR(511)') Testing,
ES.value('testingest[1]', 'VARCHAR(5)') Testingest
INTO #TempTable
FROM @xml.nodes('xmlnodes/path') AS EfficiencyStatement(ES)
-- @Serial Error: Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
SET @IDNumber = (SELECT SerialNumber from #TempTable)
SET @Foreign_ID = (SELECT [Foreign_ID] from [table] WHERE [id-number] = @IDNumber)
MERGE dbo.[table1] AS CF
USING (SELECT IDNumber, Name, Date, Test, Testing, Testingest FROM #TempTable) AS src
ON CF.[id-number] = src.IDNumber
-- ID-Number is unique, and is used to setup the initial referential integrity. Foreign_ID does not exist in the XML files, so we are not matching on that.
WHEN MATCHED THEN UPDATE
SET
CF.[id-number] = src.IDNumber
-- and so on...
WHEN NOT MATCHED THEN
-- Insert statements here
GO
This works for the first table. It does not maintain integrity when updating the other tables via Foreign_ID. Note that SET @Serial has an error, but when I set it to anything else, it will update properly.
I am not fully sure what you are asking here, but if you cannot use the suggested article to enforce references in your XML, there is not really an after-the-fact way for you to do it purely in XML.
For Table 2 and up you can do EXISTS checks against Table 1 and process accordingly (see Referential integrity issue with Untyped XML in TSQL for an example), as in the sketch below.
The only other way I can think of is to create "real" tables that represent your schema for table 1, table 2 ... tableN, with the relevant FKs, and insert into them.
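A minimal sketch of that EXISTS-style guard for Table 2, with names assumed from the example above and reusing the @xml variable from the question; the inner join against Table 1 resolves Foreign_ID and drops any child row whose parent is missing:

INSERT INTO dbo.Table2 (Foreign_ID, Fork, Spoon, Spork)
SELECT t1.Primary_Key, src.Fork, src.Spoon, src.Spork
FROM (
    SELECT
        U.value('(../id-number/text())[1]', 'VARCHAR(8)') AS IDNumber,  -- the parent's key
        U.value('fork[1]',  'VARCHAR(3)') AS Fork,
        U.value('spoon[1]', 'VARCHAR(3)') AS Spoon,
        U.value('spork[1]', 'VARCHAR(3)') AS Spork
    FROM @xml.nodes('xmlnodes/path/nextpath') AS Utensils(U)
) AS src
INNER JOIN dbo.Table1 t1
    ON t1.ID_Number = src.IDNumber;  -- only rows with an existing parent survive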
I have a flat file source of thousands of records (> 100K in some cases). This source is externally procured, and I cannot request a layout revision.
In this flat file, each row contains four columns:
| User ID | Status_1 | Status_2 | Status_3
| 1337 | Green | Yellow | Red
| 1234 | Red | Red | Green
The destination table was designed to accept two columns:
| User ID | Status Codes
| 1337 | Green
| 1337 | Yellow
| 1337 | Red
| 1234 | Red
| 1234 | Red
| 1234 | Green
Until now, I have been running three different SSIS packages against the destination table, one for each status column in the flat file.
What I would like is to use a single SSIS package and create either another Flat File Destination or a temp table to mirror the destination table, and import from there.
Is this achievable? If so, what are the best-practice tasks to use, rather than simply UPDATE & SET against the temp table?
Heh, looks like a case for good ole SQL. I would use an UNPIVOT on this one.
http://technet.microsoft.com/en-us/library/ms177410(v=sql.105).aspx
That link has a great example which looks very similar to your data:
--Create the table and insert values as portrayed in the previous example.
CREATE TABLE pvt (VendorID int, Emp1 int, Emp2 int,
Emp3 int, Emp4 int, Emp5 int);
GO
INSERT INTO pvt VALUES (1,4,3,5,4,4);
INSERT INTO pvt VALUES (2,4,1,5,5,5);
INSERT INTO pvt VALUES (3,4,3,5,4,4);
INSERT INTO pvt VALUES (4,4,2,5,5,4);
INSERT INTO pvt VALUES (5,5,1,5,5,5);
GO
--Unpivot the table.
SELECT VendorID, Employee, Orders
FROM
(SELECT VendorID, Emp1, Emp2, Emp3, Emp4, Emp5
FROM pvt) p
UNPIVOT
(Orders FOR Employee IN
(Emp1, Emp2, Emp3, Emp4, Emp5)
)AS unpvt;
GO
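Applied to the data in the question, a sketch (table and column names assumed) would be:

-- Stage the flat file 1:1 with a single data flow, then unpivot:
SELECT [User ID], StatusCode
FROM
    (SELECT [User ID], Status_1, Status_2, Status_3
     FROM dbo.StagingFlatFile) p
UNPIVOT
    (StatusCode FOR StatusPosition IN (Status_1, Status_2, Status_3)
) AS unpvt;
-- The two-column result can be inserted straight into the destination table.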
Back when I was data warehousing, half my job seemed like it was using UNPIVOT on crap data I got through spreadsheets.