SQL Server XML Column Performance - sql-server

Converting ntext columns that contained XML to the XML data type has resulted in worse performance in SQL Server.
I am currently working on a project where ntext columns have been used to store valid XML. I have successfully migrated these columns to the XML data type. However, according to SQL Profiler the performance of the XML data type is worse than using ntext or nvarchar(max) to store the XML. Everything I have read implies that this should not be the case.
In order to verify this I created two tables with the same indexes, etc.
Table Name Order1
[id] [int] IDENTITY(1,1) NOT NULL,
[uid] [varchar](36) NOT NULL,
[AffiliateId] [varchar](36) NOT NULL,
[Address] [ntext] NOT NULL,
[CustomProperties] [ntext] NOT NULL,
[OrderNumber] [nvarchar](50) NOT NULL,
...
Table Name Order2
[id] [int] IDENTITY(1,1) NOT NULL,
[uid] [varchar](36) NOT NULL,
[AffiliateId] [varchar](36) NOT NULL,
[Address] [xml] NOT NULL,
[CustomProperties] [xml] NOT NULL,
[OrderNumber] [nvarchar](50) NOT NULL,
...
I then copied the data using a select/insert statement and rebuilt the indexes on both tables. I then created a script with the following SQL.
DBCC DROPCLEANBUFFERS
GO
--Part1
Select id, uid, AffiliateId, Address, CustomProperties, OrderNumber from [dbo].[Order1] where uid = 'F96045F8-A2BD-4C02-BECB-6EF22C9E473F'
Select id, uid, AffiliateId, Address, CustomProperties, OrderNumber from [dbo].[Order1] where uid = 'A3B71348-EB68-4600-9550-EC2CF75698F4'
Select id, uid, AffiliateId, Address, CustomProperties, OrderNumber from [dbo].[Order1] where uid = 'CB114D91-F000-4553-8AFE-FC20CF6AD8C0'
Select id, uid, AffiliateId, Address, CustomProperties, OrderNumber from [dbo].[Order1] where uid = '06274E4F-E233-4594-B505-D4BAA3770F0A'
DBCC DROPCLEANBUFFERS
GO
--Part2
Select id, uid, AffiliateId, Address, OrderNumber,
CAST(CustomProperties AS xml).query('CustomProperty/Key[text()="AgreedToTerms"]/../Value/text()') as "TermsAgreed"
from Order1
DBCC DROPCLEANBUFFERS
GO
--Part3
Insert Into Order1 (uid, AffiliateId, Address, CustomProperties, OrderNumber)
Select NewId(), AffiliateId, Address, CustomProperties, OrderNumber + 'X' from [dbo].[Order1] where uid = 'F96045F8-A2BD-4C02-BECB-6EF22C9E473F'
Insert Into Order1 (uid, AffiliateId, Address, CustomProperties, OrderNumber)
Select NewId(), AffiliateId, Address, CustomProperties, OrderNumber + 'X' from [dbo].[Order1] where uid = 'A3B71348-EB68-4600-9550-EC2CF75698F4'
Insert Into Order1 (uid, AffiliateId, Address, CustomProperties, OrderNumber)
Select NewId(), AffiliateId, Address, CustomProperties, OrderNumber + 'X' from [dbo].[Order1] where uid = 'CB114D91-F000-4553-8AFE-FC20CF6AD8C0'
Insert Into Order1 (uid, AffiliateId, Address, CustomProperties, OrderNumber)
Select NewId(), AffiliateId, Address, CustomProperties, OrderNumber + 'X' from [dbo].[Order1] where uid = '06274E4F-E233-4594-B505-D4BAA3770F0A'
DBCC DROPCLEANBUFFERS
GO
-- Part4 This updates a .5M row table.
Update [dbo].[Order1] Set CustomProperties = Cast(CustomProperties as NVARCHAR(MAX)) + CAST('' as NVARCHAR(MAX)), Address = Cast(Address as NVARCHAR(MAX)) + CAST('' as NVARCHAR(MAX))
The average results from SQL Profiler are as follows:
NTEXT
+-------+-------------+-------------+-------------+-------------+
| Test | CPU | Reads | Writes | Duration |
+-------+-------------+-------------+-------------+-------------+
| Part1 | 281.3333333 | 129.3333333 | 0 | 933 |
| Part2 | 78421.66667 | 5374306 | 10.66666667 | 47493.66667 |
| Part3 | 281.6666667 | 616 | 27.66666667 | 374.6666667 |
| Part4 | 40312.33333 | 15311252.67 | 320662 | 67010 |
| Total | | | | 115811.3333 |
+-------+-------------+-------------+-------------+-------------+
XML
+-------+-------------+-------------+-------------+-------------+
| Test | CPU | Reads | Writes | Duration |
+-------+-------------+-------------+-------------+-------------+
| Part1 | 282 | 58.33333333 | 0 | 949.3333333 |
| Part2 | 21129.66667 | 180143.3333 | 0 | 76048.66667 |
| Part3 | 297 | 370.3333333 | 14.66666667 | 378 |
| Part4 | 112578.3333 | 8908940.667 | 145703.6667 | 114684.3333 |
| Total | | | | 192060.3333 |
+-------+-------------+-------------+-------------+-------------+
Is the test script flawed? Or is there some other optimisation that needs to be carried out for xml data type columns, outside of https://learn.microsoft.com/en-us/previous-versions/sql/sql-server-2005/administrator/ms345115(v=sql.90)?
I would expect the XML data type to outperform ntext.

So this might not be an answer, at least not a solution, but it will hopefully help you understand what's going on...
The most expensive part of working with XML is the initial parsing, in other words: the transformation between the textual representation and the internal storage.
Important to know: native XML is not stored as the text you see, but as a hierarchy table. This needs very heavy processing when you pass textual XML into SQL Server. Rendering this XML back for a human reader needs the opposite process. Storing the string in a string column (be aware that NTEXT has been deprecated for ages) is faster than storing it as native XML, but you will lose many advantages.
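Just to illustrate where those two transformations happen (the variable names and the tiny sample document are mine, not from the question):
DECLARE @s NVARCHAR(MAX) = N'<root><row nmbr="1">SomeValue</row></root>';
-- Assigning a string to an XML variable forces the full parse into the internal structure.
DECLARE @x XML = @s;
-- Reading it back as a string forces the opposite transformation (serialization to text).
DECLARE @backToText NVARCHAR(MAX) = CAST(@x AS NVARCHAR(MAX));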
So to your script:
I assume that you ran the same script but just changed Order1 to Order2. Is this correct?
Part 1 measures a simple SELECT.
In order to offer a readable representation, SQL Server (or rather SSMS) will transform any value into some kind of text. If your tables contained INTs, GUIDs or DATETIMEs, you would not see the actual bit pattern, would you? SSMS performs quite expensive actions to create something readable for you. The expensive part is the transformation. Strings do not need this, so NTEXT will be faster.
Part 2 measures the .query() method (also in terms of "how to present the result").
Did you use the CAST( AS XML) with Order2 too? However, for such a need XML should be faster, because NTEXT has to do the heavy parsing over and over, while XML is stored in a queryable format already... But your XQuery is rather sub-optimal (due to the backward navigation ../Value). Try this:
.query('/CustomProperty[Key[text()="AgreedToTerms"]]/Value/text()')
This will look for a <CustomProperty> where there is a <Key> with the given content and will read the <Value> below the same <CustomProperty> without the need for ../
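You can try the rewritten query in isolation like this (the sample fragment is just my guess at what CustomProperties might contain, inferred from the paths in the question):
DECLARE @cp XML = N'<CustomProperty><Key>AgreedToTerms</Key><Value>Yes</Value></CustomProperty>';
-- Returns the text of <Value> for the matching <Key>, without any backward navigation:
SELECT @cp.query('/CustomProperty[Key[text()="AgreedToTerms"]]/Value/text()');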
I'd surely expect XML to outperform NTEXT with a CAST here... The very first call to completely new tables and indexes might return biased results...
Part 3 measures inserts
Here I would expect roughly the same performance... If you move a string value into another string column, this is simple copying. Moving native XML into another XML column is simple copying too.
Part 4 measures updates
This looks rather weird... What are you trying to achieve? The code has to transform your native XML values to strings and re-parse them in order to store them as XML again. Doing the same with NTEXT does not need these expensive actions at all...
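If the actual goal is to change something inside the stored XML, the native type can do this in place with .modify() and skip the string round trip entirely. A rough sketch against Order2 (the node names are guessed from the XQuery in Part 2 and may not match your real documents):
Update [dbo].[Order2]
Set CustomProperties.modify('replace value of (/CustomProperty/Value/text())[1] with "updated"')
Where uid = 'F96045F8-A2BD-4C02-BECB-6EF22C9E473F';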
Some general thoughts
If you get some XML from outside or read it from a file and you need to query it just once, string methods on string types can be faster. But if you want to store XML permanently in order to use and manipulate its values more often, the native XML type will be much better.
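One optimisation the question asks about that has not come up yet: for repeated XQuery access against a persisted xml column, a primary XML index can speed up .query() and .exist() considerably, at the cost of significant extra storage. A sketch (the index name is mine, and it assumes Order2 has a clustered primary key on id):
CREATE PRIMARY XML INDEX PXML_Order2_CustomProperties
ON dbo.Order2 (CustomProperties);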
In many cases performance measures do not measure what you think they do...
Try to create your tests in a way that the presentation of the results is not part of the test (e.g. do an INSERT against a temp table, stop the clock, and only then push the output from the temp table).
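A minimal pattern for that (the temp table and variable names are mine; the query is one from the question's Part 1):
DECLARE @d DATETIME2 = SYSUTCDATETIME();
SELECT id, uid, CustomProperties
INTO #Result
FROM dbo.Order2
WHERE uid = 'F96045F8-A2BD-4C02-BECB-6EF22C9E473F';
PRINT DATEDIFF(millisecond, @d, SYSUTCDATETIME()); -- timed without any rendering cost
SELECT * FROM #Result; -- presentation happens outside the measurement
DROP TABLE #Result;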
UPDATE Another test for "Part 2"
Try this test script:
USE master;
GO
CREATE DATABASE testShnugo;
GO
USE testShnugo;
GO
CREATE TABLE dbo.WithString(ID INT,SomeXML NTEXT);
CREATE TABLE dbo.WithXML(ID INT,SomeXML XML);
GO
--insert 100,000 rows into both tables
WITH Tally(Nmbr) AS (SELECT TOP 100000 ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values v1 CROSS JOIN master..spt_values v2)
INSERT INTO dbo.WithXML(ID,SomeXML)
SELECT Nmbr,(SELECT Nmbr AS [@nmbr],CONCAT('hallo',Nmbr) AS [SomeTest/@FindMe],CONCAT('SomeTestValue',Nmbr) As [SomeTest] FOR XML PATH('row'),ROOT('root'),TYPE)
FROM Tally
--copy everything to the second table
INSERT INTO dbo.WithString(ID,SomeXML) SELECT ID,CAST(SomeXML AS NVARCHAR(MAX)) FROM dbo.WithXML;
GO
--check the actual content
SELECT * FROM dbo.WithString;
SELECT * FROM dbo.WithXML;
GO
DECLARE @d DATETIME2=SYSUTCDATETIME();
SELECT * FROM dbo.WithString WHERE SomeXML LIKE '%FindMe="hallo333"%'
PRINT 'String-Method LIKE '
PRINT DATEDIFF(millisecond,@d,SYSUTCDATETIME());
SET @d=SYSUTCDATETIME();
SELECT * FROM dbo.WithString WHERE CAST(SomeXML AS xml).exist('/root/row[SomeTest[@FindMe="hallo333"]]')=1
PRINT 'CAST NTEXT to XML and .exist()'
PRINT DATEDIFF(millisecond,@d,SYSUTCDATETIME());
SET @d=SYSUTCDATETIME();
SELECT * FROM dbo.WithXML WHERE CAST(SomeXML AS nvarchar(MAX)) LIKE '%FindMe="hallo333"%'
PRINT 'String-Method LIKE after CAST XML to NVARCHAR(MAX)'
PRINT DATEDIFF(millisecond,@d,SYSUTCDATETIME());
SET @d=SYSUTCDATETIME();
SELECT * FROM dbo.WithXML WHERE SomeXML.exist('/root/row[SomeTest[@FindMe="hallo333"]]')=1
PRINT 'native XML with .exist()'
PRINT DATEDIFF(millisecond,@d,SYSUTCDATETIME());
GO
USE master;
GO
DROP DATABASE testShnugo;
First I create the tables and fill them with 100,000 XML documents like this
<root>
<row nmbr="1">
<SomeTest FindMe="hallo1">SomeTestValue1</SomeTest>
</row>
</root>
My results
String-Method LIKE
836
CAST NTEXT to XML and .exist()
1962
String-Method LIKE after CAST XML to NVARCHAR(MAX)
1079
native XML with .exist()
911
As expected, the fastest approach is a string method against a string type on very tiny strings. But - of course - this will not be as mighty as an elaborate XQuery and will not be able to deal with namespaces, multiple occurrences and so on.
The slowest is the cast of NTEXT to XML followed by .exist().
A string method against the native XML after a cast to string is not that bad actually, but this depends on the XML's size. This one was very tiny...
And 100,000 non-trivial XQuery calls against 100,000 different XML documents is almost as fast as the pure string approach.
UPDATE 2: larger XMLs
I repeated the test with larger XML documents, just by changing one line in the code above
SELECT Nmbr,(SELECT TOP 100 Nmbr AS [@nmbr],CONCAT('hallo',x.Nmbr) AS [SomeTest/@FindMe],CONCAT('SomeTestValue',x.Nmbr) As [SomeTest] FROM Tally x FOR XML PATH('row'),ROOT('root'),TYPE)
Now each XML document consists of 100 <row> elements.
<root>
<row nmbr="1">
<SomeTest FindMe="hallo1">SomeTestValue1</SomeTest>
</row>
<row nmbr="2">
<SomeTest FindMe="hallo2">SomeTestValue2</SomeTest>
</row>
<row nmbr="3">
<SomeTest FindMe="hallo3">SomeTestValue3</SomeTest>
</row>
...more of them
With a search for FindMe="hallo333" this won't return anything, but the time needed to find that there is nothing to return is enough for us:
String-Method LIKE
71959
CAST NTEXT to XML and .exist()
74773
String-Method LIKE after CAST XML to NVARCHAR(MAX)
104380
native XML with .exist()
16374
The fastest - by far! - is now the native XML. The string approaches fall behind because of the string sizes.
Please let me know your results too.

Related

Alternate nodes in a SQL Server XQuery path?

Is there any way to have a sort of "alternates group", like in regular expressions, in an XQuery path in SQL Server?
I have this query...
SELECT Q.ROWID QUEUEID, Q.DOCUMENTPACKAGETYPE,
B.R.value('@_ID', 'NAME') PARTYID,
B.R.value('@_BorrowerID', 'NAME') BORROWERID,
B.R.value('@_Name', 'NAME') NAME,
B.R.value('@_EmailAddress', 'NAME') EMAILADDRESS
FROM docutech.QUEUE_EX Q
CROSS APPLY Q.DATA.nodes('LOAN_PUSHBACK_PACKAGE/EVENT_DATA/ESIGN/PARTY') AS B(R)
WHERE Q.REASONFORPUSHBACK = 'DocumentDistribution' AND B.R.value('@_Type', 'NAME') = 'Borrower'
But what I need is for the ESIGN node in the CROSS APPLY path to match either ESIGN or ECLOSE. So I am looking to do something like the following (thinking in RegEx terms)...
CROSS APPLY Q.DATA.nodes('LOAN_PUSHBACK_PACKAGE/EVENT_DATA/(ESIGN)|(ECLOSE)/PARTY') AS B(R)
Is there any way to do something like this? I'd really hate to have to repeat the same query twice just for that simple difference, though maybe XQuery doesn't support options like that?
Actually, I just found I can use an asterisk, which will match both, but I'd like to be able to limit it to those known node values if possible. If not, I guess that will do.
I think I got what you need. Here is a conceptual example for you.
The XPath predicate expression checks that the element names at a particular level belong to a sequence of specified names. The <SomethingElse> element is not a member of the sequence, which is why its data is not retrieved.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, xmldata XML);
INSERT INTO @tbl (xmldata) VALUES
(N'<LOAN_PUSHBACK_PACKAGE>
<EVENT_DATA>
<ESIGN>
<PARTY _Name="one"/>
</ESIGN>
<ECLOSE>
<PARTY _Name="two"/>
</ECLOSE>
<SomethingElse>
<PARTY _Name="three"/>
</SomethingElse>
</EVENT_DATA>
</LOAN_PUSHBACK_PACKAGE>');
-- DDL and sample data population, end
SELECT c.value('@_Name','VARCHAR(20)') AS [Name]
FROM @tbl
CROSS APPLY xmldata.nodes('/LOAN_PUSHBACK_PACKAGE/EVENT_DATA/*[local-name(.)=("ESIGN","ECLOSE")]/PARTY') AS t(c);
Output
+------+
| Name |
+------+
| one |
| two |
+------+

How to print out multiple rows of XML values using SQL

I have a database called Companies.
There is a table within Companies called Employees.
Within Employees there is a column that contains an XML response. The Column is called Data.
The XML Response looks like this
<Employee>
<Tenure>7</Tenure>
<Age>55</Age>
<OfficesVisited>
<int>1132</int>
<int>3345</int>
<int>7534</int>
</OfficesVisited>
</Employee>
What I would like my SQL query to print out is:
OfficesVisited
1132
3345
7534
What I am currently getting is 113233457534
I am using this SQL query:
use Companies
SELECT Employees.Data.query('(/Employee/OfficesVisited/int/text())') as OfficesVisited
FROM Employees
Where Employees.Employee_ID = 65035277
I've tried using OUTER APPLY and CROSS APPLY and I can get it into 3 rows but all three rows look like the above.
Can anyone help?
Thanks!
Please try the following. It shows how to use the .nodes() and .value() XML methods correctly.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, [data] XML);
INSERT INTO @tbl ([data]) VALUES
(N'<Employee>
<Tenure>7</Tenure>
<Age>55</Age>
<OfficesVisited>
<int>1132</int>
<int>3345</int>
<int>7534</int>
</OfficesVisited>
</Employee>')
, (N'<Employee>
<Tenure>77</Tenure>
<Age>58</Age>
<OfficesVisited>
<int>7777</int>
<int>8888</int>
</OfficesVisited>
</Employee>')
, (N'<Employee>
<Tenure>90</Tenure>
<Age>50</Age>
<OfficesVisited>
<int>1111</int>
</OfficesVisited>
</Employee>');
-- DDL and sample data population, end
SELECT c.value('(./text())[1]','INT') AS OfficesVisited
FROM @tbl AS tbl
CROSS APPLY tbl.[data].nodes('/Employee/OfficesVisited/int') AS t(c)
WHERE id IN (1,2);
Output
+----------------+
| OfficesVisited |
+----------------+
| 1132 |
| 3345 |
| 7534 |
| 7777 |
| 8888 |
+----------------+

How to add data to a single column

I have a question regarding adding data to a particular column of a table. I had a post yesterday where a user guided me (thanks for that) to what I needed and said an UPDATE was the way to go, but I still can't achieve my goal.
I have two tables: the table the information will be added from and the table the information will be added to. Here is an example:
source_table (has only one column, called "name_expedient_reviser", which is nvarchar(50))
name_expedient_reviser
kim
randy
phil
cathy
josh
etc.
On the other hand I have the destination table. This one has two columns: one with the IDs and one where the names will be inserted; that column's values are currently null. There are some IDs that are going to be used for this.
This is what the other table looks like:
dbo_expedient_reviser (has 2 columns: unique_reviser_code, numeric, PK, not auto-increment; and name_expedient_reviser, nvarchar(50), the users who check expedients). This is how this table looks now:
dbo_expedient_reviser
unique_reviser_code | name_expedient_reviser
1 | NULL
2 | NULL
3 | NULL
4 | NULL
5 | NULL
6 | NULL
What I need is for the information from source_table to be inserted into the column name_expedient_reviser, so the result should look like this:
dbo_expedient_reviser
unique_reviser_code | name_expedient_reviser
1 | kim
2 | randy
3 | phil
4 | cathy
5 | josh
6 | etc.
How can I pass the information into this table? What do I have to do?
EDIT
The query that should have worked doesn't update anything; it is this one:
UPDATE dbo_expedient_reviser
SET dbo_expedient_reviser.name_expedient_reviser = source_table.name_expedient_reviser
FROM source_table
JOIN dbo_expedient_reviser ON source_table.name_expedient_reviser = dbo_expedient_reviser.name_expedient_reviser
WHERE dbo_expedient_reviser.name_expedient_reviser IS NULL
The query was supposed to update the information in the table, extracting it from source_table as long as the column name_expedient_reviser is null, which it is, but it doesn't work.
Since the names do not have an ID associated with them, I would just use ROW_NUMBER and join on ROW_NUMBER = unique_reviser_code. The only problem is knowing which rows are null. From what I see, they all appear null. In your data, is this the case, or are there names sporadically in the table, like 5, 17, 29... etc.? If name_expedient_reviser is empty in dbo_expedient_reviser, you could also truncate the table and insert the values directly. Hopefully that unique_reviser_code isn't already linked to other things.
WITH CTE (name_expedient_reviser, unique_reviser_code)
AS
(
SELECT name_expedient_reviser
,ROW_NUMBER() OVER (ORDER BY name_expedient_reviser)
FROM source_table
)
UPDATE er
SET er.name_expedient_reviser = cte.name_expedient_reviser
FROM dbo_expedient_reviser er
JOIN CTE on cte.unique_reviser_code = er.unique_reviser_code
Or Truncate:
Truncate Table dbo_expedient_reviser
INSERT INTO dbo_expedient_reviser (name_expedient_reviser, unique_reviser_code)
SELECT DISTINCT
unique_reviser_code = ROW_NUMBER() OVER (ORDER BY name_expedient_reviser)
,name_expedient_reviser
FROM source_table
It is not possible to INSERT data into a single column of existing rows; an UPDATE that moves the data you want is the only way to go in such cases.

Stored procedure using table with recipients

I have a table I would like to split up and email to the corresponding staff member of each department. I have two tables: Table 1 contains all the transaction data against the department and is live; Table 2 is static and essentially lists the staff member responsible for each department.
I need to split up Table 1 by department, then look up the email of the corresponding staff member from Table 2 and send the split table.
Table 1:
| Customer | ? | Department
| Customer | ? | Department1
| Customer | ? | Department2
Table2:
| Department | Staff | Email
| Department1 | Staff1 | Email
| Department2 | Staff2 | Email
I was wondering, would it be possible to create a stored procedure to do this or would I have to create a subscription in SSRS for each individual staff member?
Thanks,
Neil
I would thoroughly recommend making a simple SSRS report and distributing it via a Data Driven Subscription. The queries below will get you started on your data extracts and you can follow a guide here on how to set up an SSRS Data Driven Subscription.
They are very simple to create, you only need one subscription to send an email to every department, and they are very easy to maintain, even by someone else with no idea what it does.
declare @t1 table(Cust nvarchar(100)
,Cols nvarchar(100)
,Dept nvarchar(100)
)
declare @t2 table(Dept nvarchar(100)
,Staff nvarchar(100)
,Email nvarchar(100)
)
insert into @t1 Values
('Customer','?','Department1')
,('Customer','?','Department2')
,('Customer','?','Department3')
insert into @t2 Values
('Department1','Staff1','Email1')
,('Department2','Staff2','Email2')
,('Department3','Staff3','Email3')
-- Use this query in your Data Driven Subscription to generate the list of Departments and their respective Emails:
select distinct t1.Dept
,t2.Email
from @t1 t1
left join @t2 t2
on(t1.Dept = t2.Dept)
-- Then use this query in your report to list out the contents of Table 1, matching the @SSRSDeptParameter value in the Data Driven Subscription options.
select t1.Cust
,t1.Cols
,t1.Dept
,t2.Email
from @t1 t1
left join @t2 t2
on(t1.Dept = t2.Dept)
where t1.Dept = @SSRSDeptParameter

How to update records with multiple tables linked by a foreign key using XQuery (or OPENXML) in SQL Server?

I have T-SQL code like this:
DECLARE @xml XML = (SELECT CONVERT(xml, BulkColumn, 2) FROM OPENROWSET(Bulk 'C:\test.xml', SINGLE_BLOB) [blah])
-- Data for Table 1
SELECT
ES.value('id-number[1]', 'VARCHAR(8)') IDNumber,
ES.value('name[1]', 'VARCHAR(8)') Name,
ES.value('date[1]', 'VARCHAR(8)') Date,
ES.value('test[1]', 'VARCHAR(3)') Test,
ES.value('testing[1]', 'VARCHAR(511)') Testing,
ES.value('testingest[1]', 'VARCHAR(5)') Testingest
FROM @xml.nodes('xmlnodes/path') AS EfficiencyStatement(ES)
-- Data for Table 2
SELECT
U.value('fork[1]', 'VARCHAR(8)') Fork,
U.value('spoon[1]', 'VARCHAR(3)') Spoon,
U.value('spork[1]', 'VARCHAR(3)') Spork
FROM @xml.nodes('xmlnodes/path/nextpath') AS Utensils(U)
Now, I've tried what I normally use, and other variants, such as:
AS XML ON xml.[id-number] = [table1].[id-number]
For the record, id-number is unique across the entire document. It can never occur again.
This is good for grabbing the data from my XML file, but there's zero referential integrity. How do I make sure that Table 2 (and up) maintains referential integrity when inserting?
This should be a much better explanation:
I want to load XML values from a file. For INSERT, I have no trouble using OPENXML and binding it based on the id-number using AS XML ON xml.[id-number] = [table1].[id-number] at the end.
I want to update the database record (with all linked tables and their columns) using UPDATE, MERGE, or something -- anything! To do this, I believe I need to find a way to maintain referential integrity based on the Foreign_ID value present in each table. There are dozens of tables which are all linked via Foreign_ID, so how do I update all of these?
Table Example
Table #1
+-------------+-----------+-----------+------------+---------+-----------+------------+
| Primary_Key | ID_Number | Name | Date | Test | Testing | Testingest |
+-------------+-----------+-----------+------------+---------+-----------+------------|
| 70001 | 12345 | Tom | 01/21/14 | Hi | Yep | Of course! |
| 70002 | 12346 | Dick | 02/22/14 | Bye | No | Never! |
| 70003 | 12347 | Harry | 03/23/14 | Sup | Dunno | Same. |
+----^--------+-----------+-----------+------------+---------+-----------+------------+
|
|-----------------|
|
Table #2 | Linked to primary key in the first table.
+-------------+--------v--------+-------------+-------------+------------+
| Primary_Key | Foreign_ID | Fork | Spoon | Spork |
+-------------+-----------------+-------------+-------------+------------+
| 0001 | 70001 | Yes | No | No |
| 0002 | 70002 | No | Yes | No |
| 0003 | 70003 | No | No | Yes |
+-------------+-----------------+-------------+-------------+------------+
After that is inserted, I need to be able to UPDATE the tables and columns from the XML files. After much research, I can't figure out how to update the values of every table linked by Foreign_ID while maintaining referential integrity. This means I am inserting the wrong data in the other tables.
I want the correct data updated. To update it correctly, I need to ensure that XQuery is matching the right data. Some tables have multiple fields for one particular Foreign_ID.
Here's the code I'm using:
DECLARE @xml XML = (SELECT CONVERT(xml, BulkColumn, 2) FROM OPENROWSET(Bulk 'C:\test.xml', SINGLE_BLOB) [blah])
-- Data for Table 1
SELECT
ES.value('id-number[1]', 'VARCHAR(8)') IDNumber,
ES.value('name[1]', 'VARCHAR(8)') Name,
ES.value('date[1]', 'VARCHAR(8)') Date,
ES.value('test[1]', 'VARCHAR(3)') Test,
ES.value('testing[1]', 'VARCHAR(511)') Testing,
ES.value('testingest[1]', 'VARCHAR(5)') Testingest
INTO #TempTable
FROM @xml.nodes('xmlnodes/path') AS EfficiencyStatement(ES)
-- @Serial Error: Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
SET @IDNumber = (SELECT SerialNumber from #TempTable)
SET @Foreign_ID = (SELECT [Foreign_ID] from [table] WHERE [id-number] = @IDNumber)
MERGE dbo.[table1] AS CF
USING (SELECT IDNumber, Name, Date, Test, Testing, Testingest FROM #TempTable) AS src
ON CF.[id-number] = src.IDNumber
-- ID-Number is unique, and is used to setup the initial referential integrity. Foreign_ID does not exist in the XML files, so we are not matching on that.
WHEN MATCHED THEN UPDATE
SET
CF.[id-number] = src.IDNumber
-- and so on...
WHEN NOT MATCHED THEN
-- Insert statements here
GO
This works for the first table. It does not maintain integrity when updating the other tables via Foreign_ID. Note that the SET @Serial line has an error, but when I set it to anything else, it updates properly.
I am not fully sure what you are asking here, but if you cannot use the suggested article to enforce references in your XML, there is not really a post-hoc way for you to do it purely in XML.
For Table2 and beyond you can do EXISTS checks against Table 1 and process accordingly (see Referential integrity issue with Untyped XML in TSQL for an example); a sketch follows.
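A rough sketch of that guarded insert, reusing the hypothetical names from the question (Table1/Table2, [id-number], Foreign_ID; @xml loaded as in the question's script):
-- Insert child rows only where the parent row already exists in Table #1;
-- the inner join plays the role of the EXISTS check, silently skipping orphans.
INSERT INTO dbo.Table2 (Foreign_ID, Fork, Spoon, Spork)
SELECT T1.Primary_Key,
       U.value('fork[1]', 'VARCHAR(8)'),
       U.value('spoon[1]', 'VARCHAR(3)'),
       U.value('spork[1]', 'VARCHAR(3)')
FROM @xml.nodes('xmlnodes/path') AS ES(X)
CROSS APPLY (SELECT ES.X.value('id-number[1]', 'VARCHAR(8)') AS IDNumber) AS K
CROSS APPLY ES.X.nodes('nextpath') AS Utensils(U)
JOIN dbo.Table1 AS T1
    ON T1.[id-number] = K.IDNumber;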
The only other way that I can think of is to create "real" tables that represent your schema for table 1, table 2 ... table N, give them the relevant FKs, and insert into them.
