I have a simple insert statement that I am working on. This query inserts a core record, returns an ID, and then inserts one or more tasks into another table. For each of those tasks, I need to do an additional insert into a third table.
I am having trouble figuring out how to set this up, because the second insert happens before the third, so I am not sure how to get the identity from each row of the second insert to pass to query 3.
DECLARE @requestID INT;
-- Insert the core request details
INSERT INTO esas.Request (Requestor, Justification, CreatedBy, DateCreated)
SELECT @requestor,
@justification,
@creator,
GETUTCDATE();
-- Capture the core request ID
SET @requestID = SCOPE_IDENTITY();
-- Add tasks
INSERT INTO esas.Task
( RequestID ,
ToolID ,
QID ,
Action
)
SELECT @requestID,
ParamValues.x1.value('tool[1]', 'INT'),
ParamValues.x1.value('user[1]', 'VARCHAR(10)'),
ParamValues.x1.value('action[1]', 'INT')
FROM @tasks.nodes('/request/task') AS ParamValues(x1);
-- For each task, add any associated roles (stuck here)
INSERT INTO esas.TaskRoles
( TaskID,
RoleID,
ActionID )
VALUES ( 0, -- TaskID - int
0, -- RoleID - int
0 -- ActionID - int
)
In the last insert, I need to pass the TaskID (the identity/PK created by the "Add tasks" insert) to this table, along with the roleID and action from the XML string.
Here is my XML structure:
<request>
<task>
<tool>123</tool>
<user>4567</user>
<roles>
<role>
<roleID>12</roleID>
<action>1</action>
</role>
<role>
<roleID>1245</roleID>
<action>0</action>
</role>
<role>
<roleID>678</roleID>
<action>1</action>
</role>
</roles>
</task>
</request>
My confusion is that the INSERT INTO esas.Task is going to happen all at once before moving on, so I am not sure how to pass each identity to the next insert along with its corresponding details from the XML structure.
I figured this out using XPath and joining to the tasks table after the data had been inserted.
-- For each task, add any associated roles
INSERT INTO esas.TaskRoles
( TaskID,
RoleID,
ActionID )
SELECT t.TaskID,
ParamValues.x1.value('roleID[1]', 'INT'),
ParamValues.x1.value('action[1]', 'INT')
FROM esas.Task AS t
JOIN @tasks.nodes('/request/task/roles/role') AS ParamValues(x1)
ON t.RequestID = @requestID
AND t.ToolID = ParamValues.x1.value('../../tool[1]', 'INT')
AND t.QID = ParamValues.x1.value('../../user[1]', 'VARCHAR(10)')
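If you would rather not depend on re-reading esas.Task by its natural key, a hedged alternative is the same MERGE ... OUTPUT trick used in the answers further down. This is only a sketch: it assumes ToolID and QID uniquely identify a task within the batch, and the @taskIDs table variable name is mine.
-- Sketch only: capture each new TaskID together with the natural key from the XML
DECLARE @taskIDs TABLE (TaskID INT, ToolID INT, QID VARCHAR(10));
MERGE INTO esas.Task AS t
USING (SELECT @requestID AS RequestID,
              ParamValues.x1.value('tool[1]', 'INT') AS ToolID,
              ParamValues.x1.value('user[1]', 'VARCHAR(10)') AS QID,
              ParamValues.x1.value('action[1]', 'INT') AS Action
       FROM @tasks.nodes('/request/task') AS ParamValues(x1)) AS s
ON 1 = 0 -- never match, so every source row is inserted
WHEN NOT MATCHED THEN
    INSERT (RequestID, ToolID, QID, Action)
    VALUES (s.RequestID, s.ToolID, s.QID, s.Action)
OUTPUT inserted.TaskID, s.ToolID, s.QID INTO @taskIDs (TaskID, ToolID, QID);
-- The role nodes can then be joined to @taskIDs on ToolID/QID instead of back to esas.Task.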
Related: I posted this question: INSERT Statement Expensive Queries on Activity Monitor.
As you will see, the XML structure has different levels.
I have created the following tables:
Organisation = organisation_id (PRIMARY_KEY)
Contacts = organisation_id (FOREIGN_KEY)
Roles = organisation_id (FOREIGN_KEY)
Rels = organisation_id (FOREIGN_KEY)
Succs = organisation_id (FOREIGN_KEY)
What I want is to generate the organisation_id and do the insert into each table in a cascading manner. At the moment the process takes almost 2 hours for 300k records. I have 3 approaches (approach #1 is sketched just below this list):
Convert the XML to a list object and send it in batches of 1,000 as JSON text to a stored procedure that uses OPENJSON.
Convert the XML to a list object, save each batch of 1,000 as a JSON file that SQL Server can read, and pass the file path to a stored procedure, which then opens the JSON file using OPENROWSET and OPENJSON.
Send the path to the XML to a stored procedure, then use OPENROWSET and OPENXML.
All processes (1-3) insert the data into a flat temp table and then iterate over each row to call a different INSERT stored procedure for each table. Approach #3 fails with errors on 300k records but works on 4 records.
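For reference, here is a minimal sketch of the OPENJSON shredding step from approach #1; the JSON shape, names, and lengths below are assumptions rather than the real payload:
DECLARE @json NVARCHAR(MAX) = N'[{"name":"Org A","contacts":[{"type":"tel","value":"123"}]}]';
-- Shred the batch into a flat rowset that could feed the staging table
SELECT o.[name],
       c.[type],
       c.[value]
FROM OPENJSON(@json)
     WITH ([name] NVARCHAR(20) '$.name',
           contacts NVARCHAR(MAX) '$.contacts' AS JSON) AS o
CROSS APPLY OPENJSON(o.contacts)
     WITH ([type] NVARCHAR(50) '$.type',
           [value] NVARCHAR(50) '$.value') AS c;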
The other question is: would it be much faster to use a physical table rather than a temp table?
-------UPDATE-------
As explained in the linked question, I was doing a WHILE loop. Someone suggested / commented that I do a batch insert into each of the tables. The problem is that, for Contacts for example, I can only do this if I know the organisation_id.
select
organisation_id = IDENTITY( bigint ) -- IF I CAN GENERATE THE ACTUAL ORGANISATION ID
,name = Col.value('.','nvarchar(20)')
,contact_type = c.value('(./@type)[1]','nvarchar(50)')
,contact_value= c.value('(./@value)[1]','nvarchar(50)')
into
#temporganisations
from
@xml.nodes('ns1:OrgRefData/Organisations/Organisation') as Orgs(Col)
outer apply Orgs.Col.nodes('Contacts/Contact') as Cs(c)
Then when I do the batch insert:
insert into contacts
(
organisation_id,type,value
)
select
torg.organisation_id -- if this is the actual id then perfect
,torg.contact_type
,torg.contact_value
from #temporganisations torg
I would suggest that you shred the XML client-side and switch over to doing some kind of bulk copy; this will generally perform much better.
At the moment, you cannot do a normal bcp or SqlBulkCopy, because you also need the foreign key. You need a way to uniquely identify Organisation within the batch, and you say that is difficult owing to the number of columns needed for that.
Instead, you need to generate some kind of unique ID client-side; an incrementing integer will do. You then assign this ID to the child objects as you shred the XML into DataTables / IEnumerables / CSV files.
You have two options:
The easiest, in many respects, is to not use IDENTITY for OrganisationId and just insert your generated ID directly. This means you can leverage standard SqlBulkCopy procedures.
The downside is that you lose the benefit of automatic IDENTITY assignment, but you could instead use the SqlBulkCopyOptions.KeepIdentity option, which only applies to this insert, and carry on with IDENTITY for other inserts. You would need to choose a batch of IDs that won't clash.
A variation on this is to use GUIDs; these are always unique. I don't really recommend this option.
If you don't want to do this, then it gets quite a bit more complex.
You need to define an equivalent Table Type for each of the tables. Each has a column for the temporary primary key of the Organisation:
CREATE TYPE OrganisationType AS TABLE
(TempOrganisationID int PRIMARY KEY,
SomeData varchar...
Pass the shredded XML through as Table-Valued Parameters. You would have @Organisations, @Contacts etc.
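To make that concrete, here is a hedged sketch of a child table type and the procedure signature; OrganisationType is assumed to be completed as started above, and every name here is a placeholder rather than your real schema:
CREATE TYPE ContactType AS TABLE
(TempOrganisationID int NOT NULL, -- client-generated key linking back to the parent Organisation row
 ContactType varchar(50),
 ContactValue varchar(50));
GO
-- The import procedure then accepts one READONLY TVP per table:
CREATE PROCEDURE ImportOrganisations
    @Organisations OrganisationType READONLY,
    @Contacts ContactType READONLY
AS
BEGIN
    SET NOCOUNT ON;
    -- body: the MERGE / INSERT statements along the lines shown below
END
GO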
Then you would have SQL along the following lines:
-- This stores the real IDs
DECLARE @OrganisationIDs TABLE
(TempOrganisationID int PRIMARY KEY, OrganisationId int NOT NULL);
-- We need a hack to get OUTPUT to work with non-inserted columns, so we use a weird MERGE
MERGE INTO Organisation t
USING @Organisations s
ON 1 = 0 -- never match
WHEN NOT MATCHED THEN
INSERT (SomeData, ...)
VALUES (s.SomeData, ...)
OUTPUT
s.TempOrganisationID, inserted.OrganisationID
INTO @OrganisationIDs
(TempOrganisationID, OrganisationID);
-- We now have each TempOrganisationID matched up with a real OrganisationID
-- Now we can insert the child tables
INSERT Contact
(OrganisationID, [Type], [Value]...)
SELECT o.OrganisationID, c.[Type], c.[Value]
FROM @Contacts c
JOIN @OrganisationIDs o ON o.TempOrganisationID = c.TempOrganisationID;
-- and so on for all the child tables
Instead of saving the IDs to a table variable, you could instead stream the OUTPUT back to the client, have the client join the IDs to the child tables, and then BulkCopy them back again as part of the child tables.
This makes the SQL simpler; however, you still need the MERGE, and you risk complicating the client code significantly.
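For the streaming variant, a hedged sketch: the same MERGE, but with no INTO on the OUTPUT clause, so the TempOrganisationID-to-OrganisationID mapping is returned to the client as an ordinary result set (the column list is still the placeholder used above):
MERGE INTO Organisation t
USING @Organisations s
ON 1 = 0 -- never match
WHEN NOT MATCHED THEN
    INSERT (SomeData)
    VALUES (s.SomeData)
OUTPUT s.TempOrganisationID, inserted.OrganisationID; -- streamed back to the client, not stored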
You can try to use the following conceptual example.
SQL
-- DDL and sample data population, start
USE tempdb;
GO
DROP TABLE IF EXISTS #city;
DROP TABLE IF EXISTS #state;
-- parent table
CREATE TABLE #state (
stateID INT IDENTITY PRIMARY KEY,
stateName VARCHAR(30),
abbr CHAR(2),
capital VARCHAR(30)
);
-- child table (1-to-many)
CREATE TABLE #city (
cityID INT IDENTITY,
stateID INT NOT NULL FOREIGN KEY REFERENCES #state(stateID),
city VARCHAR(30),
[population] INT,
PRIMARY KEY (cityID, stateID, city)
);
-- mapping table to preserve IDENTITY ids
DECLARE @idmapping TABLE (GeneratedID INT PRIMARY KEY,
NaturalID VARCHAR(20) NOT NULL UNIQUE);
DECLARE @xml XML =
N'<root>
<state>
<StateName>Florida</StateName>
<Abbr>FL</Abbr>
<Capital>Tallahassee</Capital>
<cities>
<city>
<city>Miami</city>
<population>470194</population>
</city>
<city>
<city>Orlando</city>
<population>285713</population>
</city>
</cities>
</state>
<state>
<StateName>Texas</StateName>
<Abbr>TX</Abbr>
<Capital>Austin</Capital>
<cities>
<city>
<city>Houston</city>
<population>2100263</population>
</city>
<city>
<city>Dallas</city>
<population>5560892</population>
</city>
</cities>
</state>
</root>';
-- DDL and sample data population, end
;WITH rs AS
(
SELECT stateName = p.value('(StateName/text())[1]', 'VARCHAR(30)'),
abbr = p.value('(Abbr/text())[1]', 'CHAR(2)'),
capital = p.value('(Capital/text())[1]', 'VARCHAR(30)')
FROM @xml.nodes('/root/state') AS t(p)
)
MERGE #state AS o
USING rs ON 1 = 0
WHEN NOT MATCHED THEN
INSERT(stateName, abbr, capital)
VALUES(rs.stateName, rs.Abbr, rs.Capital)
OUTPUT inserted.stateID, rs.stateName
INTO @idmapping (GeneratedID, NaturalID);
;WITH Details AS
(
SELECT NaturalID = p.value('(StateName/text())[1]', 'VARCHAR(30)'),
city = c.value('(city/text())[1]', 'VARCHAR(30)'),
[population] = c.value('(population/text())[1]', 'INT')
FROM @xml.nodes('/root/state') AS A(p) -- parent
CROSS APPLY A.p.nodes('cities/city') AS B(c) -- child
)
INSERT #city (stateID, city, [Population])
SELECT m.GeneratedID, d.city, d.[Population]
FROM Details AS d
INNER JOIN @idmapping AS m ON d.NaturalID = m.NaturalID;
-- test
SELECT * FROM #state;
SELECT * FROM @idmapping;
SELECT * FROM #city;
I have a set of data which needs to result in a new row in a table. Once this row is created, I need to attach metadata in separate tables related to this information. That is, I need to create my [Identity] first, get the GlobalId back from the row, and then attach [Accounts] and [Metadata] to it.
Inserting data and getting the Id of the inserted row is easy enough (see the query below). But I'm stumped as to how I get the personnumber, firstname, and lastname inserted into this temporary table as well, so I can continue with inserting the related data.
DECLARE @temp AS TABLE(
[GlobalId] BIGINT
,[Personnumber] NVARCHAR(100)
,[Firstname] NVARCHAR(100)
,[Lastname] NVARCHAR(100)
);
;WITH person AS
(
SELECT top 1
t.[Personnumber]
,t.[Firstname]
,t.[Lastname]
FROM [temp].[RawRoles] t
WHERE t.Personnumber NOT IN
(
SELECT i.Account FROM [security].[Accounts] i
)
)
INSERT INTO [security].[Identities] ([Created], [Updated])
-- how do i get real related values here and not my hard coded strings?
OUTPUT inserted.GlobalId, 'personnumber', 'firstname', 'lastname' INTO @temp
SELECT GETUTCDATE(), GETUTCDATE()
FROM person
P.S. Backstory.
Identities, for me, is just a holder of a global Id we will be using instead of actual personal numbers (the equivalent of social security numbers) in other systems. This way only one location holds the sensitive numbers, and it can relate multiple account identifiers, such as a social security number or an AD account, to the same global id.
P.P.S. I would prefer to avoid cursors, as the query is going to be moving around almost 2 million records on the first run, and several thousand on a daily basis.
@PeterHe gave me an idea on how to solve this with MERGE.
Got it working as follows. When all rows have been inserted, I can query @temp to continue with the rest of the inserts.
DECLARE @temp AS TABLE(
[action] NVARCHAR(20)
,[GlobalId] BIGINT
,[Personnumber] NVARCHAR(100)
,[Firstname] NVARCHAR(100)
,[Lastname] NVARCHAR(100)
);
;WITH person AS
(
SELECT top 1
t.[Personnumber]
,t.[Firstname]
,t.[Lastname]
FROM [temp].[RawRoles] t
WHERE t.Personnumber NOT IN
(
SELECT i.Account FROM [security].[Accounts] i
)
)
MERGE [security].[Identities] AS tar
USING person AS src
ON 0 = 1 -- all rows from src need to be inserted; I've already filtered them out using the CTE query.
WHEN NOT MATCHED THEN
INSERT
(
[Created], [Updated]
)
VALUES
(
GETUTCDATE(), GETUTCDATE()
)
OUTPUT $action, inserted.GlobalId, src.[Personnumber], src.[Firstname], src.[Lastname] INTO @temp;
SELECT * FROM @temp
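From there, the remaining inserts can read the mapping straight out of @temp. A hedged sketch only: the [security].[Accounts] column names used here (GlobalId, Account) are my assumptions, not the real schema:
-- Sketch: relate each newly created GlobalId to the personnumber it was created for
INSERT INTO [security].[Accounts] ([GlobalId], [Account])
SELECT t.[GlobalId], t.[Personnumber]
FROM @temp AS t
WHERE t.[action] = 'INSERT';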
I have 2 tables called Customer and ChangeLog, with the following structure:
Customer table
ChangeLog Table
My requirement is this:
I need an SSIS package that will read the records from another table with the same structure as the Customer table and then compare the rows in both tables. If a change in any record is found, it updates the record in the Customer table and also puts an entry in the ChangeLog saying which column was updated.
So when a change is found in any of the columns I need to do the following:
Update the corresponding record in the Customer table
Insert a new row into the ChangeLog
There won't be any inserts to the Customer table; there will be only updates.
Is there any single task in SSIS that I can use to do both the update and the insert to these different tables? Or else, what is the quickest and most efficient way to achieve this in SSIS?
Any help is much appreciated.
No, there is no single SSIS task made to do this. I wouldn't use SSIS for this at all. Put the logic in either a stored procedure or a trigger. If you have to use SSIS for some reason, then have SSIS call the stored procedure, or UPDATE the table and let the trigger fire.
This is better than an SSIS package since you can use a trigger to detect your row changes, and even the values.
Try my example; you can copy/paste it straight into Management Studio. When you update Sample_Table you will see which rows changed and which columns.
So what you can do is: keep your lookup logic in SSIS (if you want something in SSIS) and update the table based on matches in the lookup.
When these updates happen, your trigger will fire and record the rows that have changed.
Alternatively, you can create your lookup in a T-SQL script and do an ordinary update joining on custid = custid instead; it's just as easy, but that's up to you. A sketch of that follows.
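Here is a hedged sketch of that plain T-SQL alternative; the staging table name and the Customer column names are placeholders, since the real structures were only shown as screenshots:
-- Sketch only: update Customer from a staging table that mirrors it, joined on CustId
UPDATE c
SET    c.Forename = s.Forename,
       c.Surname = s.Surname
FROM   dbo.Customer AS c
JOIN   dbo.CustomerStaging AS s
  ON   s.CustId = c.CustId
WHERE  c.Forename <> s.Forename
   OR  c.Surname <> s.Surname; -- the trigger below then logs which columns changed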
EDITED
-- -------------------- Setup tables and some initial data --------------------
CREATE TABLE dbo.Sample_Table (ContactID int, Forename varchar(100), Surname varchar(100), Extn varchar(16), Email varchar(100), Age int );
INSERT INTO Sample_Table VALUES (1,'Bob','Smith','2295','bs@example.com',24);
INSERT INTO Sample_Table VALUES (2,'Alice','Brown','2255','ab@example.com',32);
INSERT INTO Sample_Table VALUES (3,'Reg','Jones','2280','rj@example.com',19);
INSERT INTO Sample_Table VALUES (4,'Mary','Doe','2216','md@example.com',28);
INSERT INTO Sample_Table VALUES (5,'Peter','Nash','2214','pn@example.com',25);
CREATE TABLE dbo.Sample_Table_Changes (ContactID int, FieldName sysname, FieldValueWas sql_variant, FieldValueIs sql_variant, modified datetime default (GETDATE()));
GO
-- -------------------- Create trigger --------------------
CREATE TRIGGER TriggerName ON dbo.Sample_Table FOR DELETE, INSERT, UPDATE AS
BEGIN
SET NOCOUNT ON;
--Unpivot deleted
WITH deleted_unpvt AS (
SELECT ContactID, FieldName, FieldValue
FROM
(SELECT ContactID
, cast(Forename as sql_variant) Forename
, cast(Surname as sql_variant) Surname
, cast(Extn as sql_variant) Extn
, cast(Email as sql_variant) Email
, cast(Age as sql_variant) Age
FROM deleted) p
UNPIVOT
(FieldValue FOR FieldName IN
(Forename, Surname, Extn, Email, Age)
) AS deleted_unpvt
),
--Unpivot inserted
inserted_unpvt AS (
SELECT ContactID, FieldName, FieldValue
FROM
(SELECT ContactID
, cast(Forename as sql_variant) Forename
, cast(Surname as sql_variant) Surname
, cast(Extn as sql_variant) Extn
, cast(Email as sql_variant) Email
, cast(Age as sql_variant) Age
FROM inserted) p
UNPIVOT
(FieldValue FOR FieldName IN
(Forename, Surname, Extn, Email, Age)
) AS inserted_unpvt
)
--Join them together and show what's changed
INSERT INTO Sample_Table_Changes (ContactID, FieldName, FieldValueWas, FieldValueIs)
SELECT Coalesce (D.ContactID, I.ContactID) ContactID
, Coalesce (D.FieldName, I.FieldName) FieldName
, D.FieldValue as FieldValueWas
, I.FieldValue AS FieldValueIs
FROM
deleted_unpvt d
FULL OUTER JOIN
inserted_unpvt i
on D.ContactID = I.ContactID
AND D.FieldName = I.FieldName
WHERE
D.FieldValue <> I.FieldValue --Changes
OR (D.FieldValue IS NOT NULL AND I.FieldValue IS NULL) -- Deletions
OR (D.FieldValue IS NULL AND I.FieldValue IS NOT NULL) -- Insertions
END
GO
-- -------------------- Try some changes --------------------
UPDATE Sample_Table SET age = age+1;
/*UPDATE Sample_Table SET Extn = '5'+Extn where Extn Like '221_';
DELETE FROM Sample_Table WHERE ContactID = 3;
INSERT INTO Sample_Table VALUES (6,'Stephen','Turner','2299','st@example.com',25);
UPDATE Sample_Table SET ContactID = 7 where ContactID = 4; --this will be shown as a delete and an insert
-- -------------------- See the results --------------------
SELECT *, SQL_VARIANT_PROPERTY(FieldValueWas, 'BaseType') FieldBaseType, SQL_VARIANT_PROPERTY(FieldValueWas, 'MaxLength') FieldMaxLength from Sample_Table_Changes;
-- -------------------- Cleanup --------------------
DROP TABLE dbo.Sample_Table; DROP TABLE dbo.Sample_Table_Changes;*/
select * from dbo.sample_table_changes
I need to create a Trigger that fires when a child record (Codes) is added, updated or deleted. The Trigger stuffs a string of comma separated Code values from all child records (Codes) into a single field in the parent record (Projects) of the added, updated or deleted child record.
I am stuck on writing a correct query to retrieve the Code values from just those child records that are the children of a single parent record.
-- Create the test tables
CREATE TABLE projects (
ProjectId varchar(16) PRIMARY KEY,
ProjectName varchar(100),
Codestring nvarchar(100)
)
GO
CREATE TABLE prcodes (
CodeId varchar(16) PRIMARY KEY,
Code varchar (4),
ProjectId varchar(16)
)
GO
-- Add sample data to tables: two project records, one with 3 child records, the other with 2.
INSERT INTO projects
(ProjectId, ProjectName)
SELECT '101','Smith' UNION ALL
SELECT '102','Jones'
GO
INSERT INTO prcodes
(CodeId, Code, ProjectId)
SELECT 'A1','Blue', '101' UNION ALL
SELECT 'A2','Pink', '101' UNION ALL
SELECT 'A3','Gray', '101' UNION ALL
SELECT 'A4','Blue', '102' UNION ALL
SELECT 'A5','Gray', '102'
GO
I am stuck on how to create a correct UPDATE query.
Can you help fix this query?
-- Partially working, but stuffs all values, not just the values from child (prcodes) records of the parent (projects)
UPDATE proj
SET
proj.Codestring = (SELECT STUFF((SELECT ',' + prc.Code
FROM projects proj INNER JOIN prcodes prc ON proj.ProjectId = prc.ProjectId
ORDER BY 1 ASC FOR XML PATH('')),1, 1, ''))
The result I get for the Codestring field in Projects is:
ProjectId ProjectName Codestring
101 Smith Blue,Blue,Gray,Gray,Pink
...
But the result I need for the Codestring field in Projects is:
ProjectId ProjectName Codestring
101 Smith Blue,Pink,Gray
...
Here is my start on the Trigger. The Update query, above, will be added to this Trigger. Can you help me complete the Trigger creation query?
CREATE TRIGGER Update_Codestring ON prcodes
AFTER INSERT, UPDATE, DELETE
AS
WITH CTE AS (
select ProjectId from inserted
union
select ProjectId from deleted
)
The following trigger will perform as you want.
CREATE TRIGGER Update_Codestring ON prcodes
AFTER INSERT, UPDATE, DELETE
AS
UPDATE projects
SET Codestring = (SELECT STUFF((SELECT ',' + prc.Code
FROM projects proj INNER JOIN prcodes prc ON proj.ProjectId = prc.ProjectId
WHERE proj.ProjectId = projects.ProjectId
ORDER BY 1 ASC FOR XML PATH('')),1, 1, ''))
where ProjectId in (SELECT ProjectId FROM inserted
UNION
SELECT ProjectId FROM deleted)
What you were missing in your original update statement was:
WHERE proj.ProjectId = projects.ProjectId - this filters the subquery to only the project being updated. projects without an alias refers to the table in the outer UPDATE statement, so as the update is applied to each row in projects, the subquery only sees the current projects row being updated.
WHERE ProjectId IN (SELECT ProjectId FROM inserted UNION SELECT ProjectId FROM deleted) - this filters the update to affect only the rows whose children changed.
Also you can simplify the update statement since it doesn't need the projects table included twice:
CREATE TRIGGER Update_Codestring ON prcodes
AFTER INSERT, UPDATE, DELETE
AS
UPDATE projects
SET Codestring = (SELECT STUFF((SELECT ',' + prc.Code
FROM prcodes prc
WHERE prc.ProjectId = projects.ProjectId
ORDER BY 1 ASC FOR XML PATH('')),1, 1, ''))
WHERE ProjectId IN (SELECT ProjectId FROM inserted
UNION
SELECT ProjectId FROM deleted)
Finally, do you really need to store the Codestring on your Projects table? It's something that can easily be recalculated in a query at any time, or even put into a view. That way you don't have to worry about storing the extra data and maintaining it with a trigger.
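For illustration, a hedged sketch of that view option, reusing the same STUFF / FOR XML PATH expression; only the view name is mine:
CREATE VIEW dbo.ProjectsWithCodes
AS
SELECT proj.ProjectId,
       proj.ProjectName,
       Codestring = STUFF((SELECT ',' + prc.Code
                           FROM prcodes prc
                           WHERE prc.ProjectId = proj.ProjectId
                           ORDER BY 1 ASC
                           FOR XML PATH('')), 1, 1, '')
FROM projects proj;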
I want to create a MERGE that will compare two tables and insert the non-matched values into a third table or table variable, something like this:
MERGE Assets AS target
USING (@id, @name) FROM Sales AS source (id, name) ON (target.id = source.id)
WHEN MATCHED THEN
UPDATE SET target.Status = @status, target.DateModified = SYSUTCDATETIME()
WHEN NOT MATCHED THEN
INSERT INTO @tableVar (id, name, status, dateModified)
VALUES (@id, @name, @status, SYSUTCDATETIME())
Can this be done or are there other methods?
You just cannot do this. MERGE operates on two tables only - source and target.
For your requirement, you need to use, e.g., a CTE (Common Table Expression) to find the rows that don't match, and insert those into the third table.
Something like:
;WITH NonMatchedData AS
(
-- adapt this as needed - just determine which rows match your criteria,
-- and make sure to return all the columns necessary for the subsequent INSERT
SELECT (columns)
FROM dbo.SourceTable
WHERE ID NOT IN (SELECT DISTINCT ID FROM dbo.TargetTable)
)
INSERT INTO dbo.ThirdTable(Col1, Col2, ....., ColN)
SELECT Col1, Col2, ....., ColN
FROM NonMatchedData
You CAN do this very easily...
You can wrap the MERGE statement within an INSERT INTO ... SELECT ... FROM:
http://technet.microsoft.com/en-us/library/bb510625.aspx#sectionToggle2
-OR-
You can do it directly within the merge statement:
Quick example:
WHEN NOT MATCHED BY SOURCE THEN
DELETE
OUTPUT Deleted.* INTO dbo.MyTable;
This will insert the non-matched rows into another existing table (dbo.MyTable here). You can use the inserted and deleted virtual tables, plus $action, to direct data to other places.
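To make the first option concrete, here is a hedged sketch of wrapping the MERGE in an INSERT ... SELECT (composable DML) using the tables from the question; dbo.ThirdTable and its columns are assumptions, and note that OUTPUT only returns rows the MERGE actually acted on, so the rows you want routed must be touched by one of the WHEN clauses:
INSERT INTO dbo.ThirdTable (id, mergeAction)
SELECT m.id, m.mergeAction
FROM (
    MERGE Assets AS target
    USING Sales AS source
        ON target.id = source.id
    WHEN MATCHED THEN
        UPDATE SET target.DateModified = SYSUTCDATETIME()
    WHEN NOT MATCHED BY SOURCE THEN
        DELETE
    OUTPUT $action AS mergeAction,
           COALESCE(inserted.id, deleted.id) AS id
) AS m
WHERE m.mergeAction = 'DELETE'; -- keep only the rows that had no match in the source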