T-SQL: Best way to copy hierarchy data? - sql-server

My database looks like this:
Questionnaire
Id
Description
Category
id
description
QuestionnaireId (FK)
Question
id
CategoryId (FK)
field
When I copy a questionnaire, I'd like to copy all the underlying tables. So this means that the table Questionnaire gets a new Id. Then, all the belonging categories of the questionnaire must also be copied. So the newly inserted categories must get the new questionnaire Id. After the categories, the questions must be copied. But the categoryId must be updated to the newly inserted category.
How can I do this using t-sql?

This is pretty easy to accomplish, but you have to keep track of everything as you go. I would generally create a single SP for this, which takes as an input the questionnaire to copy.
DECLARE @newQuestionnaireId INT
-- Id is an identity column, so it is not listed in the insert
INSERT INTO Questionnaire
(Description)
SELECT Description
FROM Questionnaire
WHERE Id = @sourceQuestionnaireId
SET @newQuestionnaireId = SCOPE_IDENTITY()
At this point you have a new header record, and the newly generated Id for the copy. The next step is to load the categories into a table variable which has an extra field for the new Id.
DECLARE @tempCategories TABLE (id INT, description VARCHAR(50), newId INT)
INSERT INTO @tempCategories(id, description)
SELECT id, description FROM Category
WHERE questionnaireId = @sourceQuestionnaireId
Now, you have a temp table with all the categories to insert, along with a field to backfill the new ID for this category. Use a cursor to go over the list inserting the new record, and use a similar SCOPE_IDENTITY call to backfill the new Id.
DECLARE cuCategory CURSOR FOR SELECT id, description FROM @tempCategories
DECLARE @catId INT, @catDescription VARCHAR(50), @newCatId INT
OPEN cuCategory
FETCH NEXT FROM cuCategory INTO @catId, @catDescription
-- @@FETCH_STATUS is 0 as long as the last fetch succeeded
WHILE @@FETCH_STATUS = 0
BEGIN
    INSERT INTO Category(description, questionnaireId)
    VALUES(@catDescription, @newQuestionnaireId)
    SET @newCatId = SCOPE_IDENTITY()
    -- backfill the new Id next to the original one
    UPDATE @tempCategories SET newId = @newCatId
    WHERE id = @catId
    FETCH NEXT FROM cuCategory INTO @catId, @catDescription
END
CLOSE cuCategory
DEALLOCATE cuCategory
At this point you have a table variable which maps the catId from the original questionnaire to the catId for the new questionnaire. This can be used to fill the final table in much the same way, which I'll leave as an exercise for you, but feel free to post back here if you have difficulty.
Finally, I would suggest that this whole operation is carried out within a transaction to save you from half-completed copies when something goes wrong.
A couple of disclaimers: the above was all typed quickly, so don't expect it to work off the bat. Second, I've assumed that all your PKs are identity fields, which they should be! If they're not, just replace the SCOPE_IDENTITY() calls with the appropriate logic to generate the next ID.
Edit: documentation for cursor operations can be found here
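A minimal sketch of that last step, assuming the category mapping is held in a table variable named @tempCategories with the columns id (original) and newId (copy), and that Question.id is an identity column. No cursor is needed here because nothing further depends on the new question Ids:

```sql
-- Copy every question of the source categories, pointing each copy
-- at the corresponding newly inserted category.
-- Assumption: @tempCategories(id, newId) was filled by the category step.
INSERT INTO Question (CategoryId, field)
SELECT tc.newId, q.field
FROM Question AS q
INNER JOIN @tempCategories AS tc
    ON tc.id = q.CategoryId;
```

If a further level hung below Question, the same map-then-insert step would have to be repeated for it.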
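One way the transaction wrapper could look is sketched below. The procedure name dbo.CopyQuestionnaire is hypothetical, and THROW assumes SQL Server 2012 or later; on older versions RAISERROR would be needed instead.

```sql
CREATE PROCEDURE dbo.CopyQuestionnaire   -- hypothetical name
    @sourceQuestionnaireId INT
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;   -- most run-time errors then roll back the whole transaction

    DECLARE @newQuestionnaireId INT;

    BEGIN TRY
        BEGIN TRANSACTION;

        -- Step 1: copy the header row
        INSERT INTO Questionnaire (Description)
        SELECT Description
        FROM Questionnaire
        WHERE Id = @sourceQuestionnaireId;

        SET @newQuestionnaireId = SCOPE_IDENTITY();

        -- Steps 2 and 3 (categories, then questions) would go here

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        -- Undo a half-completed copy, then re-raise the original error
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
        THROW;
    END CATCH
END
```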

I had a problem like this and began to implement the solution suggested by @Jamiec, but I quickly realised that I needed a better solution because my model is much larger than the one in the example cited here. I have one master table with three intermediate tables, each of which has one or more tertiary tables, and the three intermediates each had something like 50 columns. This would mean a lot of work to type all that up, particularly in the fetch part with the temporary variables. I tried to find a way to FETCH directly into the temp table, but it seems you cannot do that.
What I did was add a column to the intermediate tables called OriginalId. Here is my code translated into the model used by the asker:
DECLARE @newQuestionnaireId INT
INSERT INTO Questionnaire (Description)
SELECT Description FROM Questionnaire
WHERE Id = @sourceQuestionnaireId
SET @newQuestionnaireId = SCOPE_IDENTITY()
-- originalId remembers which category each copy came from
INSERT INTO Category(QuestionnaireId, description, originalId)
SELECT @newQuestionnaireId, description, id FROM Category
WHERE questionnaireId = @sourceQuestionnaireId
-- join on originalId to attach the copied questions to the new categories
INSERT INTO Question (CategoryId, field)
SELECT Category.Id, Question.field
FROM Question join Category on Question.CategoryId = Category.OriginalId
WHERE Category.QuestionnaireId = @newQuestionnaireId
In my model the id fields are all Identities so you do not supply them in the inserts.
Another thing I discovered before I gave up on the CURSOR approach was this clever little trick to avoid having to type the FETCH statement twice by using an infinite WHILE loop with a BREAK:
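The snippet itself is not reproduced above. A minimal sketch of that loop shape, assuming a cursor over the Category table from the question, might look like this:

```sql
DECLARE @catId INT, @catDescription VARCHAR(50);

DECLARE cuCategory CURSOR LOCAL FAST_FORWARD FOR
    SELECT id, description FROM Category;

OPEN cuCategory;

WHILE 1 = 1
BEGIN
    -- The FETCH appears only once, at the top of the loop
    FETCH NEXT FROM cuCategory INTO @catId, @catDescription;

    -- Leave the loop as soon as a fetch does not return a row
    IF @@FETCH_STATUS <> 0 BREAK;

    PRINT @catDescription;   -- per-row work goes here
END

CLOSE cuCategory;
DEALLOCATE cuCategory;
```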

Here is a way that does not use cursors. It relies on remembering the order of the inserts, and then using that order to resolve the children.
Declare @Parrent TABLE( ID int PRIMARY KEY IDENTITY, Value nvarchar(50))
Declare @Child TABLE( ID int PRIMARY KEY IDENTITY, ParrentID int, Value nvarchar(50))
insert into @Parrent (Value) Values ('foo'),('bar'),('bob')
insert into @Child (ParrentID, Value) Values (1,'foo-1'),(1,'foo-2'),(2,'bar-1'),(2,'bar-2'),(3,'bob')
declare @parrentToCopy table (ID int) -- you can make this a collection
insert into @parrentToCopy values (2)
select * from @Parrent p inner join @Child c on p.ID = c.ParrentID order by p.ID asc, c.ID asc
DECLARE @Ids TABLE( nID INT);
INSERT INTO @Parrent (Value)
OUTPUT INSERTED.ID
INTO @Ids
SELECT
Value
FROM @Parrent p
inner join @parrentToCopy pc on pc.ID=p.ID
ORDER BY p.ID ASC
INSERT INTO @Child (ParrentID, Value)
SELECT
nID
,Value
FROM @Child c
inner join (select ID, ROW_NUMBER() OVER (ORDER BY ID ASC) AS 'RowNumber' from @parrentToCopy) o ON o.ID = c.ParrentID
inner join (select nID, ROW_NUMBER() OVER (ORDER BY nID ASC) AS 'RowNumber' from @Ids) n ON o.RowNumber = n.RowNumber
select * from @Parrent p inner join @Child c on p.ID = c.ParrentID order by p.ID asc, c.ID asc
The full post is here: http://bashamer.wordpress.com/2011/10/04/copying-hierarchical-data-in-sql-server/

Related

TSQL Incrementing Count of Variable

I have a UI that allows a user to select one or more fields they want to add to a table. This data also has an orderID associated with it that determines the field order.
When the user adds new fields, I need to find the last orderID this user used and increment it by 1, submitting all of the new fields.
For example, if there is a single record that already exists in the database, it would have an orderID of 1. When I choose to add three more fields, it would check to see the last orderID I used (1) and then increment it for each of the new records it adds, 1-4.
-- Get the last orderID for this user and increment it by 1 as our starting point
DECLARE @lastID INT = (SELECT TOP 1 orderID FROM dbo.BS_ContentRequests_Tasks_User_Fields WHERE QID = @QID ORDER BY orderID DESC)
SET @lastID = @lastID+1;
-- Create a table variable to hold the fields that we are adding
DECLARE @temp AS TABLE (fieldID int, orderID int)
-- Insert our fields and incremented numbers
INSERT INTO @temp( fieldID, orderID )
SELECT ParamValues.x1.value('selected[1]', 'int'),
@lastID++
FROM @xml.nodes('/root/data/fields/field') AS ParamValues(x1);
Obviously the @lastID++ part is where my issue is (T-SQL has no such operator), but hopefully it helps to understand what I am trying to do.
What other method could be used to handle this?
ROW_NUMBER() ought to do it.
select x.Value,
ROW_NUMBER() over (order by x.Value) + @lastID
from (
select ParamValues.x1.value('selected[1]', 'int') Value
from @xml.nodes('/root/data/fields/field') AS ParamValues(x1)
) x
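Put together with the asker's insert, the ROW_NUMBER() idea could be sketched as follows. The values of @QID and @xml are made-up sample inputs, and @lastID is deliberately not pre-incremented here, since ROW_NUMBER() already starts at 1. The new rows are numbered by field id, which is an assumption about the ordering the UI wants.

```sql
DECLARE @QID INT = 1;   -- sample value, assumption
DECLARE @xml XML = N'<root><data><fields>
    <field><selected>10</selected></field>
    <field><selected>20</selected></field>
    <field><selected>30</selected></field>
</fields></data></root>';   -- sample payload, assumption

-- Highest orderID used so far for this user, or 0 if there is none yet
DECLARE @lastID INT = ISNULL(
    (SELECT MAX(orderID)
     FROM dbo.BS_ContentRequests_Tasks_User_Fields
     WHERE QID = @QID), 0);

DECLARE @temp TABLE (fieldID INT, orderID INT);

-- ROW_NUMBER() yields 1, 2, 3, ... so each new row continues after @lastID
INSERT INTO @temp (fieldID, orderID)
SELECT x.fieldID,
       @lastID + ROW_NUMBER() OVER (ORDER BY x.fieldID)
FROM (
    SELECT ParamValues.x1.value('selected[1]', 'int') AS fieldID
    FROM @xml.nodes('/root/data/fields/field') AS ParamValues(x1)
) AS x;

SELECT fieldID, orderID FROM @temp ORDER BY orderID;
```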
You could use a column with IDENTITY(1,1)
If you want OrderID to be unique across the entire table then see below:
Click here to take a look at another post that addresses this issue.
There are multiple ways to approach this issue, but in this case, the easiest, while reasonable, means may be to use an identity column. However, that is not as extensible as using a sequence. If you feel that you may need more flexibility in the future, then use a sequence.
If you want OrderID to be unique across the fields inserted in one batch then see below:
You should take a closer look at Chris Steele's answer.

How to create a copy of data when the data has a hierarchy

I have a table structure where there are FK columns in child tables.
So say there is the following:
Company
-company_id
-name
Location
-location_id
-company_name
-name
Store
-store_id
-location_id
-name
Inventory
-inventory_id
-store_id
Now I want to create a copy of a company, along with all of location, store and inventory rows.
So say I want to create a copy of company_id=123, I have to duplicate all the rows.
I tried this:
DECLARE @OriginalCompanyId INT = 123
DECLARE @companyId AS INT
INSERT Companies (name)
select c.name
from companies c
where c.companyId = @OriginalCompanyId
SET @companyId = SCOPE_IDENTITY()
But this approach won't work because the other tables have multiple rows and I won't be able to linkup the newly inserted PK values.
What approach should I be taking?
I've actually been working on a project that does just this. My solution, while not fancy, has so far proven effective; the annoying part is the setup process. I am very open to critique and suggestions for improvement.
1. Create a "mirror" schema/db of all the necessary tables (I've gone with New[ApplicationTableName]).
2. For each pKey/fKey, create a "placeholder" column (I've gone with p[ColumnName]).
3. Map the existing data to placeholder keys, indexed at 1. (This is annoying, but doable with ranking functions.)
4. Insert into the application by your placeholder keys in descending order (descending is important!).
5. Update your "mirror" table using ranking functions (see example).
6. Repeat as necessary using the derived/inserted values across however many tables you need.
Example:
Given this schema...
CREATE TABLE Accounts (
AccountID int identity(1,1) not null,
Name varchar(500) not null
)
CREATE TABLE Users(
UserID int identity(1,1) not null,
AccountID int not null,
Name varchar(500) not null
)
CREATE TABLE NewUsers(
pUserID int not null,
UserID int not null,
AccountID int not null,
Name varchar(500)
)
And this data
INSERT INTO NewUsers VALUES
(1,0,0,'Bob'),
(2,0,0,'Sally'),
(3,0,0,'Jeff'),
(4,0,0,'Sam')
Say for each time we "create" an account we want to create these 4 default users... This will look something like this
DECLARE @AccountID int --this is scalar, so we'll use scope_identity() to grab it.
INSERT INTO Accounts VALUES('MyNewAccountID')
SELECT @AccountID = SCOPE_IDENTITY()
--Prepare NewUsers w/ derived accountID
UPDATE NewUsers SET AccountID = @AccountID
--Do our "application" insert
INSERT INTO Users(AccountID,Name)
SELECT AccountID,Name
FROM NewUsers
ORDER BY pUserID DESC;
--Capture inserted ID's for use in other tables (where we've derived pUserID)
WITH InsertedUsers AS(
SELECT
--use dense rank, it handles fkey mappings too
DENSE_RANK() OVER(ORDER BY UserID DESC) as pUserID,
UserID
FROM Users
)
UPDATE NewUsers SET UserID = iu.UserID
FROM NewUsers nu
JOIN InsertedUsers iu
ON iu.pUserID = nu.pUserID
SELECT TOP 100 * FROM Accounts ORDER BY 1 DESC
SELECT TOP 100 * FROM Users ORDER BY 1 DESC
So now if a future table needs UserID into the app (and has a derived pUserID,) we can grab it from NewUsers by joining on pUserID.
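A different route from the mirror-table approach above is to capture an old-to-new key map at each level with MERGE ... OUTPUT, which, unlike a plain INSERT, may reference source columns in its OUTPUT clause (SQL Server 2008 or later). The sketch below applies it to the Company/Location/Store/Inventory schema from the question. It assumes that all primary keys are identity columns and that Location links to Company through a company_id column; the question lists company_name there, so that column name is a guess.

```sql
DECLARE @OriginalCompanyId INT = 123;
DECLARE @companyId INT;

-- Level 1: a single row, so SCOPE_IDENTITY() is enough
INSERT INTO Company (name)
SELECT name FROM Company WHERE company_id = @OriginalCompanyId;
SET @companyId = SCOPE_IDENTITY();

-- Level 2: copy locations and record old id -> new id
DECLARE @LocationMap TABLE (old_id INT PRIMARY KEY, new_id INT NOT NULL);

MERGE INTO Location AS tgt
USING (SELECT location_id, name
       FROM Location
       WHERE company_id = @OriginalCompanyId) AS src   -- company_id is assumed
ON 1 = 0                                               -- never matches, so every source row is inserted
WHEN NOT MATCHED THEN
    INSERT (company_id, name) VALUES (@companyId, src.name)
OUTPUT src.location_id, INSERTED.location_id
    INTO @LocationMap (old_id, new_id);

-- Level 3: copy stores, translating location_id through the map
DECLARE @StoreMap TABLE (old_id INT PRIMARY KEY, new_id INT NOT NULL);

MERGE INTO Store AS tgt
USING (SELECT s.store_id, s.name, m.new_id AS new_location_id
       FROM Store AS s
       INNER JOIN @LocationMap AS m ON m.old_id = s.location_id) AS src
ON 1 = 0
WHEN NOT MATCHED THEN
    INSERT (location_id, name) VALUES (src.new_location_id, src.name)
OUTPUT src.store_id, INSERTED.store_id
    INTO @StoreMap (old_id, new_id);

-- Level 4: leaf table, no further map needed
INSERT INTO Inventory (store_id)
SELECT m.new_id
FROM Inventory AS i
INNER JOIN @StoreMap AS m ON m.old_id = i.store_id;
```

Any additional columns on these tables would need to be added to the column lists, and the whole batch is best run inside one transaction.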

Copy complex table using stored procedure

I am writing a stored procedure to copy rows in a table.
This is the table
I want to copy this but the ParentId should be linked to the new row.
If I do a simple INSERT INTO ... SELECT FROM, the ParentId will be linked to the ProductId 22, not to the new ProductId, as you can see above.
Any suggestion?
Your question is not completely clear, but if I understand it correctly, you are trying to copy several rows that build a hierarchy while preserving that hierarchy.
This cannot be done in one step. You need to first copy the rows and record the new and their matching old ids. Then you can update the references in the new rows to point to the new parents.
The simplest way to do this is using the MERGE statement:
CREATE TABLE dbo.tst(id INT IDENTITY(1,1), parent_id INT, other INT);
INSERT INTO dbo.tst(parent_id, other)VALUES(NULL,1);
INSERT INTO dbo.tst(parent_id, other)VALUES(1,2);
INSERT INTO dbo.tst(parent_id, other)VALUES(1,3);
INSERT INTO dbo.tst(parent_id, other)VALUES(3,4);
INSERT INTO dbo.tst(parent_id, other)VALUES(NULL,5);
INSERT INTO dbo.tst(parent_id, other)VALUES(5,6);
CREATE TABLE #tmp(old_id INT, new_id INT);
MERGE dbo.tst AS trg
USING dbo.tst AS src
ON (0=1)
WHEN NOT MATCHED
AND (src.id >= 1) --here you can put your own WHERE clause.
THEN
INSERT(parent_id, other)
VALUES(src.parent_id, src.other)
OUTPUT src.id, INSERTED.id INTO #tmp(old_id, new_id);
UPDATE trg SET
parent_id = tmp_translate.new_id
FROM dbo.tst AS trg
JOIN #tmp AS tmp_filter
ON trg.id = tmp_filter.new_id
JOIN #tmp AS tmp_translate
ON trg.parent_id = tmp_translate.old_id;
SELECT * FROM dbo.tst;
The line with the comment is the place where you can put your own WHERE clause to select the rows that you want to copy. Make sure to actually copy all referenced parents. If you copy a child without its parent, the update will not catch it and it will still point to the old parent in the end.
You also should wrap the MERGE and the UPDATE in a transaction to prevent someone else from reading the new and not yet finished records.
You can use SELECT to do this, you just need to manually specify the order of the columns. Here is an example, assuming that your table is called Product and that the ProductId is auto incremented. Notice that the first column returned in the SELECT is the primary key of the old row.
INSERT dbo.Product
SELECT
ProductId,
ArtNo,
[Description],
Specification,
Unit,
Account,
NetPrice,
OhTime
FROM dbo.Product AS P
WHERE P.ParentId = 22
Does that help?

Schema Change: Convert field to foreign key on a different table

I currently have a table like this:
Stuff
----------
StuffId identity int not null
Description nvarchar(4000) null
...
I want to store the Description in a separate table that I have set aside specifically for user-generated content:
Content
----------
ContentId identity int not null
Content nvarchar(max) not null
...
(this table already exists, and other tables already reference entries in it.)
So I need to:
1. Create a DescriptionContentId field on the Stuff table with a foreign key constraint.
2. Copy the current Description content into the Content table.
3. Set each DescriptionContentId to have the ContentId value that was automatically generated when inserting values in step 2.
4. Drop the Description column.
I know how to do steps 1 and 4, but steps 2 and 3 are eluding me, because they need to be done pretty much simultaneously. This seems like it would be a fairly common schema change. What's the best way to do it?
Update
I'm a step closer thanks to the Output keyword, but I'm still missing something. Here's what I'd like to do:
create table #tmp (StuffId int, ContentId int)
insert into Content(Content)
output s.StuffId, inserted.ContentId
into #tmp(StuffId, ContentId)
select Description
from Stuff s
where Description IS NOT NULL
But I can't reference s.StuffId because it isn't one of the fields inserted into the Content table. How can I correlate the ID of the Stuff with the ID of the Content as I'm inserting a new Content item for each Stuff entry?
The output clause will come to your rescue.
It will output the description and the identity column from the insert into a table variable, and then you can use that data to update the other table.
If description is not unique, you may have to do the following:
Add a StuffId column to the Content table. Then output the StuffId and ContentId from the insert, update the Stuff table using the StuffId to ensure uniqueness, and drop the StuffId column from the Content table.
An example from Books Online showing how to use OUTPUT:
DECLARE @MyTableVar table(
LastName nvarchar(20) NOT NULL,
FirstName nvarchar(20) NOT NULL,
CurrentSales money NOT NULL
);
INSERT INTO dbo.EmployeeSales (LastName, FirstName, CurrentSales)
OUTPUT INSERTED.LastName,
INSERTED.FirstName,
INSERTED.CurrentSales
INTO @MyTableVar
SELECT c.LastName, c.FirstName, sp.SalesYTD
FROM HumanResources.Employee AS e
INNER JOIN Sales.SalesPerson AS sp
ON e.EmployeeID = sp.SalesPersonID
INNER JOIN Person.Contact AS c
ON e.ContactID = c.ContactID
WHERE e.EmployeeID LIKE '2%'
ORDER BY c.LastName, c.FirstName;
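For the Stuff/Content case in the question, a sketch that avoids both the uniqueness problem and the temporary column is to do the insert with MERGE, whose OUTPUT clause can reference source columns (SQL Server 2008 or later). It assumes step 1 is already done, so Stuff.DescriptionContentId exists, and that any other NOT NULL columns of Content hidden behind the "..." have defaults.

```sql
DECLARE @map TABLE (StuffId INT PRIMARY KEY, ContentId INT NOT NULL);

-- Step 2: insert one Content row per Stuff row and capture both ids.
-- ON 1 = 0 never matches, so every source row takes the INSERT branch.
MERGE INTO Content AS tgt
USING (SELECT StuffId, Description
       FROM Stuff
       WHERE Description IS NOT NULL) AS src
ON 1 = 0
WHEN NOT MATCHED THEN
    INSERT (Content) VALUES (src.Description)
OUTPUT src.StuffId, INSERTED.ContentId
    INTO @map (StuffId, ContentId);

-- Step 3: point each Stuff row at its new Content row
UPDATE s
SET DescriptionContentId = m.ContentId
FROM Stuff AS s
INNER JOIN @map AS m ON m.StuffId = s.StuffId;
```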

What columns can be used in OUTPUT INTO clause?

I'm trying to build a mapping table to associate the IDs of new rows in a table with those that they're copied from. The OUTPUT INTO clause seems perfect for that, but it doesn't seem to behave according to the documentation.
My code:
DECLARE @Missing TABLE (SrcContentID INT PRIMARY KEY )
INSERT INTO @Missing
( SrcContentID )
SELECT cshadow.ContentID
FROM Private.Content AS cshadow
LEFT JOIN Private.Content AS cglobal ON cshadow.Tag = cglobal.Tag
WHERE cglobal.ContentID IS NULL
PRINT 'Adding new content headers'
DECLARE @Inserted TABLE (SrcContentID INT PRIMARY KEY, TgtContentID INT )
INSERT INTO Private.Content
( Tag, Description, ContentDate, DateActivate, DateDeactivate, SortOrder, CreatedOn, IsDeleted, ContentClassCode, ContentGroupID, OrgUnitID )
OUTPUT cglobal.ContentID, INSERTED.ContentID INTO @Inserted (SrcContentID, TgtContentID)
SELECT Tag, Description, ContentDate, DateActivate, DateDeactivate, SortOrder, CreatedOn, IsDeleted, ContentClassCode, ContentGroupID, NULL
FROM Private.Content AS cglobal
INNER JOIN @Missing AS m ON cglobal.ContentID = m.SrcContentID
Results in the error message:
Msg 207, Level 16, State 1, Line 34
Invalid column name 'SrcContentID'.
(line 34 being the one with the OUTPUT INTO)
Experimentation suggests that only rows that are actually present in the target of the INSERT can be selected in the OUTPUT INTO. But this contradicts the docs in the books online. The article on OUTPUT Clause has example E that describes a similar usage:
"The OUTPUT INTO clause returns values from the table being updated (WorkOrder) and also from the Product table. The Product table is used in the FROM clause to specify the rows to update."
Has anyone worked with this feature?
(In the meantime I've rewritten my code to do the job using a cursor loop, but that's ugly and I'm still curious)
You can do this with a MERGE in Sql Server 2008. Example code below:
--drop table A
create table A (a int primary key identity(1, 1))
insert into A default values
insert into A default values
delete from A where a>=3
-- insert two values into A and get the new primary keys
MERGE a USING (SELECT a FROM A) AS B(a)
ON (1 = 0) -- ignore the values, NOT MATCHED will always be true
WHEN NOT MATCHED THEN INSERT DEFAULT VALUES -- always insert here for this example
OUTPUT $action, inserted.*, deleted.*, B.a; -- show the new primary key and source data
Result is
INSERT, 3, NULL, 1
INSERT, 4, NULL, 2
i.e. for each row the new primary key (3, 4) and the old one (1, 2). Creating a table called e.g. #OUTPUT and adding " INTO #OUTPUT;" at the end of the OUTPUT clause would save the records.
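A sketch of that last suggestion, using a table variable named @Output rather than a permanent table (the variable and column names are illustrative). Here #A is a genuine temporary table so the example does not touch any existing table A:

```sql
CREATE TABLE #A (a INT IDENTITY(1, 1) PRIMARY KEY);
INSERT INTO #A DEFAULT VALUES;
INSERT INTO #A DEFAULT VALUES;

-- Holds one row per inserted copy: the action, the new key, the source key
DECLARE @Output TABLE (merge_action NVARCHAR(10), new_a INT, old_a INT);

MERGE #A AS tgt
USING (SELECT a FROM #A) AS B(a)
ON (1 = 0)                                  -- never matches, so every source row is inserted
WHEN NOT MATCHED THEN INSERT DEFAULT VALUES
OUTPUT $action, INSERTED.a, B.a
    INTO @Output (merge_action, new_a, old_a);

SELECT merge_action, new_a, old_a FROM @Output;

DROP TABLE #A;
```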
I've verified that the problem is that you can only use INSERTED columns. The documentation seems to indicate that you can use from_table_name, but I can't seem to get it to work (The multi-part identifier "m.ContentID" could not be bound.):
TRUNCATE TABLE main
SELECT *
FROM incoming
SELECT *
FROM main
DECLARE @Missing TABLE (ContentID INT PRIMARY KEY)
INSERT INTO @Missing(ContentID)
SELECT incoming.ContentID
FROM incoming
LEFT JOIN main
ON main.ContentID = incoming.ContentID
WHERE main.ContentID IS NULL
SELECT *
FROM @Missing
DECLARE @Inserted TABLE (ContentID INT PRIMARY KEY, [Content] varchar(50))
INSERT INTO main(ContentID, [Content])
OUTPUT INSERTED.ContentID /* incoming doesn't work, m doesn't work */, INSERTED.[Content] INTO @Inserted (ContentID, [Content])
SELECT incoming.ContentID, incoming.[Content]
FROM incoming
INNER JOIN @Missing AS m
ON m.ContentID = incoming.ContentID
SELECT *
FROM @Inserted
SELECT *
FROM incoming
SELECT *
FROM main
Apparently the from_table_name prefix is only allowed on DELETE or UPDATE (or MERGE in 2008) - I'm not sure why:
from_table_name
Is a column prefix that specifies a table included in the FROM clause of a DELETE or UPDATE statement that is used to specify the rows to update or delete.
If the table being modified is also specified in the FROM clause, any reference to columns in that table must be qualified with the INSERTED or DELETED prefix.
I'm running into EXACTLY the same problem as you are, I feel your pain...
As far as I've been able to find out there's no way to use the from_table_name prefix with an INSERT statement.
I'm sure there's a viable technical reason for this, and I'd love to know exactly what it is.
Ok, found it, here's a forum post on why it doesn't work:
MSDN forums
I think I found a solution to this problem, it sadly involves a temporary table, but at least it'll prevent the creation of a dreaded cursor :)
What you need to do is add an extra column to the table you're duplicating records from and give it a 'uniqueidentifier' type.
Then declare a table variable:
DECLARE @tmptable TABLE (uniqueid uniqueidentifier, original_id int, new_id int)
Insert the data into your table variable like this:
insert into @tmptable
(uniqueid,original_id,new_id)
select NewId(),id,0 from OriginalTable
Then go ahead and do the real insert into the original table:
insert into OriginalTable
(uniqueid)
select uniqueid from @tmptable
Now add the newly created identity values to your table variable:
update @tmptable
set new_id = o.id
from OriginalTable o inner join @tmptable tmp on tmp.uniqueid = o.uniqueid
Now you have a lookup table that holds the new id and original id in one record, for your using pleasure :)
I hope this helps somebody...
(MS) If the table being modified is also specified in the FROM clause, any reference to columns in that table must be qualified with the INSERTED or DELETED prefix.
In your example, you can't use cglobal table in the OUTPUT unless it's INSERTED.column_name or DELETED.column_name:
INSERT INTO Private.Content
(Tag)
OUTPUT cglobal.ContentID, INSERTED.ContentID
INTO @Inserted (SrcContentID, TgtContentID)
SELECT Tag
FROM Private.Content AS cglobal
INNER JOIN @Missing AS m ON cglobal.ContentID = m.SrcContentID
What worked for me was a simple alias table, like this:
INSERT INTO con1
(Tag)
OUTPUT con2.ContentID, INSERTED.ContentID
INTO @Inserted (SrcContentID, TgtContentID)
SELECT Tag
FROM Private.Content con1
INNER JOIN Private.Content con2 ON con1.id=con2.id
INNER JOIN @Missing AS m ON con1.ContentID = m.SrcContentID

Resources