Updating a primary key with another column of scrambled unique values - sql-server

Disclaimer: this change is not generally a useful thing to do to a properly normalized database, but I have business reasons for it.
I have a table of data with a primary key of numeric values. This key is used as a foreign key reference in multiple other tables. There is also a column of numeric values that can be updated to reflect the desired order for the rows. The order and PK columns contain the same numbers, but ordering the table by either column scrambles the other one.
What I'm trying to do is to update the primary key to follow the same order as the order column, but SSMS gives me the error "Violation of PRIMARY KEY constraint 'PK_Constraint'. Cannot insert duplicate key in object 'tbl'. The duplicate key value is <value>."
My update statement looks like this:
update tbl set tbl.key = tbl.order where tbl.key <> tbl.order
I already know how to update the foreign key references in the other tables, so I just need to know how I can update the key in this situation.

Check to make sure that there are no duplicate values in tbl.Order. If there are, you must resolve the duplicates before you can update the PK column with those values.
SELECT
[order], COUNT([order]) AS NumDupes
FROM tbl
GROUP BY [order]
HAVING COUNT([order]) > 1

I eventually figured out enough of the issue that I could solve this using a cursor. I'm putting my solution here for reference. If someone wants to simplify/modify this to use set-based queries, I'll accept that answer.
Step 1
Using a query from this answer, I found that there were a few order/ID "chains" that had one end that would result in a duplicate with a simple set-based update:
with parents as
(
select 1 idx, ID, [Order], Name from tbl where ID <> [Order]
union all
select idx+1, p.ID, v.[Order], p.Name from parents p inner join tbl v on p.[Order] = v.ID and idx < 100
)
select parents from (
select distinct parents from (
select *, parents = stuff
( ( select ', ' + cast(p.[Order] as varchar(100)) from parents p
where t.ID = p.ID for xml path('')
) , 1, 2, '') from parents t ) x ) y
order by len(parents) desc
Step 2
I manually looked through the result set to find the longest row that ended with a given value. I then put the values from one chain into a temp table in the order given:
create table #tmp (id int identity(1,1), val int)
insert into #tmp values <list of values>
Step 3
Next I ran through the temp table with a cursor and updated each row (and foreign key references) individually:
declare @val int
declare @old int
declare val cursor for select val from #tmp order by id desc
open val
fetch next from val into @val
while @@fetch_status = 0
begin
-- find the row that should receive this key value
set @old = (select ID from tbl where [Order] = @val)
insert into tbl(ID, <other columns>)
select @val, <other columns> from tbl where ID = @old
update <other tables> set FK_ID = @val where FK_ID = @old
delete from tbl where ID = @old
fetch next from val into @val
end
close val
deallocate val
Step 4
I repeated steps 2 and 3 for each "chain". At the end, my table had the primary key in the same order as the Order field.
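For comparison, here is a minimal set-based sketch. It assumes the Order values are unique (the check from the earlier answer) and that no foreign keys reference the table; the table variable, its columns, and its data are hypothetical stand-ins for tbl. Under those assumptions a single UPDATE can permute the key, because SQL Server validates the PRIMARY KEY against the final state of the statement rather than row by row.

```sql
-- Hypothetical stand-in for tbl: ID is the key, [Order] holds the desired key values.
DECLARE @tbl TABLE (
    ID      int         NOT NULL PRIMARY KEY,
    [Order] int         NOT NULL UNIQUE,   -- assumption: no duplicate Order values
    Name    varchar(10) NOT NULL
);

INSERT INTO @tbl (ID, [Order], Name)
VALUES (1, 3, 'a'), (2, 1, 'b'), (3, 2, 'c'), (4, 4, 'd');

-- One statement rewrites every key; uniqueness is checked once for the whole set.
UPDATE @tbl
SET ID = [Order]
WHERE ID <> [Order];

SELECT ID, [Order], Name
FROM @tbl
ORDER BY ID;
```

Foreign key references are not covered by this sketch. Unless they are declared with ON UPDATE CASCADE, they would still need the separate handling described in the question.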

Related

Efficient query to filter for list of values across columns/distinct rows

SQL Server version is 2016+/Azure SqlDb (flexible if additive, compatible with both, forward-compatible).
Use case is API users sending a single-column list of values to filter some target table. The target table has 2-n columns whose values are unique across rows (maybe columns, doesn't matter) for the table/range being queried. (So far n <= 5, but that's a detail/not guaranteed.)
Here's a good-enough sample table DDL:
IF NOT EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'SomeTable')
BEGIN
CREATE TABLE dbo.SomeTable (
ID int IDENTITY(1, 1) not null PRIMARY KEY CLUSTERED
, NaturalKey1 nvarchar(10) not null UNIQUE NONCLUSTERED
, NaturalKey2 nvarchar(10) not null UNIQUE NONCLUSTERED
, NaturalKey3 nvarchar(10) not null UNIQUE NONCLUSTERED
);
END
IF NOT EXISTS (SELECT 1 FROM dbo.SomeTable)
BEGIN
INSERT INTO dbo.SomeTable VALUES
('A', 'AA', 'ZZZZZ')
,('B', 'B', 'YYYYY')
,('C', 'CC', 'XXX')
,('D', 'DDD', 'WWWWW')
,('E', 'EEEE', 'V')
,('F', 'FF', 'UUUUUUUUU')
,('G', 'GGGGGGGG', 's')
-- lots more
;
END
SELECT * FROM dbo.SomeTable;
-- DROP TABLE dbo.SomeTable;
Assumptions are that all NaturalKey columns are of same type (probably nvarchar); filtering happens db-side; and in as few steps as possible, ideally one execution, in a stored procedure. Parameter will be string list or TVP, doesn't matter really. Result will include all data in any row of SomeTable that matches any value on any column. Target table is of unknown size.
Here's an example parameter for our pal above:
DECLARE @filterValues nvarchar(1000) = 'DDD,XXX,E,HH,ok,whatever,YYYYY';
SELECT * FROM string_split(@filterValues, ',');
I know a couple ways to do this, and can imagine several more, so it's not that kind of stuck. I'll bet someone knows a better trick than either of the two I'll illustrate.
Approach 1 Build a temp table updated for existence and join on it (concise and nice to audit, that's about it for pros)
DECLARE @filterValues nvarchar(1000) = 'DDD,XXX,E,HH,ok,whatever,YYYYY';
SELECT value AS InValue, CONVERT(int, null) AS IDMatch
INTO #filters
FROM string_split(@filterValues, ',');
UPDATE f
SET f.IDMatch = st.ID
FROM #filters AS f
INNER JOIN dbo.SomeTable AS st ON f.InValue IN (st.NaturalKey1, st.NaturalKey2, st.NaturalKey3);
SELECT * FROM #filters; -- Audit
SELECT st.* FROM #filters AS f INNER JOIN dbo.SomeTable AS st ON f.IDMatch = st.ID;
IF OBJECT_ID('tempdb..#filters') IS NOT NULL DROP TABLE #filters;
Approach 2 Unpivot SomeTable (I like the nifty cross apply trick) and just join (at scale there be ogres methinks)
SELECT
st.*
FROM
dbo.SomeTable AS st
CROSS APPLY (VALUES (st.NaturalKey1)
, (st.NaturalKey2)
, (st.NaturalKey3)
) AS nk(Val)
INNER JOIN #filters AS f ON nk.Val = f.InValue;
IF OBJECT_ID('tempdb..#filters') IS NOT NULL DROP TABLE #filters;
Is there a question in our future
Works is better than doesn't work, but looking for better/more efficient/more scalable methods from the T-SQL gurus. Unknown dimensions in columns and rows, response time is an SLA, filter size may or may not be capped. Bonus points if this ports neatly from SomeTable to SomeTableVersionN. (No dynamic SQL in an API.)
Could be dupe question, couldn't find it, pointing that out is just fine thank you.
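One further variation, offered only as an unmeasured sketch: probe each natural-key column in its own branch, so that each branch has a chance to use that column's unique index, and let UNION drop rows matched through more than one column. The table variables @SomeTable and @filters are stand-ins for dbo.SomeTable and #filters so that the batch runs on its own. Whether this beats either approach above would have to be measured on real data.

```sql
DECLARE @filterValues nvarchar(1000) = N'DDD,XXX,E,HH,ok,whatever,YYYYY';

-- Stand-in for dbo.SomeTable (same shape as the sample DDL).
DECLARE @SomeTable TABLE (
    ID int IDENTITY(1, 1) NOT NULL PRIMARY KEY
  , NaturalKey1 nvarchar(10) NOT NULL UNIQUE
  , NaturalKey2 nvarchar(10) NOT NULL UNIQUE
  , NaturalKey3 nvarchar(10) NOT NULL UNIQUE
);
INSERT INTO @SomeTable (NaturalKey1, NaturalKey2, NaturalKey3) VALUES
  (N'A', N'AA', N'ZZZZZ')
, (N'B', N'B', N'YYYYY')
, (N'C', N'CC', N'XXX')
, (N'D', N'DDD', N'WWWWW')
, (N'E', N'EEEE', N'V');

-- Stand-in for #filters: one row per distinct filter value.
DECLARE @filters TABLE (InValue nvarchar(1000) NOT NULL);
INSERT INTO @filters (InValue)
SELECT DISTINCT value FROM STRING_SPLIT(@filterValues, N',');

-- One branch per key column; UNION removes rows that matched in several branches.
SELECT st.* FROM @SomeTable AS st INNER JOIN @filters AS f ON f.InValue = st.NaturalKey1
UNION
SELECT st.* FROM @SomeTable AS st INNER JOIN @filters AS f ON f.InValue = st.NaturalKey2
UNION
SELECT st.* FROM @SomeTable AS st INNER JOIN @filters AS f ON f.InValue = st.NaturalKey3;
```

The obvious cost is that the statement grows by one branch per key column, so porting it to SomeTableVersionN means editing the query text.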

T-SQL: Two Level Aggregation in Same Query

I have a query that joins a master and a detail table. Master table records are duplicated in the results, as expected. Aggregation on the detail table works fine, but I also need another aggregation on the master table in the same query. Because the master rows are duplicated by the join, the master-level aggregate is inflated too.
The following script demonstrates the situation:
If Object_Id('tempdb..#data') Is Not Null Drop Table #data
Create Table #data (Id int, GroupId int, Value int)
If Object_Id('tempdb..#groups') Is Not Null Drop Table #groups
Create Table #groups (Id int, Value int)
/* insert groups */
Insert #groups (Id, Value)
Values (1,100), (2,200), (3, 200)
/* insert data */
Insert #data (Id, GroupId, Value)
Values (1,1,10),
(2,1,20),
(3,2,50),
(4,2,60),
(5,2,70),
(6,3,90)
My select query is
Select Sum(data.Value) As Data_Value,
Sum(groups.Value) As Group_Value
From #data data
Inner Join #groups groups On groups.Id = data.GroupId
The result is;
Data_Value Group_Value
300 1000
Expected result is;
Data_Value Group_Value
300 500
Please note that, derived table or sub-query is not an option. Also Sum(Distinct groups.Value) is not suitable for my case.
If I am not wrong, you just want to sum the Value column of both tables and show the totals in a single row. In that case you don't need to join them; just select each sum as a column, like this:
SELECT (SELECT SUM(Value) FROM #data) AS Data_Value,
(SELECT SUM(Value) FROM #groups) AS Group_Value
SELECT
(
Select Sum(d.Value) From #data d
WHERE EXISTS (SELECT 1 FROM #groups WHERE Id = d.GroupId )
) AS Data_Value
,(
SELECT Sum( g.Value) FROM #groups g
WHERE EXISTS (SELECT 1 FROM #data WHERE GroupId = g.Id)
) AS Group_Value
I'm not sure what you are looking for. But it seems like you want the value from one group and the collected value that represents a group in the data table.
In that case I would suggest something like this.
select Sum(t.Data_Value) as Data_Value, Sum(t.Group_Value) as Group_Value
from
(select Sum(data.Value) As Data_Value, groups.Value As Group_Value
from #data data
inner join #groups groups on groups.Id = data.GroupId
group by groups.Id, groups.Value)
as t
The edit should do the trick for you.

How do I reference a table twice when creating an indexed view? Can I enforce uniqueness based on 2 tables and multiple rows without it?

EDIT: Added in sample data that I am trying to disallow.
This question is similar to this: Cannot create a CLUSTERED INDEX on a View because I'm referencing the same table twice, any workaround? but the answer there doesn't help me. I'm trying to enforce uniqueness, and so an answer of "don't do that" without an alternative doesn't help me progress.
Problem Example (Simplified):
CREATE TABLE [dbo].[Object]
(
Id INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
OrgId UNIQUEIDENTIFIER
)
CREATE TABLE [dbo].[Attribute]
(
Id INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
Name NVARCHAR(256) NOT NULL
)
CREATE TABLE [dbo].[ObjectAttribute]
(
Id INT NOT NULL IDENTITY(1, 1),
ObjectId INT NOT NULL,
AttributeId INT NOT NULL,
Value NVARCHAR(MAX) NOT NULL,
CONSTRAINT FK_ObjectAttribute_Object FOREIGN KEY (ObjectId) REFERENCES [Object] (Id),
CONSTRAINT FK_ObjectAttribute_Attribute FOREIGN KEY (AttributeId) REFERENCES Attribute (Id)
)
GO
CREATE UNIQUE INDEX IUX_ObjectAttribute ON [dbo].[ObjectAttribute] ([ObjectId], [AttributeId])
GO
CREATE VIEW vObject_Uniqueness
WITH SCHEMABINDING
AS
SELECT
ObjectBase.OrgId
, CAST(OwnerValue.Value AS NVARCHAR(256)) AS OwnerValue
, CAST(NameValue.Value AS NVARCHAR(50)) AS NameValue
FROM [dbo].[Object] ObjectBase
INNER JOIN [dbo].ObjectAttribute OwnerValue
INNER JOIN [dbo].Attribute OwnerAttribute
ON OwnerAttribute.Id = OwnerValue.AttributeId
AND OwnerAttribute.Name = 'Owner'
ON OwnerValue.ObjectId = ObjectBase.Id
INNER JOIN [dbo].ObjectAttribute NameValue
INNER JOIN [dbo].Attribute NameAttribute
ON NameAttribute.Id = NameValue.AttributeId
AND NameAttribute.Name = 'Name'
ON NameValue.ObjectId = ObjectBase.Id
GO
/*
Cannot create index on view "[Database].dbo.vObject_Uniqueness". The view contains a self join on "[Database].dbo.ObjectAttribute".
*/
CREATE UNIQUE CLUSTERED INDEX IUX_vObject_Uniqueness
ON vObject_Uniqueness (OrgId, OwnerValue, NameValue)
GO
DECLARE @Org1 UNIQUEIDENTIFIER = NEWID();
DECLARE @Org2 UNIQUEIDENTIFIER = NEWID();
INSERT [dbo].[Object]
(
OrgId
)
VALUES
(@Org1) -- Id: 1
, (@Org2) -- Id: 2
, (@Org1) -- Id: 3
INSERT [dbo].[Attribute]
(
Name
)
VALUES
('Owner') -- Id: 1
, ('Name') -- Id: 2
--, ('Others')
-- Acceptable data.
INSERT [dbo].[ObjectAttribute]
(
AttributeId
, ObjectId
, Value
)
VALUES
(1, 1, 'Jeremy Pridemore') -- Owner for object 1 (Org1).
, (2, 1, 'Apple') -- Name for object 1 (Org1).
, (1, 2, 'John Doe') -- Owner for object 2 (Org2).
, (2, 2, 'Pear') -- Name for object 2 (Org2).
-- Unacceptable data.
-- Org1 already has an object with an owner value of 'Jeremy Pridemore' and a name of 'Apple'
INSERT [dbo].[ObjectAttribute]
(
AttributeId
, ObjectId
, Value
)
VALUES
(1, 3, 'Jeremy Pridemore') -- Owner for object 3 (Org1).
, (2, 3, 'Apple') -- Name for object 3 (Org1).
-- This is the bad data. I want to disallow this.
SELECT
OrgId, OwnerValue, NameValue
FROM vObject_Uniqueness
GROUP BY OrgId, OwnerValue, NameValue
HAVING COUNT(*) > 1
DROP VIEW vObject_Uniqueness
DROP TABLE ObjectAttribute
DROP TABLE Attribute
DROP TABLE [Object]
This example will create the error:
Msg 1947, Level 16, State 1, Line 2
Cannot create index on view "TestDb.dbo.vObject_Uniqueness". The view contains a self join on "TestDb.dbo.ObjectAttribute".
As this shows, I'm using an attribute system with 2 tables to represent one object and its values. The existence of the object and the OrgId on an object are on the main table, and the rest of the values are attributes on the secondary table.
First of all, I don't understand why this says there is a self join. I'm joining from Object to ObjectAttribute twice. Nowhere am I going from a table to that same table in an ON clause.
Second, is there a way to make this work? Or a way to enforce the uniqueness that I'm going for here? The end result that I want is that, by Object.OrgId, I have no two Object rows that have ObjectAttribute records referencing them providing the same 'Owner' and 'Name' values. So OrgId, Owner, and Name values need to be unique for any given Object.
I think you could create helper table for this:
CREATE TABLE [dbo].[ObjectAttributePivot]
(
Id int primary key,
OwnerValue nvarchar(256),
NameValue nvarchar(50)
)
GO
And then create a helper view and a trigger to keep the data synchronized:
create view vw_ObjectAttributePivot
as
select
o.Id,
cast(ov.Value as nvarchar(256)) as OwnerValue,
cast(nv.Value as nvarchar(50)) as NameValue
from dbo.Object as o
inner join dbo.ObjectAttribute as ov on ov.ObjectId = o.Id
inner join dbo.Attribute as ova on ova.Id = ov.AttributeId and ova.Name = 'Owner'
inner join dbo.ObjectAttribute as nv on nv.ObjectId = o.Id
inner join dbo.Attribute as nva on nva.Id = nv.AttributeId and nva.Name = 'Name'
GO
create trigger utr_ObjectAttribute on ObjectAttribute
after update, delete, insert
as
begin
declare @temp_objects table (Id int primary key)
insert into @temp_objects
select distinct ObjectId from inserted
union
select distinct ObjectId from deleted
update ObjectAttributePivot set
OwnerValue = vo.OwnerValue,
NameValue = vo.NameValue
from ObjectAttributePivot as o
inner join vw_ObjectAttributePivot as vo on vo.Id = o.Id
where
o.Id in (select t.Id from @temp_objects as t)
insert into ObjectAttributePivot (Id, OwnerValue, NameValue)
select vo.Id, vo.OwnerValue, vo.NameValue
from vw_ObjectAttributePivot as vo
where
vo.Id in (select t.Id from @temp_objects as t) and
vo.Id not in (select t.Id from ObjectAttributePivot as t)
delete ObjectAttributePivot
from ObjectAttributePivot as o
where
o.Id in (select t.Id from @temp_objects as t) and
o.Id not in (select t.Id from vw_ObjectAttributePivot as t)
end
GO
After that, you can create the view and its unique index:
create view vObject_Uniqueness
with schemabinding
as
select
o.OrgId,
oap.OwnerValue,
oap.NameValue
from dbo.ObjectAttributePivot as oap
inner join dbo.Object as o on o.Id = oap.Id
GO
CREATE UNIQUE CLUSTERED INDEX IUX_vObject_Uniqueness
ON vObject_Uniqueness (OrgId, OwnerValue, NameValue)
GO
sql fiddle demo
The fundamental issue that we have here, enforcing the type of uniqueness you are going for, is in trying to answer the question, "When is it a violation?" Consider this:
Your database is loaded with the first two objects you reference in
your example (Org1 and Org2)
Now we INSERT ObjectAttribute(AttributeId, ObjectId, Value) VALUES (1, 3, 'Jeremy Pridemore')
Is this a violation? Based on what you have told me, I would say "no": we could go on to INSERT ObjectAttribute(AttributeId, ObjectId, Value) VALUES (2, 3, 'Cantalope'), and that would presumably be fine, right? So, we can't know whether the current statement is valid unless & until we know what the next statement is going to be. But there is no guarantee we will ever issue the second statement. Certainly there is no way of knowing what it will be at the time we are making up our minds whether the first statement is OK.
Should we, then, disallow free-standing insertions of the type I am talking about, where an "owner" entry is inserted but with no simultaneous corresponding "name" entry? To me, that is the only workable approach to what you are trying to do here, and the only way to enforce that type of constraint is with a trigger.
Something like this:
DROP TRIGGER TR_ObjectAttribute_Insert
GO
CREATE TRIGGER TR_ObjectAttribute_Insert ON dbo.ObjectAttribute
AFTER INSERT
AS
DECLARE @objectsUnderConsideration TABLE (ObjectId INT PRIMARY KEY);
INSERT INTO @objectsUnderConsideration(ObjectId)
SELECT DISTINCT ObjectId FROM inserted;
DECLARE @expectedObjectAttributeEntries TABLE (ObjectId INT, AttributeId INT);
INSERT INTO @expectedObjectAttributeEntries(ObjectId, AttributeId)
SELECT o.ObjectId, a.Id AS AttributeId
FROM @objectsUnderConsideration o
CROSS JOIN Attribute a; -- Cartesian join, objects * attributes
DECLARE @totalNumberOfAttributes INT = (SELECT COUNT(1) FROM Attribute);
-- ensure we got what we expect to get
DECLARE @expectedCount INT, @actualCount INT;
SET @expectedCount = (SELECT COUNT(*) FROM @expectedObjectAttributeEntries);
SET @actualCount = (
SELECT COUNT(*)
FROM @expectedObjectAttributeEntries e
INNER JOIN inserted i ON e.AttributeId = i.AttributeId AND e.ObjectId = i.ObjectId
); -- if an attribute is missing, we'll have too few; if an object is being entered twice, we'll have too many
IF @actualCount < @expectedCount
BEGIN
RAISERROR ('Invalid insertion: incomplete set of attribute values', 16, 1);
ROLLBACK TRANSACTION;
RETURN
END
ELSE IF @actualCount > @expectedCount
BEGIN
RAISERROR ('Invalid insertion: multiple entries for same object', 16, 1);
ROLLBACK TRANSACTION;
RETURN
END
-- passed the check that we have all the necessary attributes; now check for duplicates
ELSE
BEGIN
-- for each object, count exact duplicate preexisting entries; reject if every attribute is a dup
DECLARE @duplicateAttributeCount TABLE (ObjectId INT, DupCount INT);
INSERT INTO @duplicateAttributeCount(ObjectId, DupCount)
SELECT o.ObjectId, (
SELECT COUNT(1)
FROM inserted i
INNER JOIN ObjectAttribute oa
ON i.AttributeId = oa.AttributeId
AND i.ObjectId = oa.ObjectId
AND i.Value = oa.Value
AND i.Id <> oa.Id
WHERE oa.ObjectId = o.ObjectId
)
FROM @objectsUnderConsideration o
IF EXISTS (
SELECT 1
FROM @duplicateAttributeCount d
WHERE d.DupCount = @totalNumberOfAttributes
)
BEGIN
RAISERROR ('Invalid insertion: duplicates pre-existing entry', 16, 1);
ROLLBACK TRANSACTION;
RETURN
END
END
GO
The above is not tested; thinking about it, you may need to join out to Object and organize your tests by OrgId instead of ObjectId. You would also need comparable triggers for UPDATE and DELETE. But, hopefully this is at least enough to get you started.
You should consider which SQL Server edition you use; the edition limits what you can do with indexed views.
see: http://msdn.microsoft.com/en-us/library/cc645993(SQL.110).aspx#RDBMS_mgmt
See indexed views direct querying.
The following steps are required to create an indexed view and are critical to the successful implementation of the indexed view:
1-Verify the SET options are correct for all existing tables that will be referenced in the view.
2-Verify the SET options for the session are set correctly before creating any new tables and the view.
3-Verify the view definition is deterministic.
4-Create the view by using the WITH SCHEMABINDING option.
5-Create the unique clustered index on the view.
Required SET Options for Indexed Views
Evaluating the same expression can produce different results in the Database Engine if different SET options are active when the query is executed. For example, after the SET option CONCAT_NULL_YIELDS_NULL is set to ON, the expression 'abc ' + NULL returns the value NULL. However, after CONCAT_NULL_YIELDS_NULL is set to OFF, the same expression produces 'abc '.
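To make the SET-option point concrete, here is a small sketch. The seven settings at the top are the session options Microsoft documents as required for indexed views; the expression is the one from the passage above. It is an illustration only, not a full indexed-view setup script.

```sql
-- Session settings required when creating an indexed view and its indexes.
SET ANSI_NULLS ON;
SET ANSI_PADDING ON;
SET ANSI_WARNINGS ON;
SET ARITHABORT ON;
SET CONCAT_NULL_YIELDS_NULL ON;
SET QUOTED_IDENTIFIER ON;
SET NUMERIC_ROUNDABORT OFF;

-- With the option ON, concatenating NULL yields NULL.
SELECT 'abc ' + NULL AS WithOptionOn;

-- With the option OFF, NULL is treated as an empty string and the result is 'abc '.
SET CONCAT_NULL_YIELDS_NULL OFF;
SELECT 'abc ' + NULL AS WithOptionOff;

-- Restore the setting that indexed views require.
SET CONCAT_NULL_YIELDS_NULL ON;
```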

Using merge..output to get mapping between source.id and target.id

Very simplified, I have two tables Source and Target.
declare @Source table (SourceID int identity(1,2), SourceName varchar(50))
declare @Target table (TargetID int identity(2,2), TargetName varchar(50))
insert into @Source values ('Row 1'), ('Row 2')
I would like to move all rows from @Source to @Target and know the TargetID for each SourceID, because there are also the tables SourceChild and TargetChild that need to be copied as well, and I need to put the new TargetID into the TargetChild.TargetID FK column.
There are a couple of solutions to this.
Use a while loop or cursors to insert one row (RBAR) to Target at a time and use scope_identity() to fill the FK of TargetChild.
Add a temp column to @Target and insert SourceID. You can then join that column to fetch the TargetID for the FK in TargetChild.
SET IDENTITY_INSERT ON for @Target and handle assigning new values yourself. You get a range that you then use in TargetChild.TargetID.
I'm not all that fond of any of them. The one I used so far is cursors.
What I would really like to do is to use the output clause of the insert statement.
insert into @Target(TargetName)
output inserted.TargetID, S.SourceID
select SourceName
from @Source as S
But it is not possible
The multi-part identifier "S.SourceID" could not be bound.
But it is possible with a merge.
merge @Target as T
using @Source as S
on 0=1
when not matched then
insert (TargetName) values (SourceName)
output inserted.TargetID, S.SourceID;
Result
TargetID SourceID
----------- -----------
2 1
4 3
I want to know if you have used this. Do you have any thoughts about the solution, or see any problems with it? It works fine in simple scenarios, but perhaps something ugly could happen when the query plan gets really complicated due to a complicated source query. The worst scenario would be that the TargetID/SourceID pairs don't actually match.
MSDN has this to say about the from_table_name of the output clause.
Is a column prefix that specifies a table included in the FROM clause of a DELETE, UPDATE, or MERGE statement that is used to specify the rows to update or delete.
For some reason they don't say "rows to insert, update or delete" only "rows to update or delete".
Any thoughts are welcome, and totally different solutions to the original problem are much appreciated.
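To show where the mapping would go next, here is a sketch that captures the MERGE output into a table variable and uses it to copy child rows. The @SourceChild and @TargetChild definitions, the ChildName column, and the @Map table are assumptions made up for illustration; only @Source, @Target, and the MERGE itself come from the question.

```sql
DECLARE @Source table (SourceID int identity(1,2), SourceName varchar(50));
DECLARE @Target table (TargetID int identity(2,2), TargetName varchar(50));
-- Hypothetical child tables and mapping table.
DECLARE @SourceChild table (SourceChildID int identity(1,1), SourceID int, ChildName varchar(50));
DECLARE @TargetChild table (TargetChildID int identity(1,1), TargetID int, ChildName varchar(50));
DECLARE @Map table (SourceID int PRIMARY KEY, TargetID int NOT NULL);

INSERT INTO @Source VALUES ('Row 1'), ('Row 2');
INSERT INTO @SourceChild (SourceID, ChildName)
VALUES (1, 'Child 1a'), (1, 'Child 1b'), (3, 'Child 2a');

-- ON 0 = 1 never matches, so every source row takes the insert branch,
-- and OUTPUT can reference both the source row and the inserted row.
MERGE @Target AS T
USING @Source AS S
ON 0 = 1
WHEN NOT MATCHED THEN
    INSERT (TargetName) VALUES (S.SourceName)
OUTPUT S.SourceID, inserted.TargetID INTO @Map (SourceID, TargetID);

-- Copy the child rows, translating the old parent key into the new one.
INSERT INTO @TargetChild (TargetID, ChildName)
SELECT m.TargetID, sc.ChildName
FROM @SourceChild AS sc
INNER JOIN @Map AS m ON m.SourceID = sc.SourceID;

SELECT * FROM @Map;
SELECT * FROM @TargetChild;
```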
In my opinion this is a great use of MERGE and OUTPUT. I've used it in several scenarios and haven't experienced any oddities to date.
For example, here is a test setup that clones a Folder and all Files (identity) within it into a newly created Folder (guid).
DECLARE @FolderIndex TABLE (FolderId UNIQUEIDENTIFIER PRIMARY KEY, FolderName varchar(25));
INSERT INTO @FolderIndex
(FolderId, FolderName)
VALUES(newid(), 'OriginalFolder');
DECLARE @FileIndex TABLE (FileId int identity(1,1) PRIMARY KEY, FileName varchar(10));
INSERT INTO @FileIndex
(FileName)
VALUES('test.txt');
DECLARE @FileFolder TABLE (FolderId UNIQUEIDENTIFIER, FileId int, PRIMARY KEY(FolderId, FileId));
INSERT INTO @FileFolder
(FolderId, FileId)
SELECT FolderId,
FileId
FROM @FolderIndex
CROSS JOIN @FileIndex; -- just to illustrate
DECLARE @sFolder TABLE (FromFolderId UNIQUEIDENTIFIER, ToFolderId UNIQUEIDENTIFIER);
DECLARE @sFile TABLE (FromFileId int, ToFileId int);
-- copy Folder Structure
MERGE @FolderIndex fi
USING ( SELECT 1 [Dummy],
FolderId,
FolderName
FROM @FolderIndex [fi]
WHERE FolderName = 'OriginalFolder'
) d ON d.Dummy = 0
WHEN NOT MATCHED
THEN INSERT
(FolderId, FolderName)
VALUES (newid(), 'copy_'+FolderName)
OUTPUT d.FolderId,
INSERTED.FolderId
INTO @sFolder (FromFolderId, toFolderId);
-- copy File structure
MERGE @FileIndex fi
USING ( SELECT 1 [Dummy],
fi.FileId,
fi.[FileName]
FROM @FileIndex fi
INNER
JOIN @FileFolder fm ON
fi.FileId = fm.FileId
INNER
JOIN @FolderIndex fo ON
fm.FolderId = fo.FolderId
WHERE fo.FolderName = 'OriginalFolder'
) d ON d.Dummy = 0
WHEN NOT MATCHED
THEN INSERT ([FileName])
VALUES ([FileName])
OUTPUT d.FileId,
INSERTED.FileId
INTO @sFile (FromFileId, toFileId);
-- link new files to Folders
INSERT INTO @FileFolder (FileId, FolderId)
SELECT sfi.toFileId, sfo.toFolderId
FROM @FileFolder fm
INNER
JOIN @sFile sfi ON
fm.FileId = sfi.FromFileId
INNER
JOIN @sFolder sfo ON
fm.FolderId = sfo.FromFolderId
-- return
SELECT *
FROM @FileIndex fi
JOIN @FileFolder ff ON
fi.FileId = ff.FileId
JOIN @FolderIndex fo ON
ff.FolderId = fo.FolderId
I would like to add another example to add to @Nathan's example, as I found it somewhat confusing.
Mine uses real tables for the most part, and not temp tables.
I also got my inspiration from here: another example
-- Copy the FormSectionInstance
DECLARE @FormSectionInstanceTable TABLE(OldFormSectionInstanceId INT, NewFormSectionInstanceId INT)
;MERGE INTO [dbo].[FormSectionInstance]
USING
(
SELECT
fsi.FormSectionInstanceId [OldFormSectionInstanceId]
, @NewFormHeaderId [NewFormHeaderId]
, fsi.FormSectionId
, fsi.IsClone
, @UserId [NewCreatedByUserId]
, GETDATE() NewCreatedDate
, @UserId [NewUpdatedByUserId]
, GETDATE() NewUpdatedDate
FROM [dbo].[FormSectionInstance] fsi
WHERE fsi.[FormHeaderId] = @FormHeaderId
) tblSource ON 1=0 -- use always false condition
WHEN NOT MATCHED
THEN INSERT
( [FormHeaderId], FormSectionId, IsClone, CreatedByUserId, CreatedDate, UpdatedByUserId, UpdatedDate)
VALUES( [NewFormHeaderId], FormSectionId, IsClone, NewCreatedByUserId, NewCreatedDate, NewUpdatedByUserId, NewUpdatedDate)
OUTPUT tblSource.[OldFormSectionInstanceId], INSERTED.FormSectionInstanceId
INTO @FormSectionInstanceTable(OldFormSectionInstanceId, NewFormSectionInstanceId);
-- Copy the FormDetail
INSERT INTO [dbo].[FormDetail]
(FormHeaderId, FormFieldId, FormSectionInstanceId, IsOther, Value, CreatedByUserId, CreatedDate, UpdatedByUserId, UpdatedDate)
SELECT
@NewFormHeaderId, FormFieldId, fsit.NewFormSectionInstanceId, IsOther, Value, @UserId, CreatedDate, @UserId, UpdatedDate
FROM [dbo].[FormDetail] fd
INNER JOIN @FormSectionInstanceTable fsit ON fsit.OldFormSectionInstanceId = fd.FormSectionInstanceId
WHERE [FormHeaderId] = @FormHeaderId
Here's a solution that doesn't use MERGE (which I've had problems with many times, so I try to avoid it if possible). It relies on two table variables (you could use temp tables if you want) with IDENTITY columns that get matched and, importantly, on using ORDER BY when doing the INSERT, with WHERE conditions that match between the two INSERTs. The first table holds the source IDs and the second one holds the target IDs.
-- Setup... We have a table that we need to know the old IDs and new IDs after copying.
-- We want to copy all of DocID=1
DECLARE @newDocID int = 99;
DECLARE @tbl table (RuleID int PRIMARY KEY NOT NULL IDENTITY(1, 1), DocID int, Val varchar(100));
INSERT INTO @tbl (DocID, Val) VALUES (1, 'RuleA-2'), (1, 'RuleA-1'), (2, 'RuleB-1'), (2, 'RuleB-2'), (3, 'RuleC-1'), (1, 'RuleA-3')
-- Create a break in IDENTITY values.. just to simulate more realistic data
INSERT INTO @tbl (Val) VALUES ('DeleteMe'), ('DeleteMe');
DELETE FROM @tbl WHERE Val = 'DeleteMe';
INSERT INTO @tbl (DocID, Val) VALUES (6, 'RuleE'), (7, 'RuleF');
SELECT * FROM @tbl t;
-- Declare TWO table variables each with an IDENTITY - one will hold the RuleID of the items we are copying, other will hold the RuleID that we create
DECLARE @input table (RID int IDENTITY(1, 1), SourceRuleID int NOT NULL, Val varchar(100));
DECLARE @output table (RID int IDENTITY(1,1), TargetRuleID int NOT NULL, Val varchar(100));
-- Capture the IDs of the rows we will be copying by inserting them into the @input table
-- Important - we must specify the sort order - best thing is to use the IDENTITY of the source table (t.RuleID) that we are copying
INSERT INTO @input (SourceRuleID, Val) SELECT t.RuleID, t.Val FROM @tbl t WHERE t.DocID = 1 ORDER BY t.RuleID;
-- Copy the rows, and use the OUTPUT clause to capture the IDs of the inserted rows.
-- Important - we must use the same WHERE and ORDER BY clauses as above
INSERT INTO @tbl (DocID, Val)
OUTPUT Inserted.RuleID, Inserted.Val INTO @output(TargetRuleID, Val)
SELECT @newDocID, t.Val FROM @tbl t
WHERE t.DocID = 1
ORDER BY t.RuleID;
-- Now @input and @output should have the same # of rows, and the order of both inserts was the same, so the IDENTITY columns (RID) can be matched
-- Use this as the map from old-to-new when you are copying sub-table rows
-- Technically, @input and @output don't even need the 'Val' columns, just RID and RuleID - they were included here to prove that the rules matched
SELECT i.*, o.* FROM @output o
INNER JOIN @input i ON i.RID = o.RID
-- Confirm the matching worked
SELECT * FROM @tbl t

Table variable in SQL Server

I am using SQL Server 2005. I have heard that a table variable can be used instead of a LEFT OUTER JOIN.
What I understand is that we first have to put all the values from the left table into the table variable, then UPDATE the table variable with the right table values, and finally select from the table variable.
Has anyone come across this kind of approach? Could you please suggest a real-world example (with a query)?
I have not written any query for this. My question is - if someone has used a similar approach, I would like to know the scenario and how it is handled. I understand that in some cases it may be slower than the LEFT OUTER JOIN.
Please assume that we are dealing with tables that have less than 5000 records.
Thanks
It can be done, but I have no idea why you would ever want to do it.
This really does seem like it is being done backwards. But if you are trying this for your own learning only, here goes:
DECLARE @MainTable TABLE(
ID INT,
Val FLOAT
)
INSERT INTO @MainTable SELECT 1, 1
INSERT INTO @MainTable SELECT 2, 2
INSERT INTO @MainTable SELECT 3, 3
INSERT INTO @MainTable SELECT 4, 4
DECLARE @LeftTable TABLE(
ID INT,
MainID INT,
Val FLOAT
)
INSERT INTO @LeftTable SELECT 1, 1, 11
INSERT INTO @LeftTable SELECT 3, 3, 33
-- the ordinary LEFT JOIN version
SELECT *,
mt.Val + ISNULL(lt.Val, 0)
FROM @MainTable mt LEFT JOIN
@LeftTable lt ON mt.ID = lt.MainID
-- the table variable version: copy the main rows, then update the matches
DECLARE @Table TABLE(
ID INT,
Val FLOAT
)
INSERT INTO @Table
SELECT ID,
Val
FROM @MainTable
UPDATE t
SET Val = t.Val + lt.Val
FROM @Table t INNER JOIN
@LeftTable lt ON t.ID = lt.MainID
SELECT *
FROM @Table
I don't think it's very clear from your question what you want to achieve? (What your tables look like, and what result you want). But you can certainly select data into a variable of a table datatype, and tamper with it. It's quite convenient:
DECLARE @tbl TABLE (id INT IDENTITY(1,1), userId int, foreignId int)
INSERT INTO @tbl (userId)
SELECT id FROM users
WHERE name LIKE 'a%'
-- T-SQL does not allow an alias on the UPDATE target itself, so alias it in FROM
UPDATE t
SET
foreignId = (SELECT id FROM foreignTable f WHERE f.userId = t.userId)
FROM @tbl t
In that example I gave the table variable an identity column of its own, distinct from the one in the source table. I often find that useful. Adjust as you like... Again, it's not very clear what the question is, but I hope this might guide you in the right direction...?
Every scenario is different, and without full details on a specific case it's difficult to say whether it would be a good approach for you.
Having said that, I would not be looking to use the table variable approach unless I had a specific functional reason to - if the query can be fulfilled with a standard SELECT query using an OUTER JOIN, then I'd use that as I'd expect that to be most efficient.
The times when you may want to use a temp table or table variable instead are more when you want to get an intermediate result set and then do some processing on it before returning it, i.e. the kind of processing that cannot be done with a straightforward query.
Note that table variables are very handy, but take into account that they are not guaranteed to reside in memory: they can get persisted to tempdb like standard temp tables.
Thank you, astander.
I tried with the example given below. Both of the approaches took 19 seconds. However, I guess some tuning will help the table variable update approach become faster than the LEFT JOIN.
As I am not a master in tuning, I request your help. Any SQL expert ready to prove it?
CREATE TABLE #MainTable (
CustomerID INT PRIMARY KEY,
FirstName VARCHAR(100)
)
DECLARE @Count INT
SET @Count = 0
DECLARE @Iterator INT
SET @Iterator = 0
WHILE @Count <8000
BEGIN
INSERT INTO #MainTable SELECT @Count, 'Cust'+CONVERT(VARCHAR(10),@Count)
SET @Count = @Count+1
END
CREATE TABLE #RightTable
(
OrderID INT PRIMARY KEY,
CustomerID INT,
Product VARCHAR(100)
)
CREATE INDEX [IDX_CustomerID] ON #RightTable (CustomerID)
WHILE @Iterator <400000
BEGIN
IF @Iterator % 2 = 0
BEGIN
INSERT INTO #RightTable SELECT @Iterator,2, 'Prod'+CONVERT(VARCHAR(10),@Iterator)
END
ELSE
BEGIN
INSERT INTO #RightTable SELECT @Iterator,1, 'Prod'+CONVERT(VARCHAR(10),@Iterator)
END
SET @Iterator = @Iterator+1
END
-- Using LEFT JOIN
SELECT mt.CustomerID,mt.FirstName,COUNT(rt.Product) [CountResult]
FROM #MainTable mt
LEFT JOIN #RightTable rt ON mt.CustomerID = rt.CustomerID
GROUP BY mt.CustomerID,mt.FirstName
---------------------------
-- Using Table variable Update
DECLARE @WorkingTableVariable TABLE
(
CustomerID INT,
FirstName VARCHAR(100),
ProductCount INT
)
INSERT
INTO @WorkingTableVariable (CustomerID,FirstName)
SELECT CustomerID, FirstName FROM #MainTable
UPDATE wt
SET ProductCount = [Count]
FROM @WorkingTableVariable wt
INNER JOIN
(SELECT CustomerID,COUNT(rt.Product) AS [Count]
FROM #RightTable rt
GROUP BY CustomerID) IV ON wt.CustomerID = IV.CustomerID
SELECT CustomerID,FirstName, ISNULL(ProductCount,0) [CountResult] FROM @WorkingTableVariable
ORDER BY CustomerID
--------
DROP TABLE #MainTable
DROP TABLE #RightTable
Thanks
Lijo
In my opinion there is one reason to do this:
If you have a complicated query with lots of inner joins and one left join, you sometimes get into trouble because the query is hundreds of times slower than the same query without the left join.
If the inner joins read lots of records but produce only a few rows that then have to be left joined, you could get faster results by materializing that intermediate result into a table variable or temp table.
But usually there is no need to really update the data in the table variable - you could query the table variable using the left join to return the result.
... just my two cents.
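A minimal sketch of that idea follows. All table names, columns, and data are hypothetical, and the simple WHERE filter stands in for the "complicated query with lots of inner joins". The syntax is kept to what SQL Server 2005 accepts. Whether it is actually faster depends on the real query and would need to be measured.

```sql
-- Hypothetical source tables.
DECLARE @Customers TABLE (CustomerID INT PRIMARY KEY, FirstName VARCHAR(100))
DECLARE @Orders TABLE (OrderID INT PRIMARY KEY, CustomerID INT, Product VARCHAR(100))

INSERT INTO @Customers SELECT 1, 'Alice'
INSERT INTO @Customers SELECT 2, 'Bob'
INSERT INTO @Customers SELECT 3, 'Anna'
INSERT INTO @Orders SELECT 10, 1, 'Prod10'
INSERT INTO @Orders SELECT 11, 1, 'Prod11'
INSERT INTO @Orders SELECT 12, 2, 'Prod12'

-- Step 1: materialize the small intermediate result once.
DECLARE @Intermediate TABLE (CustomerID INT PRIMARY KEY, FirstName VARCHAR(100))
INSERT INTO @Intermediate (CustomerID, FirstName)
SELECT CustomerID, FirstName
FROM @Customers
WHERE FirstName LIKE 'A%'   -- stand-in for the expensive inner-join part

-- Step 2: apply the LEFT JOIN to the few materialized rows; no UPDATE is needed.
SELECT i.CustomerID, i.FirstName, COUNT(o.OrderID) AS OrderCount
FROM @Intermediate i
LEFT JOIN @Orders o ON o.CustomerID = i.CustomerID
GROUP BY i.CustomerID, i.FirstName
ORDER BY i.CustomerID
```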
