Merging records in a table and updating related tables - sql-server

I have a main table containing users that are linked to various other tables. Sometimes there are duplicates in this main table due to bad imported data and I would like to merge them. See the following tables.
Table: Users
UserID  Username   FirstName  LastName
------  ---------  ---------  --------
1       Main       John       Doe
2       Duplicate  John       Doo
Table: Records1
RecordID  RecordName     CreatedUserID  UpdatedUserID
--------  -------------  -------------  -------------
1         Test record 1  1              2
2         Test record 2  2              null
3         Test record 3  2              null
CreatedUserID and UpdatedUserID are foreign keys referencing Users.UserID.
So currently if I want to merge user 1 and 2, I would do it with these SQL statements:
UPDATE Records1 SET UpdatedUserID = 1 WHERE UpdatedUserID = 2
UPDATE Records1 SET CreatedUserID = 1 WHERE CreatedUserID = 2
DELETE FROM Users WHERE UserID = 2
This is just a sample subset. In reality there are a LOT of related tables, and each one needs its own additional UPDATE statements.
I know I'm probably pushing my luck here, but is there a way to accomplish the above (update all related tables in a batch and delete the "duplicate" record) without updating each foreign key column in each related table manually? The Users table is the base table that links to all other tables, so writing individual statements for each table is rather cumbersome. If a shortcut is available, that would be great.

Is this helpful?
Create Table Users(Id int, UserName varchar(10),FirstName varchar(10), LastName Varchar(10))
Create Table Records1(RecordID int, RecordName varchar(20), CreatedUserID int, UpdatedUserID int)
INSERT INTO Users
SELECT 1,'Main','John','Doe' Union All
SELECT 2,'Duplicate','John','Doo' Union All
SELECT 3,'Main3','ABC','MPN' Union All
SELECT 4,'Duplicate','ABC','MPT'
Insert into Records1
SELECT 1,'Test record 1',1,2 Union All
SELECT 2,'Test record 2',2,null Union All
SELECT 3,'Test record 3',2,null Union All
SELECT 1,'Test record 1',3,4 Union All
SELECT 2,'Test record 2',4,null Union All
SELECT 3,'Test record 3',4,null
Select u1.Id as CreatedUserID,U2.id as UpdatedUserID
Into #tmpUsers
from Users u1
JOIN Users u2
--This condition should be changed based on the criteria for identifying duplicates
on u1.FirstName=u2.FirstName and U2.UserName='Duplicate'
Where u1.UserName<>'Duplicate'
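--In #tmpUsers, CreatedUserID holds the main user's id and UpdatedUserID holds the duplicate's id
--Repoint UpdatedUserID from the duplicate to the main user
Update r
Set r.UpdatedUserID=u.CreatedUserID
From Records1 r
JOIN #tmpUsers u on r.UpdatedUserID=u.UpdatedUserID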
Update r
Set r.CreatedUserID=u.CreatedUserID
From Records1 r
JOIN #tmpUsers u on r.CreatedUserID=u.UpdatedUserID
Delete from Users Where UserName='Duplicate'
Select * from Users
Select * from Records1
Drop Table #tmpUsers
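When there are many related tables, the list of UPDATE statements does not have to be typed by hand. The following is a minimal sketch, assuming the relationships are declared as real foreign key constraints and that the table lives in the dbo schema (both are assumptions, not stated in the question). It only generates the statements as text from the catalog views so they can be reviewed before being run:

```sql
-- Ids used for illustration: keep user 1, remove user 2.
DECLARE @RetainedUserId int = 1,
        @VictimUserId   int = 2;

-- One UPDATE statement per foreign key column that references dbo.Users.UserID.
SELECT 'UPDATE ' + QUOTENAME(OBJECT_SCHEMA_NAME(fkc.parent_object_id))
     + '.' + QUOTENAME(OBJECT_NAME(fkc.parent_object_id))
     + ' SET ' + QUOTENAME(pc.name) + ' = ' + CAST(@RetainedUserId AS varchar(12))
     + ' WHERE ' + QUOTENAME(pc.name) + ' = ' + CAST(@VictimUserId AS varchar(12))
     + ';' AS UpdateStatement
FROM sys.foreign_key_columns AS fkc
JOIN sys.columns AS pc   -- referencing (child) column
  ON pc.object_id = fkc.parent_object_id
 AND pc.column_id = fkc.parent_column_id
JOIN sys.columns AS rc   -- referenced (Users) column
  ON rc.object_id = fkc.referenced_object_id
 AND rc.column_id = fkc.referenced_column_id
WHERE fkc.referenced_object_id = OBJECT_ID(N'dbo.Users')
  AND rc.name = N'UserID';
```

Columns that hold user ids without a declared foreign key will not appear in the result, so the generated list should be treated as a starting point rather than a complete inventory.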

Since identifying duplicate accounts will be a manual process, there will (generally) be pairs of accounts to process. (I'm assuming that the Inspector can't tick off 15 user accounts as duplicates in your UI and submit the whole lot for processing.)
A stored procedure like the following may be a good start:
create procedure MergeUsers
  @RetainedUserId Int, -- UserId that is being kept.
  @VictimUserId Int -- UserId that is to be removed.
as
  begin
  -- Validate the input.
  -- Optional, but you may want some reality checks.
  -- (Usernames are probably unique already, eh?)
  declare @UsernameMatch as Int, @FirstNameMatch as Int, @LastNameMatch as Int, @EmailMatch as Int;
  select
    @UsernameMatch = case when R.Username = V.Username then 1 else 0 end,
    @FirstNameMatch = case when R.FirstName = V.FirstName then 1 else 0 end,
    @LastNameMatch = case when R.LastName = V.LastName then 1 else 0 end,
    @EmailMatch = case when R.Email = V.Email then 1 else 0 end
    from Users as R inner join
      Users as V on V.UserId = @VictimUserId and R.UserId = @RetainedUserId;
  if @UsernameMatch + @FirstNameMatch + @LastNameMatch + @EmailMatch < 2
    begin
    -- The following message should be enhanced to provide a better clue as to which user
    -- accounts are being processed and what did or didn't match.
    -- Severity 16 marks an error the caller can correct; severities 19 to 25 require WITH LOG.
    RaisError( 'MergeUsers: The two user accounts should have something in common.', 16, 42 );
    return;
    end;
  -- Update all of the related tables.
  -- Using a single pass through each table and updating all of the appropriate columns may improve performance.
  -- The case expression will only alter the values which reference the victim user account.
  update Records1
    set
      CreatedUserId = case when CreatedUserId = @VictimUserId then @RetainedUserId else CreatedUserId end,
      UpdatedUserId = case when UpdatedUserId = @VictimUserId then @RetainedUserId else UpdatedUserId end
    where CreatedUserId = @VictimUserId or UpdatedUserId = @VictimUserId;
  update Records2
    set ...
    where ...;
  -- Houseclean Users.
  delete from Users
    where UserId = @VictimUserId;
  end;
NB: Left as an exercise is adding try/catch and a transaction in the SP to ensure that the merge is an all or nothing operation.
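One way the all-or-nothing behaviour could look is sketched below. The procedure name MergeUsersAtomic is hypothetical, only Records1 is handled, and THROW assumes SQL Server 2012 or later; treat it as an outline rather than a finished implementation:

```sql
CREATE PROCEDURE MergeUsersAtomic   -- hypothetical name, for illustration only
    @RetainedUserId int,            -- UserId that is being kept
    @VictimUserId   int             -- UserId that is to be removed
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;              -- most run-time errors doom the whole transaction

    BEGIN TRY
        BEGIN TRANSACTION;

        -- Repoint every reference to the victim account (one table shown).
        UPDATE Records1 SET CreatedUserID = @RetainedUserId WHERE CreatedUserID = @VictimUserId;
        UPDATE Records1 SET UpdatedUserID = @RetainedUserId WHERE UpdatedUserID = @VictimUserId;

        -- Remove the duplicate only after nothing references it any more.
        DELETE FROM Users WHERE UserID = @VictimUserId;

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        -- Undo any partial work, then re-raise the original error to the caller.
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
        THROW;
    END CATCH;
END;
```

If any statement fails, the updates already made in the same call are rolled back, so a victim account is never left half merged.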

Related

SQL- use an attribute to group activities and use the group as parameter

I have a table that looks like this:
ActivityID  Time Used  Activity Type  Activity Category ID  Activity Category
----------  ---------  -------------  --------------------  -----------------
123456      30         A              1                     X
765432      120        B              2                     Y
876462      65         C              3                     Z
h52635      76         D              3                     Z
hsgs62      187        E              1                     X
I would like to use the Activity Category as a parameter (@ActivityCategory) to filter my report later, meaning the filter values should be X, Y, and Z.
When I choose one Activity Category, the sum of "Time Used" for that category should appear.
My question is: how should I build the query so that activities with the same Activity Category are grouped together and the category (X, Y, or Z) can be used as a parameter?
Something like this perhaps:
-- Sample data
DECLARE @table TABLE (ActivityId INT, TimeUsed INT, ActivityCategory CHAR(1));
INSERT @table VALUES(123,20,'X'), (129,50,'Y'), (254,30,'Y'), (991,10,'Z');
-- Parameter
DECLARE @ActivityCategory VARCHAR(100) = 'X,Y';
SELECT t.ActivityCategory, TimeUsed = SUM(t.TimeUsed)
FROM @table AS t
CROSS APPLY STRING_SPLIT(@ActivityCategory,',') AS s -- STRING_SPLIT is built in from SQL Server 2016; on older versions you will need your own string splitter function
WHERE t.ActivityCategory = s.value
GROUP BY t.ActivityCategory;
Returns:
ActivityCategory TimeUsed
---------------- -----------
X 20
Y 80
Alan's answer is good, but I'd personally use a temp table and a join for performance reasons. The table being queried might be very large, in which case a join to a temp table would be more performant than CROSS APPLY.
The easiest way to pass multi-value parameters in and out of your query is as a comma-separated list. Indeed, if you are using Report Server / SSRS, that is how the "Multiple Value" box in the user interface will deliver the user's selections into a varchar parameter.
--Declare and set parameter
DECLARE @ActivityCategories varchar(MAX)
SET @ActivityCategories = 'X,Y,Z'
--Convert individual parameter values to a temp table
DROP TABLE IF EXISTS #ParameterValues
CREATE TABLE #ParameterValues (ActivityCategory varchar(10) NOT NULL PRIMARY KEY CLUSTERED)
INSERT INTO #ParameterValues WITH(TABLOCK)
SELECT value
FROM STRING_SPLIT(@ActivityCategories,',')
GROUP BY value
ORDER BY value
--Join on temp table to filter by parameter values
--(columns are qualified with a. because ActivityCategory exists in both tables)
SELECT a.ActivityID,
a.TimeUsed,
a.ActivityType,
a.ActivityCategoryID,
a.ActivityCategory
FROM dbo.YourTable a
INNER JOIN #ParameterValues b ON a.ActivityCategory = b.ActivityCategory
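Since the question asks for the summed time per selected category, the temp-table join can be combined with SUM and GROUP BY. The sketch below is self-contained and uses the sample rows from the question; the names #Activity and #SelectedCategories are hypothetical, and STRING_SPLIT and DROP TABLE IF EXISTS assume SQL Server 2016 or later:

```sql
-- Sample rows copied from the question (#Activity is a hypothetical name).
DROP TABLE IF EXISTS #Activity;
CREATE TABLE #Activity (
    ActivityID         varchar(10),
    TimeUsed           int,
    ActivityType       char(1),
    ActivityCategoryID int,
    ActivityCategory   char(1)
);
INSERT INTO #Activity VALUES
    ('123456',  30, 'A', 1, 'X'),
    ('765432', 120, 'B', 2, 'Y'),
    ('876462',  65, 'C', 3, 'Z'),
    ('h52635',  76, 'D', 3, 'Z'),
    ('hsgs62', 187, 'E', 1, 'X');

-- Multi-value parameter delivered as a comma-separated list.
DECLARE @ActivityCategories varchar(max) = 'X,Z';

-- One row per selected category.
DROP TABLE IF EXISTS #SelectedCategories;
CREATE TABLE #SelectedCategories (ActivityCategory varchar(10) NOT NULL PRIMARY KEY);
INSERT INTO #SelectedCategories (ActivityCategory)
SELECT DISTINCT LTRIM(RTRIM(value))
FROM STRING_SPLIT(@ActivityCategories, ',');

-- Total time per selected category.
SELECT a.ActivityCategory, SUM(a.TimeUsed) AS TotalTimeUsed
FROM #Activity AS a
INNER JOIN #SelectedCategories AS s ON s.ActivityCategory = a.ActivityCategory
GROUP BY a.ActivityCategory
ORDER BY a.ActivityCategory;
-- With the sample rows: X = 30 + 187 = 217, Z = 65 + 76 = 141
```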

SQL Server copy rows to second table

I have a table for bookings (table_b) that has around 1.3M rows. A second table (table_s) is used to note when these rows are needed to be accessed by a separate application.
Currently there are triggers to make a record in table_s but this doesn't help with all existing data.
I believe I need a query that selects the rows that exist in table_b but not in table_s and then inserts a row for each of them.
Here is my current query, but I don't think it is formed correctly:
DECLARE @b_id [INT] = 0;
WHILE(1 = 1)
BEGIN
    SELECT TOP 10
        @b_id = MIN([b].[b_id])
    FROM
        [table_b] AS [b]
    LEFT JOIN
        [table_s] AS [s] ON [b].[b_id] = [s].[b_id]
    WHERE
        [s].[b_id] IS NULL;
    IF @b_id IS NULL
        BREAK;
    INSERT INTO [table_s] ([b_id], [processed])
    VALUES (@b_id, 0);
END;
Syntactically everything is fine, but there are some misconceptions in your query.
select top 10 @b_id = MIN(b.b_id)
A variable can hold only one value, so even though you select TOP 10, a single value is assigned to the variable. Your current approach therefore loops once for every missing record.
I don't think an insert of around a million rows needs to be split into batches. Try it this way:
INSERT INTO table_s
(b_id,
processed)
SELECT b_id,
0
FROM table_b AS b
WHERE NOT EXISTS (SELECT 1
FROM table_s AS s
WHERE b.b_id = s.b_id)
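If batching does turn out to be necessary (for example to keep each transaction and its log usage small), the same NOT EXISTS filter can be reused in a loop that inserts a fixed number of rows per pass. This is a sketch only; the batch size of 50000 is an arbitrary assumption:

```sql
DECLARE @batch_size int = 50000;  -- assumed value, tune for your system
DECLARE @rows int = 1;

-- Keep inserting until a pass finds no more missing rows.
WHILE @rows > 0
BEGIN
    INSERT INTO table_s (b_id, processed)
    SELECT TOP (@batch_size) b.b_id, 0
    FROM table_b AS b
    WHERE NOT EXISTS (SELECT 1
                      FROM table_s AS s
                      WHERE s.b_id = b.b_id);

    SET @rows = @@ROWCOUNT;  -- rows inserted by this pass
END;
```

Each pass only sees rows that are still missing from table_s, so the loop ends on its own once the two tables are in sync.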

SQL Server 2008: Unique constraint for values non-related with columns

I have a simple problem. How can I add a unique constraint for a table, without relating the values to their columns? For example, I have this table
ID_A ID_B
----------
1 2
... ...
In that example, I have the record (1,2). For me, (1,2) = (2,1), so I don't want to allow my database to store both values. I know I can accomplish this using triggers, or check constraints and functions, but I was wondering if there is an instruction like
CREATE UNIQUE CONSTRAINT AS A SET_CONSTRAINT
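You could write a view like this:
select 1 as Dummy
from T t1
join T t2 on t1.ID_A = t2.ID_B AND t1.ID_B = t2.ID_A --join to corresponding row
cross join TwoRows
Then create a unique index on Dummy. TwoRows is a table that contains two rows with arbitrary contents; the cross join doubles every row in the view, so the unique index fails as soon as the view would contain any row at all. Any row in this view indicates a uniqueness violation. Note that indexing a view requires it to be schema-bound, and SQL Server's documented restrictions on indexed views include self-joins, so verify that this definition is accepted on your version before relying on it.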
You can do this using an INSTEAD OF INSERT trigger.
Demo
Table Schema
CREATE TABLE te(ID_A INT,ID_B INT)
INSERT te VALUES ( 1,2)
Trigger
Go
CREATE TRIGGER trg_name
ON te
INSTEAD OF INSERT
AS
BEGIN
    -- Reject the insert if any new row matches an existing pair in either order
    IF EXISTS (SELECT 1
               FROM inserted a
               WHERE EXISTS (SELECT 1
                             FROM te b
                             WHERE ( ( a.id_a = b.id_b
                                       AND a.id_b = b.id_a )
                                      OR ( a.id_a = b.id_a
                                           AND a.id_b = b.id_b ) )))
    BEGIN
        PRINT 'duplicate record'
        ROLLBACK
    END
    ELSE
        INSERT INTO te
        SELECT id_a, id_b
        FROM inserted
END
GO
SELECT * FROM te
Insert Script
INSERT INTO te VALUES (2,1) -- Duplicate
INSERT INTO te VALUES (1,2) --Duplicate
INSERT INTO te VALUES (3,2) --Will work
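A different technique from the two answers above, shown here only as a sketch: store the smaller and larger of the two ids in persisted computed columns and put an ordinary unique constraint on that pair, so (1,2) and (2,1) normalize to the same key. The table name PairDemo, the column names ID_Low and ID_High, and the constraint name are all hypothetical:

```sql
CREATE TABLE PairDemo (   -- hypothetical table name
    ID_A int NOT NULL,
    ID_B int NOT NULL,
    -- Normalized form of the pair: smaller id first, larger id second.
    ID_Low  AS (CASE WHEN ID_A <= ID_B THEN ID_A ELSE ID_B END) PERSISTED,
    ID_High AS (CASE WHEN ID_A <= ID_B THEN ID_B ELSE ID_A END) PERSISTED,
    CONSTRAINT UQ_PairDemo_Unordered UNIQUE (ID_Low, ID_High)
);

INSERT INTO PairDemo (ID_A, ID_B) VALUES (1, 2); -- succeeds
INSERT INTO PairDemo (ID_A, ID_B) VALUES (3, 2); -- succeeds
INSERT INTO PairDemo (ID_A, ID_B) VALUES (2, 1); -- fails: violates UQ_PairDemo_Unordered
```

Because the check is an ordinary unique constraint, it also covers UPDATE statements and multi-row inserts without any trigger code.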

Sum of 2 values in a table

I have 2 Tables like below
Table - 1
Bank_Name
Bank_ACNO
Bank_Branch
Bank_Balance
Table - 2
Emp_ID
Amount_Paid
Table-1 contains one unique record for each Bank_ACNO, but Table-2 contains multiple records. Now I want to update Table-1 (Bank_Balance) with Table-1.Bank_Balance + SUM(Amount_Paid), where Table-1.Bank_ACNO = Table-2.Emp_ID.
I tried the query below, which did not work.
UPDATE Bank_Master
SET Bank_Balance = ( Bank_Master.Bank_Balance
+ Order_Archieve_Temp.Amount_Paid )
OUTER JOIN Order_Archieve_Temp
ON Bank_Balance.Bank_ACNO=Order_Archieve_Temp.Emp_ID)
Here is the SQL Fiddle demo.
Below is an update query that you can try:
Update T1
set T1.Bank_Balance = t1.Bank_Balance + t2.Amount_Paid
FROM TABLE1 T1,
(select Emp_ID,sum(Amount_Paid) as Amount_Paid
from Table2
group by Emp_ID ) as T2
WHERE T1.Bank_ACNO = T2.Emp_ID
If that's going to remain your table design, you had better keep your database under really tight control. In most such circumstances, applications that have to determine a balance do so by calculating it on the fly from some known and well-controlled state (say, the balance at the last statement date) plus the sum of all the transactions that have occurred since then.
The current design appears vulnerable to miscalculation of the balance, and continued persistence of that error into the future.
Are there any possible concurrency issues here (could multiple parties possibly be executing this same statement from different connections?). What is your transaction isolation level?
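The on-the-fly calculation described above could be sketched as a query like the following. It assumes, purely for illustration, that Table1.Bank_Balance holds the last known good balance and that Table2 contains only the payments made since that point; neither assumption comes from the question:

```sql
-- Derive the current balance instead of overwriting the stored one.
SELECT b.Bank_ACNO,
       b.Bank_Balance + COALESCE(SUM(p.Amount_Paid), 0) AS Current_Balance
FROM dbo.Table1 AS b
LEFT JOIN dbo.Table2 AS p          -- accounts without payments keep their stored balance
       ON p.Emp_ID = b.Bank_ACNO
GROUP BY b.Bank_ACNO, b.Bank_Balance;
```

Because nothing is written back, running the query twice cannot apply the same payments twice, which is the main risk of a repeated UPDATE.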
Try this query:
BEGIN TRAN;
UPDATE t1
SET Bank_Balance = t1.Bank_Balance + ISNULL(x.Total_Amount_Paid,0)
-- or
-- SET Bank_Balance = ISNULL(t1.Bank_Balance,0) + ISNULL(x.Total_Amount_Paid,0)
-- or
-- SET Bank_Balance = NULLIF(ISNULL(t1.Bank_Balance,0) + ISNULL(x.Total_Amount_Paid,0), 0)
FROM dbo.Table1 t1
OUTER APPLY
(
SELECT SUM(t2.Amount_Paid) AS Total_Amount_Paid
FROM dbo.Table2 t2
WHERE t1.Bank_ACNO = t2.Emp_ID
) x
ROLLBACK
-- COMMIT

T-SQL Grouping Sets of Information

I have a problem which my limited SQL knowledge is keeping me from understanding.
First the problem:
I have a database which I need to run a report on. It contains configurations of a user's entitlements. The report needs to show a distinct list of these configurations and a count against each one.
So a line in my DB looks like this:
USER_ID  SALE_ITEM_ID  SALE_ITEM_NAME  PRODUCT_NAME  CURRENT_LINK_NUM  PRICE_SHEET_ID
37715    547           CultFREE        CultPlus      0                 561
The above line is one row of a user's configuration; for every user ID there can be 1-5 of these lines. So the definition of a configuration is multiple rows of data sharing a common user ID with variable attributes.
I need to get a distinct list of these configurations across the whole table, leaving me just one configuration set for every instance where more than one user has that configuration, plus a count of instances of that configuration.
Hope this is clear?
Any ideas?
I have tried various GROUP BYs and unions, and also the grouping sets function, to no avail.
I will be very grateful if anyone can give me some pointers!
Ouch that hurt ...
Ok so problem:
a row represents a configurable line
users may be linked to more than 1 row of configuration
configuration rows when grouped together form a configuration set
we want to figure out all of the distinct configuration sets
we want to know what users are using them.
Solution (it's a bit messy but the idea is there; copy and paste it into SQL Server Management Studio) ...
-- ok so i imported the data to a table named SampleData ...
-- 1. import the data
-- 2. add a new column
-- 3. select all the values of the config in to the new column (Configuration_id)
--UPDATE [dbo].[SampleData]
--SET [Configuration_ID] = SALE_ITEM_ID + SALE_ITEM_NAME + [PRODUCT_NAME] + [CURRENT_LINK_NUM] + [PRICE_SHEET_ID] + [Configuration_ID]
-- 4. i then selected just the distinct values of those and found 6 distinct Configuration_id's
--SELECT DISTINCT [Configuration_ID] FROM [dbo].[SampleData]
-- 5. to make them a bit easier to read and work with i gave them int values instead
-- for me it was easy to do this manually but you might wanna do some trickery here to autonumber them or something
-- basic idea is to run the step 4 statement but select into a new table then add a new primary key column and set identity spec on it
-- that will generate u a bunch of incremental numbers for your config id's so u can then do something like ...
--UPDATE [dbo].[SampleData] sd
--SET Configuration_ID = (SELECT ID FROM TempConfigTable WHERE Config_ID = sd.Configuration_ID)
-- at this point you have all your existing rows with a unique ident for the values combined in each row.
-- so for example in my dataset i have several rows where only the user_id has changed but all look like this ...
--SALE_ITEM_ID SALE_ITEM_NAME PRODUCT_NAME CURRENT_LINK_NUM PRICE_SHEET_ID Configuration_ID
--54101 TravelFREE TravelPlus 0 56101 1
-- now you have a config id you can start to work on building sets up ...
-- each user is now matched with 1 or more config id
-- 6. we use a CTE (common table expression) to link the possibles (keeps the join small) ...
--WITH Temp (ConfigID)
--AS
--(
-- SELECT DISTINCT SD.Configuration_Id --SD2.Configuration_Id, SD3.Configuration_Id, SD4.Configuration_Id, SD5.Configuration_Id,
-- FROM [dbo].[SampleData] SD
--)
-- this extracts all the possible combinations using the CTE
-- on the basis of what you told me, max rows per user is 6, in the result set i have i only have 5 distinct configs
-- meaning i gain nothing by doing a 6th join.
-- cross joins basically give you every combination of unique values from the 2 tables but we joined back on the same table
-- so its every possible combination of Temp + Temp (ConfigID + ConfigID) ... per cross join so with 5 joins its every combination of
-- Temp + Temp + Temp + Temp + Temp .. good job temp only has 1 column with 5 values in it
-- 7. uncomment both this and the CTE above ... need to use them together
--SELECT DISTINCT T.ConfigID C1, T2.ConfigID C2, T3.ConfigID C3, T4.ConfigID C4, T5.ConfigID C5
--INTO [SETS]
--FROM Temp T
--CROSS JOIN Temp T2
--CROSS JOIN Temp T3
--CROSS JOIN Temp T4
--CROSS JOIN Temp T5
-- notice the INTO clause ... this dumps me out a new [SETS] table in my db
-- if i go add a primary key to this and set its ident spec i now have unique set id's
-- for each row in the table.
--SELECT *
--FROM [dbo].[SETS]
-- now here's where it gets interesting ... row 1 defines a set as being config id 1 and nothing else
-- row 2 defines set 2 as being config 1 and config 2 and nothing else ... and so on ...
-- the problem here of course is that 1,2,1,1,1 is technically the same set as 1,1,1,2,1 from our point of view
-- ok lets assign a set to each userid ...
-- 8. first we pull the distinct id's out ...
--SELECT DISTINCT USER_ID usr, null SetID
--INTO UserSets
--FROM SampleData
-- now we need to do bit a of operating on these that's a bit much for a single update or select so ...
-- 9. process findings in a loop
DECLARE @currentUser int
DECLARE @set int
-- while there's a userid not linked to a set
WHILE EXISTS(SELECT 1 FROM UserSets WHERE SetId IS NULL)
BEGIN
    -- pick the next userid that has no set yet
    SELECT TOP 1 @currentUser = usr FROM UserSets WHERE SetId IS NULL
    -- figure out a set to link it to
    SET @set = (
        SELECT TOP 1 ID
        FROM [SETS]
        -- shouldn't really do this ... basically need to refactor in to a table variable then compare to that
        -- that way the table lookup on ur main data is only 1 per User_id
        WHERE C1 IN (SELECT DISTINCT Configuration_id FROM SampleData WHERE USER_ID = @currentUser)
        AND C2 IN (SELECT DISTINCT Configuration_id FROM SampleData WHERE USER_ID = @currentUser)
        AND C3 IN (SELECT DISTINCT Configuration_id FROM SampleData WHERE USER_ID = @currentUser)
        AND C4 IN (SELECT DISTINCT Configuration_id FROM SampleData WHERE USER_ID = @currentUser)
        AND C5 IN (SELECT DISTINCT Configuration_id FROM SampleData WHERE USER_ID = @currentUser)
    )
    -- hopefully that worked
    IF(@set IS NOT NULL)
    BEGIN
        -- tell the usersets table
        UPDATE UserSets SET SetId = @set WHERE usr = @currentUser
        SET @set = null
    END
    ELSE -- something went wrong ... set to 0 to prevent endless loop but any userid linked to set 0 is a problem u need to look at
        UPDATE UserSets SET SetId = 0 WHERE usr = @currentUser
    -- and round we go again ... until we are done
END
SELECT
USER_ID,
SALE_ITEM_ID, ETC...,
COUNT(*) WhateverYouWantToNameCount
FROM TableNAme
GROUP BY USER_ID, SALE_ITEM_ID, ETC...
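A set-based alternative to the loop and cross joins, offered as a sketch rather than as either answerer's method: build one "signature" string per user from that user's sorted configuration rows, then group by the signature to count how many users share each configuration set. It reuses the SampleData table name from the answer above, assumes the characters | and ; never occur in the data, and needs STRING_AGG (SQL Server 2017 or later):

```sql
WITH RowKeys AS (
    -- One key per distinct configuration row of each user.
    SELECT DISTINCT
           USER_ID,
           RowKey = CAST(CONCAT(SALE_ITEM_ID, '|', SALE_ITEM_NAME, '|', PRODUCT_NAME, '|',
                                CURRENT_LINK_NUM, '|', PRICE_SHEET_ID) AS varchar(max))
    FROM dbo.SampleData
),
UserSignature AS (
    -- Sorting inside STRING_AGG makes the signature independent of row order.
    SELECT USER_ID,
           Signature = STRING_AGG(RowKey, ';') WITHIN GROUP (ORDER BY RowKey)
    FROM RowKeys
    GROUP BY USER_ID
)
SELECT Signature, UserCount = COUNT(*)
FROM UserSignature
GROUP BY Signature
ORDER BY UserCount DESC;
```

Each output row is one distinct configuration set together with the number of users that have exactly that set; joining UserSignature back on Signature would list the users themselves.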
