Finding deltas between two tables

I have two tables; let's call them Table A and Table B. They are different sizes but share the same primary key (ID). The field that varies is Name. I want to return the rows from Table B where:
Data has changed
Data did not exist before
The returned data should have an additional column, Comment, set to whichever of the reasons above applies each time the SQL executes. I have written the T-SQL below; is there a better way to do this?
SELECT [ID]
,[Name]
,'Data did not exist before' AS [Comment]
FROM TABLENAMEB
WHERE [ID] NOT IN (SELECT [ID] FROM #TABLENAMEA)
UNION
SELECT B.[ID]
,B.[Name]
,'Data has changed' AS [Comment]
FROM TABLENAMEB B
LEFT JOIN TABLENAMEA A ON B.[ID] = A.[ID]
WHERE A.[Name] != B.[Name]

Something like this:
DECLARE @tblA TABLE(ID INT, Name VARCHAR(100));
INSERT INTO @tblA VALUES(1,'test1'),(2,'test2'),(4,'test4');
DECLARE @tblB TABLE(ID INT, Name VARCHAR(100));
INSERT INTO @tblB VALUES(2,'test2'),(3,'test3'),(4,'different');
SELECT CASE WHEN A.ID IS NULL THEN 'missing in A'
WHEN B.ID IS NULL THEN 'missing in B'
WHEN A.Name<>B.Name THEN 'different'
ELSE 'okay' END AS Comment
,*
FROM @tblA AS A
FULL OUTER JOIN @tblB AS B ON A.ID=B.ID
The result:
Comment       ID    Name   ID    Name
missing in B  1     test1  NULL  NULL
okay          2     test2  2     test2
different     4     test4  4     different
missing in A  NULL  NULL   3     test3

You could also use a LEFT JOIN and CASE to get a similar result for the rows in Table B:
SELECT B.[ID]
,B.[Name]
,CASE
WHEN A.[Name] IS NULL THEN -- Assuming `Name` in table a is not nullable.
'Data did not exist before'
WHEN B.[Name] != A.[Name] THEN
'Data has changed'
ELSE
''
END AS [Comment]
FROM TABLENAMEB AS B
LEFT JOIN #TABLENAMEA AS A ON B.[ID] = A.[ID]
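Both answers above also return rows that did not change ('okay' or an empty comment), while the question asks only for new or changed rows from Table B. A minimal sketch that filters those out, assuming ID is the key in both tables; #TableA and #TableB are hypothetical stand-ins loaded with the first answer's sample data. The `EXISTS (SELECT B.Name EXCEPT SELECT A.Name)` test is a NULL-safe inequality, because EXCEPT treats two NULLs as equal:

```sql
-- Hypothetical stand-ins for Table A and Table B.
IF OBJECT_ID('tempdb..#TableA') IS NOT NULL DROP TABLE #TableA;
IF OBJECT_ID('tempdb..#TableB') IS NOT NULL DROP TABLE #TableB;

CREATE TABLE #TableA (ID int PRIMARY KEY, Name varchar(100) NULL);
CREATE TABLE #TableB (ID int PRIMARY KEY, Name varchar(100) NULL);

INSERT INTO #TableA (ID, Name) VALUES (1, 'test1'), (2, 'test2'), (4, 'test4');
INSERT INTO #TableB (ID, Name) VALUES (2, 'test2'), (3, 'test3'), (4, 'different');

SELECT B.ID,
       B.Name,
       CASE WHEN A.ID IS NULL THEN 'Data did not exist before'
            ELSE 'Data has changed'
       END AS Comment
FROM #TableB AS B
LEFT JOIN #TableA AS A ON A.ID = B.ID
WHERE A.ID IS NULL                                 -- ID is new in B
   OR EXISTS (SELECT B.Name EXCEPT SELECT A.Name); -- Name differs (NULLs compare as equal)
```

With this sample data it returns ID 3 as 'Data did not exist before' and ID 4 as 'Data has changed'; the unchanged ID 2 is left out. Testing A.ID rather than A.Name for "did not exist" avoids relying on Name being non-nullable.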

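Edit: I just noticed you are using SQL Server 2008. Temporal tables require SQL Server 2016 or later, and Change Data Capture on 2008 is available only in the Enterprise (and Developer) edition, so the features below may not be of use to you.
Since you are writing T-SQL, you are using SQL Server. Look into either of these two features, which are designed specifically for this problem:
Change Data Capture
Temporal Tables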

Related

Efficient query to filter for list of values across columns/distinct rows

SQL Server version is 2016+/Azure SqlDb (flexible if additive, compatible with both, forward-compatible).
Use case is API users sending a single-column list of values to filter some target table. The target table has 2-n columns whose values are unique across rows (maybe columns, doesn't matter) for the table/range being queried. (So far n <= 5, but that's a detail/not guaranteed.)
Here's a good-enough sample table DDL:
IF NOT EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'SomeTable')
BEGIN
CREATE TABLE dbo.SomeTable (
ID int IDENTITY(1, 1) not null PRIMARY KEY CLUSTERED
, NaturalKey1 nvarchar(10) not null UNIQUE NONCLUSTERED
, NaturalKey2 nvarchar(10) not null UNIQUE NONCLUSTERED
, NaturalKey3 nvarchar(10) not null UNIQUE NONCLUSTERED
);
END
IF NOT EXISTS (SELECT 1 FROM dbo.SomeTable)
BEGIN
INSERT INTO dbo.SomeTable VALUES
('A', 'AA', 'ZZZZZ')
,('B', 'B', 'YYYYY')
,('C', 'CC', 'XXX')
,('D', 'DDD', 'WWWWW')
,('E', 'EEEE', 'V')
,('F', 'FF', 'UUUUUUUUU')
,('G', 'GGGGGGGG', 's')
-- lots more
;
END
SELECT * FROM dbo.SomeTable;
-- DROP TABLE dbo.SomeTable;
Assumptions are that all NaturalKey columns are of same type (probably nvarchar); filtering happens db-side; and in as few steps as possible, ideally one execution, in a stored procedure. Parameter will be string list or TVP, doesn't matter really. Result will include all data in any row of SomeTable that matches any value on any column. Target table is of unknown size.
Here's an example parameter for our pal above:
DECLARE @filterValues nvarchar(1000) = 'DDD,XXX,E,HH,ok,whatever,YYYYY';
SELECT * FROM string_split(@filterValues, ',');
I know a couple ways to do this, and can imagine several more, so it's not that kind of stuck. I'll bet someone knows a better trick than either of the two I'll illustrate.
Approach 1: Build a temp table, update it with matches, and join on it (concise and nice to audit; that's about it for pros)
DECLARE @filterValues nvarchar(1000) = 'DDD,XXX,E,HH,ok,whatever,YYYYY';
SELECT value AS InValue, CONVERT(int, null) AS IDMatch
INTO #filters
FROM string_split(@filterValues, ',');
UPDATE f
SET f.IDMatch = st.ID
FROM #filters AS f
INNER JOIN dbo.SomeTable AS st ON f.InValue IN (st.NaturalKey1, st.NaturalKey2, st.NaturalKey3);
SELECT * FROM #filters; -- Audit
SELECT st.* FROM #filters AS f INNER JOIN dbo.SomeTable AS st ON f.IDMatch = st.ID;
IF OBJECT_ID('tempdb..#filters') IS NOT NULL DROP TABLE #filters;
Approach 2: Unpivot SomeTable (I like the nifty CROSS APPLY trick) and just join (at scale, there be ogres, methinks)
SELECT
st.*
FROM
dbo.SomeTable AS st
CROSS APPLY (VALUES (st.NaturalKey1)
, (st.NaturalKey2)
, (st.NaturalKey3)
) AS nk(Val)
INNER JOIN #filters AS f ON nk.Val = f.InValue;
IF OBJECT_ID('tempdb..#filters') IS NOT NULL DROP TABLE #filters;
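Approach 2 as written reads #filters, which Approach 1 drops at the end, and its inner join returns a row once per matching value (row B, for example, holds 'B' in two columns). A self-contained sketch of the same unpivot idea written as a semi-join, so each row comes back at most once; #SomeTable is a hypothetical scratch copy of dbo.SomeTable, and STRING_SPLIT needs database compatibility level 130 or higher. Whether it scales better than the two approaches above is not established here and would need testing against real data:

```sql
-- Hypothetical scratch copy of dbo.SomeTable so the sketch runs on its own.
IF OBJECT_ID('tempdb..#SomeTable') IS NOT NULL DROP TABLE #SomeTable;
CREATE TABLE #SomeTable (
    ID          int IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    NaturalKey1 nvarchar(10) NOT NULL,
    NaturalKey2 nvarchar(10) NOT NULL,
    NaturalKey3 nvarchar(10) NOT NULL
);
INSERT INTO #SomeTable (NaturalKey1, NaturalKey2, NaturalKey3) VALUES
    (N'A', N'AA', N'ZZZZZ'), (N'B', N'B', N'YYYYY'), (N'C', N'CC', N'XXX'),
    (N'D', N'DDD', N'WWWWW'), (N'E', N'EEEE', N'V'), (N'F', N'FF', N'UUUUUUUUU'),
    (N'G', N'GGGGGGGG', N's');

DECLARE @filterValues nvarchar(1000) = N'DDD,XXX,E,HH,ok,whatever,YYYYY';

-- EXISTS is a semi-join: a row is returned once, however many
-- filter values (or columns) match it.
SELECT st.*
FROM #SomeTable AS st
WHERE EXISTS (
    SELECT 1
    FROM STRING_SPLIT(@filterValues, N',') AS f
    CROSS APPLY (VALUES (st.NaturalKey1), (st.NaturalKey2), (st.NaturalKey3)) AS nk(Val)
    WHERE nk.Val = f.value
);
```

With the sample filter it returns the rows for B, C, D and E. For a TVP parameter, the STRING_SPLIT call would be replaced by the table-valued parameter itself.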
Is there a question in our future?
Working is better than not working, but I'm looking for better, more efficient, more scalable methods from the T-SQL gurus. Columns and rows have unknown dimensions, response time is an SLA, and the filter size may or may not be capped. Bonus points if this ports neatly from SomeTable to SomeTableVersionN. (No dynamic SQL in an API.)
This could be a duplicate question; I couldn't find one, but pointing it out is just fine, thank you.

SQL Server - How to enforce a field value from a combination of fields not repeated with another field value

I know I haven't framed the question very well; to be honest, I found it difficult to explain without an example.
I have a table with SalesPersonID and SalesPersonSSN fields.
My requirement is that a SalesPersonID should only exist with one SalesPersonSSN, and vice versa.
If you look at the table (sample data), the record with SalesPersonID 2003 is invalid because SalesPersonSSN 3344556677 already exists with SalesPersonID 2001. Similarly, SalesPersonID 2001 should never exist with any SSN other than 3344556677.
I don't know how to enforce this rule in the table.
Also, is there a simple query to find out whether the rule is violated?

You want a unique constraint:
alter table t
add constraint ssn unique(SalesPersonSSN);
If you want the data that violates the rules, you can use exists:
select t.*
from t
where exists (select 1
from t t1
where (t1.SalesPersonSSN = t.SalesPersonSSN and
t1.SalesPersonID <> t.SalesPersonID)
or (t1.SalesPersonID = t.SalesPersonID and
t1.SalesPersonSSN <> t.SalesPersonSSN)
);
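The constraint above covers one direction of the rule. Assuming the table holds exactly one row per sales person (if it holds many rows per person, such as one per sale, these constraints would reject valid data), a second UNIQUE constraint on SalesPersonID covers the other direction. A minimal sketch with a hypothetical #SalesPerson table:

```sql
-- Hypothetical table; assumes one row per sales person.
IF OBJECT_ID('tempdb..#SalesPerson') IS NOT NULL DROP TABLE #SalesPerson;
CREATE TABLE #SalesPerson (
    SalesPersonID  int          NOT NULL UNIQUE,  -- an ID maps to one SSN
    SalesPersonSSN varchar(255) NOT NULL UNIQUE   -- an SSN maps to one ID
);

INSERT INTO #SalesPerson (SalesPersonID, SalesPersonSSN)
VALUES (2001, '3344556677'), (2002, '7755330099');

-- Both inserts below break the rule and fail with a unique-key violation.
BEGIN TRY
    INSERT INTO #SalesPerson VALUES (2003, '3344556677');  -- SSN already belongs to 2001
END TRY
BEGIN CATCH
    PRINT 'Rejected: ' + ERROR_MESSAGE();
END CATCH;

BEGIN TRY
    INSERT INTO #SalesPerson VALUES (2001, '1111111111');  -- 2001 already has an SSN
END TRY
BEGIN CATCH
    PRINT 'Rejected: ' + ERROR_MESSAGE();
END CATCH;
```

The constraints are unnamed here because constraint names on temp tables must be unique across all sessions; on a permanent table, naming them (as the answer does with `ssn`) is the usual choice.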

To find out whether your rule is violated, you could use the following.
Table
DECLARE @t TABLE (SalesPersonId INT, SalesPersonSSN VARCHAR(255))
INSERT INTO @t VALUES (2001,'3344556677'), (2002,'7755330099'), (2003,'3344556677')
Query
SELECT t.*
FROM @t t
INNER JOIN (SELECT SalesPersonSSN
FROM @t
GROUP BY SalesPersonSSN
HAVING COUNT(*) > 1
) a
ON a.SalesPersonSSN = t.SalesPersonSSN

You need to write the complete logic for it:
declare @ssn as varchar(255)='7755330099' --INPUT
declare @pid as int=201 --INPUT
declare @ssn1 as varchar(255) --local variable
declare @pid1 as int --local variable
select @pid1=SalesPersonId from #t where SalesPersonSSN=@ssn;
select @ssn1=SalesPersonSSN from #t where SalesPersonId=@pid;
if(@pid1 is not null and @pid1<>@pid)
begin
print('failed: as person '+cast(@pid1 as varchar(255))+' already assigned to ssn#'+@ssn)
end
if(@ssn1 is not null and @ssn1<>@ssn)
begin
print('failed: as ssn#'+@ssn1 +' already assigned to pid# '+cast(@pid as varchar(255)))
end
Table definition (create this before running the script above):
create TABLE #t(SalesPersonId INT, SalesPersonSSN VARCHAR(255))
INSERT INTO #t VALUES (2001,'3344556677'), (2002,'7755330099'), (2003,'3344556677')

Consolidation of two data rows in a loop for n occurrences

We have a table that holds information about orders for new products.
From time to time, we receive new orders from a third-party system and insert them into our DB.
Sometimes, however, there is already an entry in our table for a specific order.
Instead of checking whether the order already existed, our colleagues simply inserted new records into the table.
Now that the insert process has been streamlined, I am supposed to consolidate the existing duplicates in the table.
The table looks like this:
I have 138 of these pairs where the PreOrderNumber occurs twice. I'd like to copy the FK_VehicleFile and CommissionNumber values to the row where FK_Checklist is set, and then delete the duplicate that has no FK_Checklist.
My idea is to write a T-SQL script along these lines:
First, I store all the PreOrderNumbers that have duplicates in their own table:
DECLARE @ResultSet TABLE ( PK_OrderNumber int,
FK_Checklist int,
FK_VehicleFile int,
PreOrderNumbers varchar(20))
INSERT INTO @ResultSet
SELECT PK_OrderNumber, PreOrderNumber
FROM [LUX_WEB_SAM].[dbo].[OrderNumbers]
GROUP BY PreOrderNumber
HAVING (COUNT(PreOrderNumber) > 1)
And that's it so far.
I'm very new to this kind of SQL script.
I think I need some kind of loop over all entries in the @ResultSet table to grab the FK_VehicleFile and CommissionNumber from the first row and store them in the second row.
Or do you have any suggestions for solving this problem more easily?

This response uses a CTE:
WITH [MergedOrders] AS
(
Select
ROW_NUMBER() OVER(PARTITION BY row1.PreOrderNumber ORDER BY row1.PK_OrderNumber) AS Instance
,row1.PK_OrderNumber AS PK_OrderNumber
,ISNULL(row1.FK_Checklist,row2.FK_Checklist) AS FK_Checklist
,ISNULL(row1.FK_VehicleFile,row2.FK_VehicleFile) AS FK_VehicleFile
,ISNULL(row1.PreOrderNumber,row2.PreOrderNumber) AS PreOrderNumber
,ISNULL(row1.CommissionNumber,row2.CommissionNumber) AS CommissionNumber
FROM [LUX_WEB_SAM].[dbo].[OrderNumbers] AS row1
INNER JOIN [LUX_WEB_SAM].[dbo].[OrderNumbers] AS row2
ON row1.PreOrderNumber = row2.PreOrderNumber
AND row1.PK_OrderNumber <> row2.PK_OrderNumber
)
SELECT
[PK_OrderNumber]
,[FK_Checklist]
,[FK_VehicleFile]
,[PreOrderNumber]
,[CommissionNumber]
FROM [MergedOrders]
WHERE Instance = 1 /* If we were to maintain Order Number of second instance, use 2 */
Here's the explanation:
A Common Table Expression (CTE) works like a named, inline view (it is not stored as a table), which we use to extract all rows that are repeated (NB: the INNER JOIN ensures that only rows whose PreOrderNumber occurs more than once are selected). We use ISNULL to fill in values where one row or the other is NULL, then select the output for our destination table.

You can use the following script as a starting point for your UPDATE and DELETE actions.
Please keep in mind that both UPDATE and DELETE are risky operations, so test them on test data first.
CREATE TABLE #T(
Col1 VARCHAR(100),
Col2 VARCHAR(100),
Col3 VARCHAR(100),
Col4 VARCHAR(100),
Col5 VARCHAR(100)
)
INSERT INTO #T(Col1,Col2,Col3,Col4,Col5)
VALUES('30',NULL,'222','00000002222','096'),
('25','163',NULL,'00000002222',NULL),
('30','163',NULL,'00000002230',NULL)
SELECT * FROM #T
UPDATE A
SET A.Col3 = B.Col3, A.Col5 = B.Col5
FROM #T A
INNER JOIN #T B ON A.Col4 = B.Col4
WHERE A.Col2 IS NOT NULL AND B.Col2 IS NULL
DELETE FROM #T
WHERE Col4 IN (
SELECT Col4 FROM #T
GROUP BY Col4
HAVING COUNT(*) = 2
)
AND Col2 IS NULL
SELECT * FROM #T
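The same UPDATE-then-DELETE idea, sketched with the question's column names on a hypothetical #OrderNumbers scratch table (the values are made up), wrapped in a transaction so both steps succeed or fail together:

```sql
SET XACT_ABORT ON;  -- any runtime error rolls back the whole transaction

-- Hypothetical scratch table with the question's columns.
IF OBJECT_ID('tempdb..#OrderNumbers') IS NOT NULL DROP TABLE #OrderNumbers;
CREATE TABLE #OrderNumbers (
    PK_OrderNumber   int IDENTITY(1, 1) PRIMARY KEY,
    FK_Checklist     int NULL,
    FK_VehicleFile   int NULL,
    PreOrderNumber   varchar(20) NOT NULL,
    CommissionNumber varchar(20) NULL
);
INSERT INTO #OrderNumbers (FK_Checklist, FK_VehicleFile, PreOrderNumber, CommissionNumber)
VALUES (NULL, 501, 'P-100', 'C-1'),   -- duplicate without a checklist
       (7, NULL, 'P-100', NULL),      -- duplicate with a checklist (the row to keep)
       (8, 502, 'P-200', 'C-2');      -- no duplicate

BEGIN TRANSACTION;

-- Copy the missing values from the checklist-less row into its partner.
UPDATE keep
SET keep.FK_VehicleFile   = ISNULL(keep.FK_VehicleFile, dup.FK_VehicleFile),
    keep.CommissionNumber = ISNULL(keep.CommissionNumber, dup.CommissionNumber)
FROM #OrderNumbers AS keep
INNER JOIN #OrderNumbers AS dup
    ON dup.PreOrderNumber = keep.PreOrderNumber
   AND dup.PK_OrderNumber <> keep.PK_OrderNumber
WHERE keep.FK_Checklist IS NOT NULL
  AND dup.FK_Checklist IS NULL;

-- Delete checklist-less rows that have a partner with a checklist.
DELETE dup
FROM #OrderNumbers AS dup
WHERE dup.FK_Checklist IS NULL
  AND EXISTS (SELECT 1
              FROM #OrderNumbers AS keep
              WHERE keep.PreOrderNumber = dup.PreOrderNumber
                AND keep.FK_Checklist IS NOT NULL);

COMMIT TRANSACTION;  -- use ROLLBACK TRANSACTION instead while testing

SELECT * FROM #OrderNumbers;
```

Against the real table, running it first with ROLLBACK instead of COMMIT lets you inspect the result before keeping it. The sketch assumes each PreOrderNumber has at most one row with a checklist, as in the question's pairs.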

Insert Into T2 all rows from T1 that are currently not in T2

I am trying to insert all records from T1 into T2 that are not currently in T2
I have tried a loop, because I generate the identifier for T2 with a stored procedure:
declare @Part VARCHAR(255),
@GenValue VARCHAR(255),
@x INT
set @x = (select count(*) from T1)
WHILE @x >=0
BEGIN
EXEC [dbo].[usp_GenInd] @GenValue OUT,@GencCode = 'TKM', @GencIncrement = 1
set @Part = @GenValue
INSERT INTO dbo.T2
SELECT @Part AS [part],
[Prod_Code] + Column_Header AS [identifier],
[part_rev] = NULL,
'!' AS [u_version],
a.[Descr] AS [descr],
GETDATE() AS [last_updated],
'ME' AS [last_upd_user],
'EA' AS [basic_unit],
[source] = NULL,
'MAIN' AS [level_1],
'GROUP' AS [level_2],
'ME' AS [user_created],
'20' AS [status],
[Prod_Code] AS [master_part],
[drawing_no] = NULL
FROM [dbo].T1 a
LEFT JOIN dbo.T2 b
ON a.Prod_Code + a.Column_Header = b.part
WHERE b.part is null
END
I keep getting a primary key violation on T2, on the part column that is filled from the @Part variable generated by the stored procedure.
It is really slow as well; I thought an INSERT with a LEFT JOIN ... IS NULL check was quicker than a cursor.
There are only 67 rows in T1.
Thanks in advance for your help.

Nope: go back to the cursor if you must keep using this stored procedure to generate primary key values. The logic error you introduced in this script is the INSERT statement. It does not select a specific row from T1; it selects all rows in T1 that do not exist in T2 (assuming that logic is correct; I'm not going to evaluate it). Every row in that INSERT gets the same @Part value, so as soon as more than one row qualifies, the primary key is violated. Presumably you must call usp_GenInd to generate a PK value for each row in T1. In addition, you never decrement @x, so you have an endless loop.
And notice the wording: "not exists". Generally I find it easier to understand undocumented logic when the query matches the intent of the code as closely as possible. Your LEFT JOIN logic is equivalent to NOT EXISTS, just harder to figure out. You also have a potential problem with the concatenation you use to check for existence: 'AA' + 'B' = 'A' + 'AB', even though the columns contain different values. Be careful about assumptions.
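A minimal sketch of the cursor approach described above: one call to the key-generating procedure per missing row, with the existence check written as NOT EXISTS. The real body of dbo.usp_GenInd is not shown in the question, so #usp_GenInd_Standin is a hypothetical stand-in that returns 'TKM' plus a counter; #T1 and #T2 are hypothetical and carry only a few of the question's columns. Matching on master_part as well as the concatenated identifier is an assumption about how T2 is filled, chosen to sidestep the 'AA' + 'B' ambiguity:

```sql
-- Batch 1: hypothetical stand-in for dbo.usp_GenInd (its real body is not shown).
IF OBJECT_ID('tempdb..#GenCounter') IS NOT NULL DROP TABLE #GenCounter;
IF OBJECT_ID('tempdb..#usp_GenInd_Standin') IS NOT NULL DROP PROCEDURE #usp_GenInd_Standin;
CREATE TABLE #GenCounter (LastValue int NOT NULL);
INSERT INTO #GenCounter (LastValue) VALUES (0);
GO
CREATE PROCEDURE #usp_GenInd_Standin
    @GenValue      varchar(255) OUTPUT,
    @GencCode      varchar(10),
    @GencIncrement int
AS
BEGIN
    DECLARE @next int;
    -- Increment the counter and capture the new value in one statement.
    UPDATE #GenCounter SET @next = LastValue = LastValue + @GencIncrement;
    SET @GenValue = @GencCode + CAST(@next AS varchar(20));
END;
GO
-- Batch 2: hypothetical T1/T2 with a few of the question's columns.
IF OBJECT_ID('tempdb..#T1') IS NOT NULL DROP TABLE #T1;
IF OBJECT_ID('tempdb..#T2') IS NOT NULL DROP TABLE #T2;
CREATE TABLE #T1 (Prod_Code varchar(50), Column_Header varchar(50), Descr varchar(100));
CREATE TABLE #T2 (part varchar(255) PRIMARY KEY, identifier varchar(100),
                  master_part varchar(50), descr varchar(100));
INSERT INTO #T1 VALUES ('P1', 'A', 'first'), ('P2', 'B', 'second'), ('P3', 'C', 'third');
INSERT INTO #T2 VALUES ('TKM0', 'P1A', 'P1', 'already present');

DECLARE @Part varchar(255), @Prod_Code varchar(50), @Column_Header varchar(50), @Descr varchar(100);

-- Only rows of T1 that are not yet in T2.
DECLARE missing_rows CURSOR LOCAL FAST_FORWARD FOR
    SELECT a.Prod_Code, a.Column_Header, a.Descr
    FROM #T1 AS a
    WHERE NOT EXISTS (SELECT 1 FROM #T2 AS b
                      WHERE b.master_part = a.Prod_Code
                        AND b.identifier  = a.Prod_Code + a.Column_Header)
    ORDER BY a.Prod_Code, a.Column_Header;

OPEN missing_rows;
FETCH NEXT FROM missing_rows INTO @Prod_Code, @Column_Header, @Descr;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- One generated key per row, so no two inserted rows share a @Part value.
    EXEC #usp_GenInd_Standin @Part OUTPUT, @GencCode = 'TKM', @GencIncrement = 1;
    INSERT INTO #T2 (part, identifier, master_part, descr)
    VALUES (@Part, @Prod_Code + @Column_Header, @Prod_Code, @Descr);
    FETCH NEXT FROM missing_rows INTO @Prod_Code, @Column_Header, @Descr;
END;
CLOSE missing_rows;
DEALLOCATE missing_rows;
```

With 67 rows in T1, a cursor like this should not be the bottleneck; the loop ends when the FETCH runs out of rows, so there is no counter to forget to decrement.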

I would try something like:
;WITH cte AS (
SELECT your needed data
FROM [dbo].T1
EXCEPT
SELECT already existing data
FROM [dbo].T2
)
INSERT INTO dbo.T2
SELECT *
FROM cte
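A concrete, hypothetical instance of the EXCEPT pattern, with #Src and #Dst standing in for T1 and T2. It compares the key columns separately rather than as a concatenation. It does not generate part values, so with usp_GenInd you would still need one procedure call per row:

```sql
-- Hypothetical tables standing in for T1 and T2.
IF OBJECT_ID('tempdb..#Src') IS NOT NULL DROP TABLE #Src;
IF OBJECT_ID('tempdb..#Dst') IS NOT NULL DROP TABLE #Dst;
CREATE TABLE #Src (Prod_Code varchar(50) NOT NULL, Column_Header varchar(50) NOT NULL);
CREATE TABLE #Dst (Prod_Code varchar(50) NOT NULL, Column_Header varchar(50) NOT NULL,
                   PRIMARY KEY (Prod_Code, Column_Header));
INSERT INTO #Src VALUES ('P1', 'A'), ('P2', 'B'), ('P3', 'C'), ('P3', 'C');  -- note the duplicate
INSERT INTO #Dst VALUES ('P1', 'A');

-- EXCEPT returns the distinct rows of the first query that are absent from
-- the second, so the duplicate ('P3', 'C') is inserted only once.
;WITH cte AS (
    SELECT Prod_Code, Column_Header FROM #Src
    EXCEPT
    SELECT Prod_Code, Column_Header FROM #Dst
)
INSERT INTO #Dst (Prod_Code, Column_Header)
SELECT Prod_Code, Column_Header
FROM cte;
```

After this runs, #Dst holds ('P1', 'A'), ('P2', 'B') and ('P3', 'C'). EXCEPT also treats NULLs as equal when comparing rows, unlike an `=` join condition.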

Your JOIN logic is flawed.
In your INSERT you have this:
INSERT INTO dbo.T2
SELECT @Part AS [part],
[Prod_Code] + Column_Header AS [identifier],
which inserts @Part into [part].
But when you do your JOIN to rule out existing rows, you have this:
LEFT JOIN dbo.T2 b
ON a.Prod_Code + a.Column_Header = b.part
To rule out existing rows, you should be joining on @Part = b.part.

SQL to join nvarchar(max) column with int column

I need some expert help doing a LEFT JOIN between an nvarchar(max) column and an int column. I have a Company table whose EmpID column is nvarchar(max) and holds multiple comma-separated employee IDs:
1221,2331,3441
I want to join this column with the Employee table, where EmpID is an int.
I tried something like the query below, but it doesn't work when there are three empIDs or just one.
SELECT
A.*, B.empName AS empName1, D.empName AS empName2
FROM
[dbo].[Company] AS A
LEFT JOIN
[dbo].[Employee] AS B ON LEFT(A.empID, 4) = B.empID
LEFT JOIN
[dbo].[Employee] AS D ON RIGHT(A.empID, 4) = D.empID
I need to get all the empNames, in separate columns, when there are multiple empIDs. I would highly appreciate any input.

You should, if possible, normalize your database.
Read "Is storing a delimited list in a database column really that bad?", where you will see many reasons why the answer to that question is: Absolutely yes!
If, however, you can't change the database structure, you can use LIKE. This returns one row per matching employee, and the int empID has to be cast to a string before it is concatenated:
SELECT A.*, B.empName
FROM [dbo].[Company] AS A
LEFT JOIN [dbo].[Employee] AS B ON ',' + A.empID + ',' LIKE '%,' + CAST(B.empID AS nvarchar(20)) + ',%'

You can give STRING_SPLIT a shot.
It is available in SQL Server 2016 and later (database compatibility level 130 or higher):
https://learn.microsoft.com/en-us/sql/t-sql/functions/string-split-transact-sql
CREATE TABLE #Test
(
RowID INT IDENTITY(1,1),
EID INT,
Names VARCHAR(50)
)
INSERT INTO #Test VALUES (1,'John')
INSERT INTO #Test VALUES (2,'James')
INSERT INTO #Test VALUES (3,'Justin')
INSERT INTO #Test VALUES (4,'Jose')
GO
CREATE TABLE #Test1
(
RowID INT IDENTITY(1,1),
ID VARCHAR(MAX)
)
INSERT INTO #Test1 VALUES ('1,2,3,4')
GO
SELECT Value,T.* FROM #Test1
CROSS APPLY STRING_SPLIT ( ID , ',' )
INNER JOIN #Test T ON value = EID
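The query above returns one row per employee. To get the names in separate columns, as the question asks, one option is to combine STRING_SPLIT with ROW_NUMBER and conditional aggregation. A sketch with hypothetical #Company/#Employee tables, assuming at most three IDs per company and numbering names by employee ID, because STRING_SPLIT does not guarantee output order. TRY_CAST (SQL Server 2012+) turns a non-numeric fragment into NULL so it simply fails to match:

```sql
-- Hypothetical tables mirroring the question's Company/Employee layout.
IF OBJECT_ID('tempdb..#Company') IS NOT NULL DROP TABLE #Company;
IF OBJECT_ID('tempdb..#Employee') IS NOT NULL DROP TABLE #Employee;
CREATE TABLE #Company (CompanyID int PRIMARY KEY, empID nvarchar(max) NULL);
CREATE TABLE #Employee (empID int PRIMARY KEY, empName nvarchar(50) NOT NULL);
INSERT INTO #Company VALUES (1, N'1221,2331,3441'), (2, N'2331'), (3, NULL);
INSERT INTO #Employee VALUES (1221, N'Ann'), (2331, N'Ben'), (3441, N'Cal');

WITH Names AS (
    SELECT c.CompanyID,
           e.empName,
           -- Number each company's employees; ordered by ID, not list position.
           ROW_NUMBER() OVER (PARTITION BY c.CompanyID ORDER BY e.empID) AS pos
    FROM #Company AS c
    CROSS APPLY STRING_SPLIT(c.empID, N',') AS s
    INNER JOIN #Employee AS e ON e.empID = TRY_CAST(s.value AS int)
)
SELECT c.CompanyID,
       MAX(CASE WHEN nm.pos = 1 THEN nm.empName END) AS empName1,
       MAX(CASE WHEN nm.pos = 2 THEN nm.empName END) AS empName2,
       MAX(CASE WHEN nm.pos = 3 THEN nm.empName END) AS empName3
FROM #Company AS c
LEFT JOIN Names AS nm ON nm.CompanyID = c.CompanyID
GROUP BY c.CompanyID;
```

With this data, company 1 gets Ann, Ben and Cal, company 2 gets only Ben, and company 3 (no IDs) still appears with NULLs because of the LEFT JOIN. Supporting more names means adding more `MAX(CASE ...)` columns.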

It sounds like you need a table to link employees to companies in a formal way. If you had that, this would be trivial. As it is, this is cumbersome and super slow. The below script creates that linkage for you. If you truly want to keep your current structure (bad idea), then the part you want is under the "insert into..." block.
--clean up the results of any prior runs of this test script
if object_id('STACKOVERFLOWTEST_CompanyEmployeeLink') is not null
drop table STACKOVERFLOWTEST_CompanyEmployeeLink;
if object_id('STACKOVERFLOWTEST_Employee') is not null
drop table STACKOVERFLOWTEST_Employee;
if object_id('STACKOVERFLOWTEST_Company') is not null
drop table STACKOVERFLOWTEST_Company;
go
--create two example tables
create table STACKOVERFLOWTEST_Company
(
ID int
,Name nvarchar(max)
,EmployeeIDs nvarchar(max)
,primary key(id)
)
create table STACKOVERFLOWTEST_Employee
(
ID int
,FirstName nvarchar(max)
,primary key(id)
)
--drop in some test data
insert into STACKOVERFLOWTEST_Company values(1,'ABC Corp','1,2,3,4,50')
insert into STACKOVERFLOWTEST_Company values(2,'XYZ Corp','4,5,6,7,8')--note that annie(#4) works for both places
insert into STACKOVERFLOWTEST_Employee values(1,'Bob') --bob works for abc corp
insert into STACKOVERFLOWTEST_Employee values(2,'Sue') --sue works for abc corp
insert into STACKOVERFLOWTEST_Employee values(3,'Bill') --bill works for abc corp
insert into STACKOVERFLOWTEST_Employee values(4,'Annie') --annie works for abc corp
insert into STACKOVERFLOWTEST_Employee values(5,'Matthew') --Matthew works for xyz corp
insert into STACKOVERFLOWTEST_Employee values(6,'Mark') --Mark works for xyz corp
insert into STACKOVERFLOWTEST_Employee values(7,'Luke') --Luke works for xyz corp
insert into STACKOVERFLOWTEST_Employee values(8,'John') --John works for xyz corp
insert into STACKOVERFLOWTEST_Employee values(50,'Pat') --Pat works for abc corp
--create a new table which is going to serve as a link between employees and their employer(s)
create table STACKOVERFLOWTEST_CompanyEmployeeLink
(
CompanyID int foreign key references STACKOVERFLOWTEST_Company(ID)
,EmployeeID INT foreign key references STACKOVERFLOWTEST_Employee(ID)
)
--this join looks for a match in the csv column.
--it is horrible and slow and unreliable and yucky, but it answers your original question.
--drop these messy matches into the clean link table
--this is now a formal link between employees and their employer(s)
insert into STACKOVERFLOWTEST_CompanyEmployeeLink
select c.id,e.id
from
STACKOVERFLOWTEST_Company c
--wrap both the employee id and the list in commas so only whole ids match
--the commas are necessary so we don't accidentally match employee "5" on "50" or similar
inner join STACKOVERFLOWTEST_Employee e on
0 < charindex(',' + convert(nvarchar(max),e.id) + ',', ',' + c.employeeids + ',')
order by
c.id, e.id
--show final results using the official linking table
select
co.Name as Employer
,emp.FirstName as Employee
from
STACKOVERFLOWTEST_Company co
inner join STACKOVERFLOWTEST_CompanyEmployeeLink link on link.CompanyID = co.id
inner join STACKOVERFLOWTEST_Employee emp on emp.id = link.EmployeeID
