Finding difference between 2 tables in MS Access or SQL Server - sql-server

I have 2 Excel files which I imported into MS Access as two tables. These two tables are identical but imported on different dates.
Now, how can I find out what rows and what fields are updated on the later date? Any help would be highly appreciated.

Finding Inserted records is easy
select * from B where not exists (select 1 from A where A.pk=B.pk)
Finding Deleted records is just as easy
select * from A where not exists (select 1 from B where A.pk=B.pk)
Finding Updated records is a pain. The following rigorous query assumes you have nullable columns and it should work in all situations.
select B.*
from B
inner join A on B.pk=A.pk
where A.col1<>B.col1 or (IsNull(A.col1) and not IsNull(B.col1)) or (not IsNull(A.col1) and IsNull(B.col1))
or A.col2<>B.col2 or (IsNull(A.col2) and not IsNull(B.col2)) or (not IsNull(A.col2) and IsNull(B.col2))
or A.col3<>B.col3 or (IsNull(A.col3) and not IsNull(B.col3)) or (not IsNull(A.col3) and IsNull(B.col3))
etc...
If the columns are defined as NOT NULL then the query is much simpler: just remove all the NULL tests.
If the columns are nullable but you can identify a value that will never appear in the data, then use a simple comparison like:
Nz(A.col1,neverAppearingValue)<>Nz(B.col1,neverAppearingValue)
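If the tables live in SQL Server rather than Access, one alternative to the per-column NULL tests is to compare the two rows with EXCEPT, which treats two NULLs as equal. The following is only a sketch: the table variables @A and @B and their sample rows are hypothetical stand-ins for the two imports.

```sql
-- Hypothetical sample data: @A is the older import, @B the newer one.
DECLARE @A TABLE (pk INT PRIMARY KEY, col1 VARCHAR(20) NULL, col2 INT NULL);
DECLARE @B TABLE (pk INT PRIMARY KEY, col1 VARCHAR(20) NULL, col2 INT NULL);

INSERT INTO @A VALUES (1, 'x', 10), (2, NULL, 20), (3, 'z', NULL);
INSERT INTO @B VALUES (1, 'x', 10), (2, 'y',  20), (3, 'z', NULL);

-- EXCEPT compares NULLs as equal, so the subquery yields a row
-- only when at least one column really differs between A and B.
SELECT B.*
FROM @B AS B
INNER JOIN @A AS A ON A.pk = B.pk
WHERE EXISTS (SELECT A.col1, A.col2
              EXCEPT
              SELECT B.col1, B.col2);
```

With this sample data only the row with pk = 2 comes back: col1 went from NULL to 'y', while the unchanged NULL in col2 of pk = 3 is not reported as a difference.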

I believe this should be as simple as running a query like this:
SELECT *
FROM Table1
JOIN Table2
ON Table1.ID = Table2.ID AND Table1.Date != Table2.Date

One way to do this is by unpivoting both tables, so you get a new table with the columns id, col, and val. Note, though, that UNPIVOT requires all the listed columns to share one data type, so you may need to cast them first.
For example, the following gets differences in fields:
with oldt as (select id, col, val
from <old table> t
unpivot (val for col in (<column list>)) unpvt
),
newt as (select id, col, val
from <new table> t
unpivot (val for col in (<column list>)) unpvt
)
select *
from oldt full outer join newt on oldt.id = newt.id and oldt.col = newt.col
where oldt.id is null or newt.id is null or oldt.val <> newt.val
The alternative way with a join is rather cumbersome. This version shows whether rows were added or deleted, and which columns changed, if any:
select *
from (select coalesce(oldt.id, newt.id) as id,
(case when oldt.id is null and newt.id is not null then 'ADDED'
when oldt.id is not null and newt.id is null then 'DELETED'
else 'SAME'
end) as status,
(case when oldt.col1 <> newt.col1 or (oldt.col1 is null and newt.col1 is not null) or (oldt.col1 is not null and newt.col1 is null)
then 1 else 0 end) as diff_col1,
(case when oldt.col2 <> newt.col2 or (oldt.col2 is null and newt.col2 is not null) or (oldt.col2 is not null and newt.col2 is null)
then 1 else 0 end) as diff_col2,
...
from <old table> oldt full outer join <new table> newt on oldt.id = newt.id
) c
where status in ('ADDED', 'DELETED') or
(diff_col1 + diff_col2 + ... ) > 0
It does have the advantage of working for any data types.

(Select * from OldTable Except Select * from NewTable)
Union All
(Select * from NewTable Except Select * from OldTable)

Related

How to properly join when comparing the difference between two tables pertaining to specific fields

I'm having problems writing this query. I'm comparing a temp table to a table within our database. I want to find any records that don't have the same Case_Number and Id_Number combination in both tables. The query I am using only provides me one or the other, depending on how I join them. If I join by Case_Number, it returns the Case_Number records that do not match between the two tables. If I join by Id_Number, it returns the Id_Numbers that do not match between the two tables. Is there a way to join by both Case_Number and Id_Number so that the query returns both? I would also like to know whether it is possible to include an "If Exists" in the query. Code below:
SELECT T1.Case_Number, T1.Id_Number, T1.FirstDate, T1.LastDate, T2.Case_Number, T2.Id_Number, T2.FirstDate, T2.LastDate
FROM dbo.table T1
inner join #TempTable T2
on T1.Id_Number = T2.Id_Number
--on T1.Case_Number = T2.Case_Number
where T1.LastDate is null
and T1.Case_Number <> T2.Case_Number
OR T1.Id_number <> T2.Id_Number
Use OR in the INNER JOIN condition:
SELECT T1.Case_Number,
T1.Id_Number,
T1.FirstDate,
T1.LastDate,
T2.Case_Number,
T2.Id_Number,
T2.FirstDate,
T2.LastDate
FROM dbo.table T1
INNER JOIN #TempTable T2 ON (T1.Id_Number = T2.Id_Number
OR T1.Case_Number = T2.Case_Number)
WHERE T1.LastDate IS NULL
AND T1.Case_Number <> T2.Case_Number
OR T1.Id_number <> T2.Id_Number
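One thing to watch in the WHERE clauses above: AND binds more tightly than OR, so `x AND y OR z` is evaluated as `(x AND y) OR z`. The small sketch below uses a made-up single-row VALUES table (the columns a, b, c are hypothetical) purely to show the difference that parentheses make.

```sql
-- Hypothetical one-row input: a = 0, b = 0, c = 1.
SELECT v.id,
       -- Parsed as (a = 1 AND b = 1) OR c = 1
       CASE WHEN v.a = 1 AND v.b = 1 OR v.c = 1
            THEN 1 ELSE 0 END AS without_parens,
       -- Explicit grouping: a = 1 AND (b = 1 OR c = 1)
       CASE WHEN v.a = 1 AND (v.b = 1 OR v.c = 1)
            THEN 1 ELSE 0 END AS with_parens
FROM (VALUES (1, 0, 0, 1)) AS v(id, a, b, c);
```

Here without_parens is 1 and with_parens is 0, so if the LastDate filter is meant to apply to both mismatch tests, the two `<>` comparisons need to be wrapped in parentheses.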
This should find both; the first column tells you which case applies:
(
SELECT 'in Table not in Temp' as R,
T1.Case_Number as 'T1.Case_Number',
T1.Id_Number as 'T1.Id_Number',
T1.FirstDate as 'T1.FirstDate',
T1.LastDate as 'T1.LastDate',
null as 'T2.Case_Number',
null as 'T2.Id_Number',
null as 'T2.FirstDate',
null as 'T2.LastDate'
FROM dbo.table T1
where not exists (
select 1
from #TempTable T2
where T1.Case_Number = T2.Case_Number
and T1.Id_Number = T2.Id_Number)
)
UNION ALL
(
SELECT 'in temp, not in list' as R,
null as 'T1.Case_Number',
null as 'T1.Id_Number',
null as 'T1.FirstDate',
null as 'T1.LastDate',
T2.Case_Number as 'T2.Case_Number',
T2.Id_Number as 'T2.Id_Number',
T2.FirstDate as 'T2.FirstDate',
T2.LastDate as 'T2.LastDate'
FROM #TempTable T2
where not exists (
select 1
from dbo.table T1
where T1.Case_Number = T2.Case_Number
and T1.Id_Number = T2.Id_Number )
)
Add the T1.LastDate IS NULL condition wherever you need it; I wasn't sure where you want it.
You will probably need to tweak the column names. I switched T1/T2 in the second statement so that each column always comes from the same table, but the names contain duplicates.

Update records SQL?

When I first started this project it seemed very simple. There are two tables, and field tbl1_USERMASTERID in Table 1 should be updated from field tbl2_USERMASTERID in Table 2. After I looked closely at Table 2, I found there is no unique ID that I can use as a key to join these two tables. The only way to match the records from Table 1 and Table 2 is based on FIRST_NAME, LAST_NAME and DOB. So I have to find records in Table 1 where:
tbl1_FIRST_NAME equals tbl2_FIRST_NAME
AND
tbl1_LAST_NAME equals tbl2_LAST_NAME
AND
tbl1_DOB equals tbl2_DOB
and then update USERMASTERID field. I was afraid that this can cause some duplicates and some users will end up with USERMASTERID that does not belong to them. So if I find more than one record based on first,last name and dob those records would not be updated. I would like just to skip and leave them blank. That way I wouldn't populate invalid USERMASTERID. I'm not sure what is the best way to approach this problem, should I use SQL or ColdFusion (my server side language)? Also how to detect more than one matching record?
Here is what I have so far:
UPDATE Table1 AS tbl1
LEFT OUTER JOIN Table2 AS tbl2
ON tbl1.dob = tbl2.dob
AND tbl1.fname = tbl2.fname
AND tbl1.lname = tbl2.lname
SET tbl1.usermasterid = tbl2.usermasterid
WHERE LTRIM(RTRIM(tbl1.usermasterid)) = ''
Here is query where I tried to detect duplicates:
SELECT DISTINCT
tbl1.FName,
tbl1.LName,
tbl1.dob,
COUNT(*) AS count
FROM Table1 AS tbl1
LEFT OUTER JOIN Table2 AS tbl2
ON tbl1.dob = tbl2.dob
AND tbl1.FName = tbl2.first
AND tbl1.LName = tbl2.last
WHERE LTRIM(RTRIM(tbl1.usermasterid)) = ''
AND LTRIM(RTRIM(tbl1.first)) <> ''
AND LTRIM(RTRIM(tbl1.last)) <> ''
AND LTRIM(RTRIM(tbl1.dob)) <> ''
GROUP BY tbl1.FName,tbl1.LName,tbl1.dob
Some data after I tested query above:
First   Last      DOB         Count
John    Cook      2008-07-11  2
Kate    Witt      2013-06-05  1
Deb     Ruis      2016-01-22  1
Mike    Bennet    2007-01-15  1
Kristy  Cruz      1997-10-20  1
Colin   Jones     2011-10-13  1
Kevin   Smith     2010-02-24  1
Corey   Bruce     2008-04-11  1
Shawn   Maiers    2016-08-28  1
Alenn   Fitchner  1998-05-17  1
If anyone has an idea how I can prevent or skip updating duplicate records, or how to improve this query, please let me know. Thank you.
You could check for and avoid duplicate matches using a common table expression (Transact-SQL)
along with row_number(), like so:
with cte as (
select
t.fname
, t.lname
, t.dob
, t.usermasterid
, NewUserMasterId = t2.usermasterid
, rn = row_number() over (partition by t.fname, t.lname, t.dob order by t2.usermasterid)
from table1 as t
inner join table2 as t2 on t.dob = t2.dob
and t.fname = t2.fname
and t.lname = t2.lname
and ltrim(rtrim(t.usermasterid)) = ''
)
--/* confirm these are the rows you want updated
select *
from cte as t
where t.NewUserMasterId != ''
and not exists (
select 1
from cte as i
where t.dob = i.dob
and t.fname = i.fname
and t.lname = i.lname
and i.rn>1
);
--*/
/* update those where only 1 usermasterid matches this record
update t
set t.usermasterid = t.NewUserMasterId
from cte as t
where t.NewUserMasterId != ''
and not exists (
select 1
from cte as i
where t.dob = i.dob
and t.fname = i.fname
and t.lname = i.lname
and i.rn>1
);
--*/
I use the cte to extract out the sub query for readability. Per the documentation, a common table expression (cte):
Specifies a temporary named result set, known as a common table expression (CTE). This is derived from a simple query and defined within the execution scope of a single SELECT, INSERT, UPDATE, or DELETE statement.
row_number() assigns a number to each row, starting at 1 for each partition of t.fname, t.lname, t.dob. Having the rows numbered allows us to check for the existence of duplicates with the not exists() clause and its ... and i.rn>1 condition.
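To see what the numbering looks like, here is a minimal sketch on made-up rows; the names, dates, and the usermasterid values U100, U200, and U300 are hypothetical and not taken from the question's tables. It also shows COUNT(*) OVER as an alternative way to spot partitions with more than one match.

```sql
-- Hypothetical matched rows: John Cook matches two different usermasterids.
SELECT v.fname, v.lname, v.dob, v.usermasterid,
       -- Restarts at 1 for every (fname, lname, dob) partition
       ROW_NUMBER() OVER (PARTITION BY v.fname, v.lname, v.dob
                          ORDER BY v.usermasterid) AS rn,
       -- Total number of rows in the same partition
       COUNT(*) OVER (PARTITION BY v.fname, v.lname, v.dob) AS matches
FROM (VALUES ('John', 'Cook', '2008-07-11', 'U100'),
             ('John', 'Cook', '2008-07-11', 'U200'),
             ('Kate', 'Witt', '2013-06-05', 'U300')
     ) AS v(fname, lname, dob, usermasterid);
```

The two John Cook rows get rn 1 and 2 (and matches = 2), while Kate Witt gets rn 1 (matches = 1), so any partition containing a row with rn > 1 is an ambiguous match that should be skipped.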
You could use a CTE to filter out the duplicates from Table1 before joining:
; with CTE as (select *
, count(ID) over (partition by LastName, FirstName, DoB) as IDs
from Table1)
update a
set a.ID = b.ID
from Table2 a
left join CTE b
on a.FirstName = b.FirstName
and a.LastName = b.LastName
and a.Dob = b.Dob
and b.IDs = 1
This will work provided there are no exact duplicates (same demographics and same ID) in table 1. If there are exact duplicates, they will also be excluded from the join, but you can filter them out before the CTE to avoid this.
Please try below SQL:
UPDATE Table1 AS tbl1
INNER JOIN Table2 AS tbl2
ON tbl1.dob = tbl2.dob
AND tbl1.fname = tbl2.fname
AND tbl1.lname = tbl2.lname
LEFT JOIN Table2 AS tbl3
ON tbl3.dob = tbl2.dob
AND tbl3.fname = tbl2.fname
AND tbl3.lname = tbl2.lname
AND tbl3.usermasterid <> tbl2.usermasterid
SET tbl1.usermasterid = tbl2.usermasterid
WHERE LTRIM(RTRIM(tbl1.usermasterid)) = ''
AND tbl3.usermasterid is null

Conditional JOIN Statement SQL Server

Is it possible to do the following:
IF [a] = 1234 THEN JOIN ON TableA
ELSE JOIN ON TableB
If so, what is the correct syntax?
I think what you are asking for will work by joining the Initial table to both Option_A and Option_B using LEFT JOIN, which will produce something like this:
Initial LEFT JOIN Option_A LEFT JOIN NULL
OR
Initial LEFT JOIN NULL LEFT JOIN Option_B
Example code:
SELECT i.*, COALESCE(a.id, b.id) as Option_Id, COALESCE(a.name, b.name) as Option_Name
FROM Initial_Table i
LEFT JOIN Option_A_Table a ON a.initial_id = i.id AND i.special_value = 1234
LEFT JOIN Option_B_Table b ON b.initial_id = i.id AND i.special_value <> 1234
Once you have done this, you 'ignore' the set of NULLS. The additional trick here is in the SELECT line, where you need to decide what to do with the NULL fields. If the Option_A and Option_B tables are similar, then you can use the COALESCE function to return the first NON NULL value (as per the example).
The other option is that you will simply have to list the Option_A fields and the Option_B fields, and let whatever is using the ResultSet to handle determining which fields to use.
To add to the other answers: the query can also be constructed dynamically, based on the condition.
An example is given below.
DECLARE @a INT = 1235
DECLARE @sql VARCHAR(MAX) = 'SELECT * FROM [sourceTable] S JOIN ' + IIF(@a = 1234,'[TableA] A ON A.col = S.col','[TableB] B ON B.col = S.col')
EXEC(@sql)
--Query will be
/*
SELECT * FROM [sourceTable] S JOIN [TableB] B ON B.col = S.col
*/
You can solve this with union
select a, b
from tablea
join tableb on tablea.a = tableb.a
where b = 1234
union
select a, b
from tablea
join tablec on tablec.a = tablea.a
where b <> 1234
I disagree with the solution suggesting 2 left joins. I think a table-valued function is more appropriate so you don't have all the coalescing and additional joins for each condition you would have.
CREATE FUNCTION f_GetData (
@Logic VARCHAR(50)
) RETURNS @Results TABLE (
Content VARCHAR(100)
) AS
BEGIN
IF @Logic = '1234'
INSERT @Results
SELECT Content
FROM Table_1
ELSE
INSERT @Results
SELECT Content
FROM Table_2
RETURN
END
GO
SELECT *
FROM InputTable
CROSS APPLY f_GetData(InputTable.Logic) T
I think it is better to think about your query in a different way and treat the inputs as sets.
If you write two separate queries and then combine them using UNION, I believe the result will perform better and be more readable.

How to combine fields from 2 columns to create a "matrix"?

I have a logging table in my application that only logs changed data, and leaves the other columns NULL. What I'm wanting to do now is create a view that takes 2 of those columns (Type and Status),
and create a resultset that returns the Type and Status on the entry of that log row, assuming that either one or both columns could be null.
For example, with this data:
Type  Status  AddDt
A     1       7/8/2013
NULL  2       7/7/2013
NULL  3       7/6/2013
NULL  NULL    7/5/2013
B     NULL    7/4/2013
C     NULL    7/3/2013
C     4       7/2/2013
produce the resultset:
Type  Status  AddDt
A     1       7/8/2013
A     2       7/7/2013
A     3       7/6/2013
A     3       7/5/2013
B     3       7/4/2013
C     3       7/3/2013
C     4       7/2/2013
From there I'm going to figure out the first time in these results the Type and Status meet certain conditions, such as a Type of B and Status 3 (7/4/2013) and ultimately use that date in a calculation, so performance is a huge issue with this.
Here's what I was thinking so far, but it doesn't get me where I need to be:
SELECT
Type.TypeDesc
, Status.StatusDesc
, *
FROM
jw50_Item c
OUTER APPLY (SELECT TOP 10000 * FROM jw50_ItemLog csh WHERE csh.ItemID = c.ItemID AND csh.StatusCode = 'OPN' ORDER BY csh.AddDt DESC) [Status]
OUTER APPLY (SELECT TOP 10000 * FROM jw50_ItemLog cth WHERE cth.ItemID = c.ItemID AND cth.ItemTypeCode IN ('F','FG','NG','PF','SXA','AB') ORDER BY cth.AddDt DESC) Type
WHERE
c.ItemID = @ItemID
So with the help provided below, I was able to get where I needed. Here is my final solution:
SELECT
OrderID
, CustomerNum
, OrderTitle
, ItemTypeDesc
, ItemTypeCode
, StatusCode
, OrdertatusDesc
FROM
jw50_Order c1
OUTER APPLY (SELECT TOP 1 [DateTime] FROM
(SELECT c.ItemTypeCode, c.OrderStatusCode, c.OrderStatusDt as [DateTime] FROM jw50_Order c WHERE c.OrderID = c1.OrderID
UNION
select (select top 1 c2.ItemTypeCode
from jw50_OrderLog c2
where c2.UpdatedDt >= c.UpdatedDt and c2.ItemTypeCode is not null and c2.OrderID = c.OrderID
order by UpdatedDt DESC
) as type,
(select top 1 c2.StatusCode
from jw50_OrderLog c2
where c2.UpdatedDt >= c.UpdatedDt and c2.StatusCode is not null and c2.OrderID = c.OrderID
order by UpdatedDt DESC
) as status,
UpdatedDt as [DateTime]
from jw50_OrderLog c
where c.OrderID = c1.OrderID AND (c.StatusCode IS NOT NULL OR c.ItemTypeCode IS NOT NULL)
) t
WHERE t.ItemTypeCode IN ('F','FG','NG','PF','SXA','AB') AND t.StatusCode IN ('OPEN')
order by [DateTime]) quart
WHERE quart.DateTime <= @FiscalPeriod2 AND c1.StatusCode = 'OPEN'
Order By c1.OrderID
The union brings in the current data in addition to the log table data to create the resultset, since the current data may be what meets the required conditions. Thanks again for the help, guys.
Here is an approach that uses correlated subqueries:
select (select top 1 c2.type
from jw50_Item c2
where c2.AddDt >= c.AddDt and c2.type is not null
order by AddDt
) as type,
(select top 1 c2.status
from jw50_Item c2
where c2.AddDt >= c.AddDt and c2.status is not null
order by AddDt
) as status,
c.AddDt
from jw50_Item c
If you have indexes on jw50_item(AddDt, type) and jw50_item(AddDt, status), then the performance should be pretty good.
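The correlated-subquery approach can be tried against the sample data from the question. This is only a sketch: the table variable @Log is a hypothetical stand-in for the real log table, and the dates are written as DATE literals.

```sql
-- Hypothetical stand-in for the log table, loaded with the question's sample rows.
DECLARE @Log TABLE ([Type] CHAR(1) NULL, [Status] INT NULL, AddDt DATE NOT NULL);

INSERT INTO @Log VALUES
    ('A',  1,    '2013-07-08'),
    (NULL, 2,    '2013-07-07'),
    (NULL, 3,    '2013-07-06'),
    (NULL, NULL, '2013-07-05'),
    ('B',  NULL, '2013-07-04'),
    ('C',  NULL, '2013-07-03'),
    ('C',  4,    '2013-07-02');

-- For each row, take the nearest non-NULL value at or after its own date.
SELECT (SELECT TOP 1 l2.[Type]
        FROM @Log AS l2
        WHERE l2.AddDt >= l.AddDt AND l2.[Type] IS NOT NULL
        ORDER BY l2.AddDt) AS [Type],
       (SELECT TOP 1 l2.[Status]
        FROM @Log AS l2
        WHERE l2.AddDt >= l.AddDt AND l2.[Status] IS NOT NULL
        ORDER BY l2.AddDt) AS [Status],
       l.AddDt
FROM @Log AS l
ORDER BY l.AddDt DESC;
```

On these seven rows the output matches the desired resultset in the question; for example, the 7/4/2013 row comes back as Type B, Status 3.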
I suppose you want to "generate a history": for those dates that have some data missing, the next available data should be used.
Something similar should work:
Select i.AddDt, t.Type, s.Status
from Items i
join Items t on (t.addDt =
(select min(t1.addDt)
from Items t1
where t1.addDt >= i.addDt
and t1.Type is not null))
join Items s on (s.addDt =
(select min(s1.addDt)
from Items s1
where s1.addDt >= i.addDt
and s1.status is not null))
In effect, I'm joining the base table to two copies of itself; each join condition matches the earliest row, at or after the current row's date, where the respective column is not null.
I'm not absolutely sure that it will work, since I don't have a SQL Server in front of me, but give it a try :)

Case not working in Exists in Sql Server

I have a scenario where I have to check whether a variable has its default value. If it does, the EXISTS check should run against Table2; if it does not, the EXISTS check should run against Table3.
Below is a sample code:-
SELECT * FROM tbl1 WHERE EXISTS (SELECT CASE WHEN @boolVar = 0 THEN (SELECT 'X' FROM tbl2 WHERE tbl1.col1 = tbl2.col1) ELSE (SELECT 'X' FROM tbl3 where tbl1.col1 = tbl3.col1) END)
Demo query with constants for testing purpose: -
SELECT 1 WHERE EXISTS (SELECT CASE WHEN 1 = 0 THEN (SELECT 'X' WHERE 1=0)
ELSE (SELECT 'X' WHERE 1 = 2) END)
Note: the above query always returns 1, even though neither condition is satisfied.
I know we can achieve the same result with the OR operator, but I really want to know why, when neither table has rows satisfying its WHERE clause, the query still returns all the rows from Table1.
I tried to explain the same with the demo query with constant values.
Please help.
When your query doesn't find any matching records, it will basically do:
SELECT 1 WHERE EXISTS (SELECT NULL)
As a row containing a null value is still a row, the EXISTS command returns true.
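The behaviour is easy to reproduce with constants. The statements below are a small sketch with no tables involved; the column aliases y and result are just illustrative names.

```sql
-- A scalar subquery that finds no rows evaluates to NULL, not to "no row".
SELECT (SELECT 'X' WHERE 1 = 0) AS y;

-- EXISTS only asks whether the subquery produces a row, so a single
-- row holding NULL is enough: this statement returns one row.
SELECT 'row returned' AS result
WHERE EXISTS (SELECT NULL);

-- Here the subquery produces no row at all, so nothing is returned.
SELECT 'row returned' AS result
WHERE EXISTS (SELECT NULL WHERE 1 = 0);
```

The CASE expression in the question behaves like the second statement: it always yields exactly one row, whose value is NULL when neither inner query matches.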
You can add a condition to filter out the null row:
SELECT * FROM tbl1 WHERE EXISTS (
SELECT 1 FROM (
SELECT
CASE WHEN @boolVar = 0 THEN (SELECT 'X' FROM tbl2 WHERE tbl1.col1 = tbl2.col1)
ELSE (SELECT 'X' FROM tbl3 where tbl1.col1 = tbl3.col1)
END AS Y
) Z
WHERE Y IS NOT NULL
)
Here's an alternative, just in case:
SELECT *
FROM Table1
WHERE EXISTS (
SELECT 1
FROM Table2
WHERE @var = @defValue
AND ... /* other conditions as necessary */
UNION ALL
SELECT 1
FROM Table3
WHERE @var <> @defValue
AND ... /* other conditions as necessary */
);

Resources