Finding duplicates and variations in MSSQL

I have a table with a whole bunch of data. In that table there is a column with non-unique ids, so there can be duplicates of them. I found them with this query:
SELECT theid FROM thetable
GROUP BY theid
HAVING COUNT(*) > 1
The table also has columns such as street1, street2, city1 and city2.
For the rows with duplicated ids found by the first query, I need to check whether street1 differs from street2 and city1 from city2 in any of the duplicates of a given id. Does that make sense?
So let's say we have two rows with the same id: I need to check whether street1 in one row differs from street1 in the other rows with that id.
Any tips or pointers on how to do this? I am going blind staring at this problem and can't seem to find the right query for it.
Thanks a bunch

Using a CTE will help:
;WITH CTE AS
(
SELECT theID,
Street1,
Street2,
Street3,
City,
State,
Zip,
rn = ROW_NUMBER() OVER(PARTITION BY theID ORDER BY theID)
FROM thetable
-- add joins if necessary
)
SELECT oldestID = c1.theID,
oldestStreet1 = c1.Street1,
newestStreet1 = c2.Street1,
newestID = c2.theID
FROM CTE c1
INNER JOIN CTE c2 ON c2.theID = c1.theID AND c2.rn = c1.rn + 1
You could also add a CASE expression to display matches vs. non-matches. This will help to manually identify typos (1337 Test St. vs 1337 Test Street):
;WITH CTE AS
(
SELECT theID,
Street1,
Street2,
Street3,
City,
State,
Zip,
rn = ROW_NUMBER() OVER(PARTITION BY theID ORDER BY theID)
FROM thetable
-- add joins if necessary
)
SELECT oldestID = c1.theID,
oldestStreet1 = CASE WHEN c1.Street1 = c2.Street1 THEN 'Match' ELSE c1.Street1 END,
newestStreet1 = CASE WHEN c1.Street1 = c2.Street1 THEN 'Match' ELSE c2.Street1 END,
newestID = c2.theID
FROM CTE c1
INNER JOIN CTE c2 ON c2.theID = c1.theID AND c2.rn = c1.rn + 1
Or you could return just the items that do not match by adding it to your INNER JOIN clause:
;WITH CTE AS
(
SELECT theID,
Street1,
Street2,
Street3,
City,
State,
Zip,
rn = ROW_NUMBER() OVER(PARTITION BY theID ORDER BY theID)
FROM thetable
-- add joins if necessary
)
SELECT oldestID = c1.theID,
oldestStreet1 = c1.Street1,
newestStreet1 = c2.Street1,
newestID = c2.theID
FROM CTE c1
INNER JOIN CTE c2 ON c2.theID = c1.theID AND c2.rn = c1.rn + 1 AND c1.Street1 <> c2.Street1 -- add as many of these as you need.
Keep in mind these are exact matches. You could loosen a comparison with a fixed rule, for example LEFT(Zip, 5), to match only on the first 5 digits of the ZIP code (in case some have ZIP+4 and some don't).
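As a rough sketch of that looser comparison: the table variable @Addresses and its sample rows are made up for illustration, and only the Zip column is compared. Rows with the same theID are paired as in the queries above, but ZIP codes are compared on their first five characters, so 12345 and 12345-6789 count as equal.

```sql
-- Hypothetical sample data; @Addresses stands in for the real table.
DECLARE @Addresses TABLE (theID int, Street1 varchar(50), Zip varchar(10));

INSERT INTO @Addresses (theID, Street1, Zip)
VALUES (1, '1337 Test St.', '12345'),
       (1, '1337 Test St.', '12345-6789'), -- same ZIP, with a +4 suffix
       (2, '9 Main Street', '54321'),
       (2, '9 Main Street', '54399');      -- genuinely different ZIP

WITH CTE AS
(
    SELECT theID,
           Street1,
           Zip,
           rn = ROW_NUMBER() OVER(PARTITION BY theID ORDER BY Zip)
    FROM @Addresses
)
SELECT c1.theID,
       firstZip  = c1.Zip,
       secondZip = c2.Zip
FROM CTE c1
INNER JOIN CTE c2
        ON c2.theID = c1.theID                 -- pair rows of the same ID only
       AND c2.rn = c1.rn + 1
       AND LEFT(c1.Zip, 5) <> LEFT(c2.Zip, 5); -- compare only the first 5 characters
```

With this sample data only theID 2 should come back, since the two ZIP codes for theID 1 share the same first five characters.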

You can also analyse it like this:
;WITH CTE AS
(
SELECT theID,
Street1,
Street2,
Street3,
City,
State,
Zip,
rn = ROW_NUMBER() OVER(PARTITION BY theID ORDER BY theID)
FROM thetable
-- add joins if necessary
)
,
CTE1 as
(
select *, ROW_NUMBER()
OVER(PARTITION BY theID, Street1, Street2, City, State, Zip
ORDER BY theID) rn2 from cte where rn>2
)
select * from cte1

Related

Print results side by side SQL Server

I have the following result set.
Now, from the above results, I want to print the records via a SELECT query as in the attached image below.
Please note, I will have only two types of columns in the output: Present Employees and Absent Employees.
I tried using pivot tables and temporary tables but can't achieve what I want.

One method would be to apply ROW_NUMBER to each of the statuses and then use a FULL OUTER JOIN to get the two datasets into the appropriate columns. I use a FULL OUTER JOIN because I assume you could have a different number of employees who were present/absent.
CREATE TABLE dbo.YourTable (Name varchar(10), --Using a name that doesn't require delimit identification
Status varchar(7), --Using a name that doesn't require delimit identification
Days int);
GO
INSERT INTO dbo.YourTable(Name, Status, Days)
VALUES('Mal','Present',30),
('Jess','Present',20),
('Rick','Absent',30),
('Jerry','Absent',10);
GO
WITH RNs AS(
SELECT Name,
Status,
Days,
ROW_NUMBER() OVER (PARTITION BY Status ORDER BY Days DESC) AS RN
FROM dbo.YourTable)
SELECT P.Name AS PresentName,
P.Days AS PresentDays,
A.Name AS AbsentName,
A.Days AS AbsentDays
FROM (SELECT R.Name,
R.Days,
R.Status,
R.RN
FROM RNs R
WHERE R.Status = 'Present') P
FULL OUTER JOIN (SELECT R.Name,
R.Days,
R.Status,
R.RN
FROM RNs R
WHERE R.Status = 'Absent') A ON P.RN = A.RN;
GO
DROP TABLE dbo.YourTable;
Two CTEs are actually far neater:
WITH Absents AS(
SELECT Name,
Status,
Days,
ROW_NUMBER() OVER (ORDER BY Days DESC) AS RN
FROM dbo.YourTable
WHERE Status = 'Absent'),
Presents AS(
SELECT Name,
Status,
Days,
ROW_NUMBER() OVER (ORDER BY Days DESC) AS RN
FROM dbo.YourTable
WHERE Status = 'Present')
SELECT P.Name AS PresentName,
P.Days AS PresentDays,
A.Name AS AbsentName,
A.Days AS AbsentDays
FROM Absents A
FULL OUTER JOIN Presents P ON A.RN = P.RN;

Query table and Select latest 2 rows (in SQL Server)

I have a table that logs all updates made to an application. I want to query the table and return the last update by [Timestamp] and the update before that for a different value [ITEM]. I'm struggling to figure out how to get what I need. I'm returning more than one record for each ID and don't want that.
;WITH cte AS
(
SELECT
ID,
LAG(ITEM) OVER (PARTITION BY ID ORDER BY timestamp DESC) AS ITEM,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY timestamp DESC) RN
FROM
MyLoggingTable
WHERE
accountid = 1234
)
SELECT
cte.ID,
dl.ITEM,
DL.timestamp
FROM
cte
JOIN
MyLoggingTable DL ON cte.ID = DL.ID
WHERE
rn = 1
AND cte.ID IN ('id here | Sub select :( ..')

Is ID unique? Because if it is, your code shouldn't return duplicates. If it isn't, you will get duplicates because you are joining back to the MyLoggingTable which isn't needed. You should just move those columns (dl.Item & dl.timestamp) into the cte and return them from the cte like you did cte.ID.
I removed the LAG since you didn't return that column in your final query.
;WITH cte AS
(
SELECT
ID,
ITEM,
[timestamp],
--LAG(ITEM) OVER (PARTITION BY ID ORDER BY timestamp DESC) AS ITEM,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY timestamp DESC) RN
FROM
MyLoggingTable
WHERE
accountid = 1234
)
SELECT
cte.ID,
cte.ITEM,
cte.timestamp
FROM
cte
WHERE
rn = 1
AND cte.ID IN ('id here | Sub select :( ..')
Note: if you want the second-to-last item, as you stated in your comments, use rn = 2 instead.
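If both the latest row and the one before it are wanted per ID, one option is to keep every row numbered 1 or 2. This is only a sketch under assumptions: the table variable @MyLoggingTable, its column types and the sample rows are invented for illustration, and the accountid filter is left out.

```sql
-- Hypothetical stand-in for MyLoggingTable with made-up rows.
DECLARE @MyLoggingTable TABLE (ID int, ITEM varchar(20), [timestamp] datetime);

INSERT INTO @MyLoggingTable (ID, ITEM, [timestamp])
VALUES (1, 'A', '2020-01-01T08:00:00'),
       (1, 'B', '2020-01-02T08:00:00'),
       (1, 'C', '2020-01-03T08:00:00'),
       (2, 'X', '2020-01-01T09:00:00');

WITH cte AS
(
    SELECT ID,
           ITEM,
           [timestamp],
           -- 1 = newest row per ID, 2 = the row before it, and so on
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY [timestamp] DESC) AS RN
    FROM @MyLoggingTable
)
SELECT ID, ITEM, [timestamp], RN
FROM cte
WHERE RN <= 2      -- keep the latest two rows of each ID
ORDER BY ID, RN;
```

With this sample data, ID 1 should return items C and B, and ID 2 should return only X because it has a single row.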

SQL Simple Join with two tables, but one is random

I am stuck with this. I have a simple set-up with two tables. One table holds email addresses and one table holds voucher codes. I want to join them in a third table, so that each email address has one random voucher code.
Unfortunately I am stuck, as there are no identical IDs to match the two values. What I have so far brings no result:
Select
A.Email,
B.CouponCode
FROM Emailaddresses as A
JOIN CouponCodes as B
on A.Email = B.CouponCode
A hint would be great as search did not bring me any further yet.
Edit -
Table A (Addresses)
-------------------------
Column A         | Column B
-------------------------
email1#gmail.com | True
email2#gmail.com |
email3#gmail.com | True
email4#gmail.com |
Table B (Voucher)
-------------------------
ABCD1234
ABCD5678
ABCD9876
ABCD5432
Table C
-------------------------
Column A         | Column B
-------------------------
email1#gmail.com | ABCD1234
email2#gmail.com | ABCD5678
email3#gmail.com | ABCD9876
email4#gmail.com | ABCD5432

While joining without proper keys is not a good solution, for your case you can try this. (note: not tested, just a quick suggestion)
;with cte_email as (
select row_number() over (order by Email) as rownum, Email
from Emailaddresses
),
cte_coupon as (
select row_number() over (order by CouponCode) as rownum, CouponCode
from CouponCodes
)
select a.Email,b.CouponCode
from cte_email a
join cte_coupon b
on a.rownum = b.rownum

You want to randomly join records, one email with one coupon each. So create random row numbers and join on these:
select
e.email,
c.couponcode
from (select t.*, row_number() over (order by newid()) as rn from emailaddresses t) e
join (select t.*, row_number() over (order by newid()) as rn from CouponCodes t) c
on c.rn = e.rn;

Assign a row number to the rows of both tables and join on that row number.
Query
;with cte as(
select [rn] = row_number() over(
order by [Column_A]
), *
from [Table_A]
),
cte2 as(
select [rn] = row_number() over(
order by [Column_A]
), *
from [Table_B]
)
select t1.[Column_A] as [Email_Id], t2.[Column_A] as [Coupon]
from cte t1
join cte2 t2
on t1.rn = t2.rn;

Updating multiple row with random data from another table?

Combining some examples, I came up with the following query (field and table names have been anonymised, so I hope I didn't insert typos).
UPDATE destinationTable
SET destinationField = t2.value
FROM destinationTable t1
CROSS APPLY (
SELECT TOP 1 'SomeRequiredPrefix ' + sourceField as value
FROM #sourceTable
WHERE sourceField <> ''
ORDER BY NEWID()
) t2
Problem
Currently, all records get the same value in destinationField; the value needs to be random and different for each record. I'm probably missing something here.

Here's a possible solution. Using CTEs, assign row numbers to both tables based on a random order. Join the tables together using that row number and update the rows accordingly.
;WITH
dt AS
(SELECT *, ROW_NUMBER() OVER (ORDER BY NEWID()) AS RowNum
FROM dbo.destinationtable),
st AS
(SELECT *, ROW_NUMBER() OVER (ORDER BY NEWID()) AS RowNum
FROM dbo.#sourcetable)
UPDATE dt
SET dt.destinationfield = 'SomeRequiredPrefix ' + st.sourcefield
FROM dt
JOIN st ON dt.RowNum = st.RowNum
UPDATED SOLUTION
I used CROSS JOIN to get all possibilities since you have fewer rows in the source table. Then assign random row numbers and take only 1 row for each destination field.
;WITH cte
AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY destinationfield ORDER BY NEWID()) AS Rownum
FROM destinationtable
CROSS JOIN #sourcetable
WHERE sourcefield <> ''
)
UPDATE cte
SET cte.destinationfield = 'SomeRequiredPrefix ' + sourcefield
WHERE cte.Rownum = 1
SELECT * FROM dbo.destinationtable

Display distinct Badge_ID

This is my query having the Current Results as displayed below.
SELECT
Distinct CONVERT(int, Employees_1.Emp_Badge_No) AS Emp_Badge_No,
Employees_1.Emp_LastName, Employees_1.Emp_FirstName, Employees_1.Email,
Employees_1.NT_Name, Employees_1.Dept_key,
Employees_1.Emp_LastName + ',' + Employees_1.Emp_FirstName AS FullName,
dbo.department_vw.DepartmentShortName AS deptname,
Employees_1.active_flag
FROM data_common.dbo.employees_union_vw AS Employees_1
INNER JOIN dbo.department_vw
ON Employees_1.Dept_key = dbo.department_vw.DepartmentKey
Sample data:
I need help to achieve the Expected Results. What should I modify in my existing SQL query?
I want to keep all the records, even inactive ones, as long as the Emp_Badge_No is not repeated. I only want the duplicate Emp_Badge_No rows to be removed.
Thanks in advance.

You may want to use ROW_NUMBER for this. Modify the ORDER BY clause depending on which row from the duplicate entry you want to retrieve:
WITH Cte AS(
SELECT
e.Emp_Badge_No,
e.Emp_LastName,
e.Emp_FirstName,
e.Email,
e.NT_Name,
e.Dept_key,
e.Emp_LastName + ',' + e.Emp_FirstName AS FullName,
d.DepartmentShortName AS deptname,
e.active_flag,
rn = ROW_NUMBER() OVER(PARTITION BY e.Emp_Badge_No ORDER BY e.active_flag)
FROM data_common.dbo.employees_union_vw AS e
INNER JOIN dbo.department_vw d
ON e.Dept_key = d.DepartmentKey
)
SELECT
Emp_Badge_No,
Emp_LastName,
Emp_FirstName,
Email,
NT_Name,
Dept_key,
FullName,
deptname,
active_flag
FROM Cte
WHERE rn = 1
The above will get the Inactive record if there are duplicates. If you want to get the Active record instead, replace the rn expression with:
ROW_NUMBER() OVER(PARTITION BY e.Emp_Badge_No ORDER BY e.active_flag DESC)
If you don't care whether it's Active or Inactive, replace it with:
ROW_NUMBER() OVER(PARTITION BY e.Emp_Badge_No ORDER BY (SELECT NULL))
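To see how the sort direction decides which duplicate survives, here is a small self-contained sketch. It assumes active_flag is a bit column where 1 means active, which the question does not confirm, and the table variable @Employees and its rows are invented for illustration.

```sql
-- Hypothetical data: badge 100 appears twice, badge 200 only once (inactive).
DECLARE @Employees TABLE (Emp_Badge_No int, Emp_LastName varchar(30), active_flag bit);

INSERT INTO @Employees (Emp_Badge_No, Emp_LastName, active_flag)
VALUES (100, 'Smith', 0),
       (100, 'Smith', 1),
       (200, 'Jones', 0);

WITH Cte AS
(
    SELECT Emp_Badge_No,
           Emp_LastName,
           active_flag,
           -- DESC puts active_flag = 1 first, so the active row gets rn = 1
           rn = ROW_NUMBER() OVER(PARTITION BY Emp_Badge_No ORDER BY active_flag DESC)
    FROM @Employees
)
SELECT Emp_Badge_No, Emp_LastName, active_flag
FROM Cte
WHERE rn = 1;  -- one row per badge number
```

With this sample data the result should have one row per badge: the active row for badge 100, and the single inactive row for badge 200, which is kept because its badge number is not repeated.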
