Find dup records with different extensions in SQL Server

I have a subscriptions table. Sample records:
SUBS_ID | SUBS Name
--------+--------------
1       | SC FORM 124
2       | SC FORM 124-R
I need to find both records, since the subscription name is exactly the same except for the -R extension.

Really bad throwaway code written straight here and untested, but...
with cte As (Select Name, Id
From Subs
Where Name Not Like '%-R'   -- base names only
)
Select cte.Id, cte.Name, M.Id As ExtId, M.Name As ExtName
From Subs As M
Join cte
On cte.Name + '-R' = M.Name  -- pair each base name with its -R variant
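The query above returns one row per base/-R pair. If you want the base record and its -R variant back as two separate rows, a minimal sketch could use EXISTS in both directions. It assumes the columns are called SUBS_ID and SUBS_NAME, and #Subs is a hypothetical temp table holding the question's sample rows plus one unmatched row.

```sql
-- Hypothetical sample data; #Subs stands in for the real subscriptions table.
IF OBJECT_ID('tempdb..#Subs') IS NOT NULL
    DROP TABLE #Subs;

CREATE TABLE #Subs (SUBS_ID INT, SUBS_NAME VARCHAR(50));
INSERT INTO #Subs (SUBS_ID, SUBS_NAME)
VALUES (1, 'SC FORM 124'),
       (2, 'SC FORM 124-R'),
       (3, 'SC FORM 200');   -- no -R partner, so it should not be returned

-- Keep a row if it is a base name whose -R variant exists,
-- or an -R name whose base name (suffix stripped) exists.
SELECT s.SUBS_ID, s.SUBS_NAME
FROM #Subs AS s
WHERE EXISTS (SELECT 1
              FROM #Subs AS r
              WHERE r.SUBS_NAME = s.SUBS_NAME + '-R')
   OR (s.SUBS_NAME LIKE '%-R'
       AND EXISTS (SELECT 1
                   FROM #Subs AS b
                   WHERE b.SUBS_NAME = LEFT(s.SUBS_NAME, LEN(s.SUBS_NAME) - 2)))
ORDER BY s.SUBS_ID;
-- Returns SUBS_ID 1 and 2.
```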

You can use ROW_NUMBER() with PARTITION BY, as below:
Select * from (
Select *, DupeRecords = Row_number() over(partition by replace([Subs Name],'-R','') order by Subs_Id)
from #yoursubs
) a Where a.DupeRecords > 1
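Because the filter keeps only rows with DupeRecords > 1, the query above returns the -R row but not its base row. If both should come back, one option is a windowed COUNT(*) over the same kind of partition. This sketch also strips -R only when it is a trailing suffix, whereas REPLACE removes '-R' anywhere in the name. The #yoursubs table and its columns follow the answer; the sample rows are hypothetical.

```sql
-- Hypothetical sample data in the answer's #yoursubs table.
IF OBJECT_ID('tempdb..#yoursubs') IS NOT NULL
    DROP TABLE #yoursubs;

CREATE TABLE #yoursubs (Subs_Id INT, [Subs Name] VARCHAR(50));
INSERT INTO #yoursubs (Subs_Id, [Subs Name])
VALUES (1, 'SC FORM 124'),
       (2, 'SC FORM 124-R'),
       (3, 'SC FORM 200');

SELECT Subs_Id, [Subs Name]
FROM (
    SELECT *,
           -- Number of rows sharing the same base name (trailing -R removed).
           COUNT(*) OVER (PARTITION BY
               CASE WHEN [Subs Name] LIKE '%-R'
                    THEN LEFT([Subs Name], LEN([Subs Name]) - 2)
                    ELSE [Subs Name]
               END) AS FamilySize
    FROM #yoursubs
) AS a
WHERE a.FamilySize > 1   -- every member of a family with more than one row
ORDER BY Subs_Id;
```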

Based on your latest criteria:
"So, in the above example, when I query the table I should get all 3 records: the first one being the base record and the remaining 2 being the extensions." (comment by SQL User)
SELECT distinct
0 as Subs_ID
, CASE WHEN SUBS_Name like '%-%' THEN left(SUBS_Name,charindex('-',SUBS_Name)-1) ELSE SUBS_Name END AS SUB_NAME_MAIN
, '' as Extension
FROM
subs
UNION
SELECT
Subs_ID
, CASE WHEN SUBS_Name like '%-%' THEN left(SUBS_Name,charindex('-',SUBS_Name)-1) ELSE SUBS_Name END AS SUB_NAME_MAIN
, CASE WHEN SUBS_Name like '%-%' THEN RIGHT(SUBS_Name, LEN(SUBS_Name) - charindex('-',SUBS_Name)+1) ELSE '' END AS Extension
FROM
subs
ORDER BY SUB_NAME_MAIN, Subs_ID  -- without ORDER BY, the row order is not guaranteed
This produces the following result: a 'master' row given an arbitrary ID of 0, followed by each row of that master's family with its extension.
Subs_ID | SUB_NAME_MAIN | Extension
--------+---------------+----------
0       | SC FORM 124   |
1       | SC FORM 124   |
2       | SC FORM 124   | -R
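The UNION query above repeats the same CHARINDEX/CASE logic in each branch. A sketch that computes the hyphen position once with CROSS APPLY, builds one master row per family with GROUP BY, and sorts the families explicitly might look like this. The SUBS table and column names come from the answer; #subs and its rows are hypothetical sample data.

```sql
-- Hypothetical sample data; the real table is SUBS(Subs_ID, SUBS_Name).
IF OBJECT_ID('tempdb..#subs') IS NOT NULL
    DROP TABLE #subs;

CREATE TABLE #subs (Subs_ID INT, SUBS_Name VARCHAR(50));
INSERT INTO #subs (Subs_ID, SUBS_Name)
VALUES (1, 'SC FORM 124'),
       (2, 'SC FORM 124-R');

WITH split AS (
    SELECT s.Subs_ID,
           -- p.pos is 0 when the name contains no hyphen.
           CASE WHEN p.pos > 0 THEN LEFT(s.SUBS_Name, p.pos - 1)
                ELSE s.SUBS_Name END AS SUB_NAME_MAIN,
           CASE WHEN p.pos > 0 THEN SUBSTRING(s.SUBS_Name, p.pos, LEN(s.SUBS_Name))
                ELSE '' END AS Extension
    FROM #subs AS s
    CROSS APPLY (SELECT CHARINDEX('-', s.SUBS_Name) AS pos) AS p
)
SELECT 0 AS Subs_ID, SUB_NAME_MAIN, '' AS Extension   -- one master row per family
FROM split
GROUP BY SUB_NAME_MAIN
UNION ALL
SELECT Subs_ID, SUB_NAME_MAIN, Extension              -- every member row
FROM split
ORDER BY SUB_NAME_MAIN, Subs_ID;
```

UNION ALL is enough here because the master rows are already unique after the GROUP BY, so no extra de-duplication pass is needed.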

Related

Is there a way to Merge SQL Server Row Query?

Here is my current summary query result:
id     | name | mch
-------+------+----
127664 | ML   | 2
127666 | ML   | 2
127667 | ML   | 2
127670 | ML   | 2
127671 | ML   | 2
127672 | ML   | 2
127674 | ML   | 2
127675 | ML   | 2
127678 | ML   | 1
127680 | ML   | 1
127665 | ML   | 2
I want to merge rows that have the same value, but only in the name column.
Here is my expected result:
id     | name | mch
-------+------+----
127664 | ML   | 2
127666 |      | 2
127667 |      | 2
127670 |      | 2
127671 |      | 2
127672 |      | 2
127674 |      | 2
127675 |      | 2
127678 |      | 1
127680 |      | 1
127665 |      | 2
I have already searched for similar problems but haven't found a solution.
I hope you can guide me on how to handle this.

Well, you could try a ROW_NUMBER trick here:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id) rn
FROM yourTable
)
SELECT
id,
CASE WHEN rn = 1 THEN name ELSE '' END AS name,
mch
FROM cte
ORDER BY
cte.name, -- qualified: an unqualified "name" would sort by the blanked-out alias
id;
But typically this type of requirement would be better handled in your presentation layer (e.g. PHP or Java).
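The ROW_NUMBER trick shows each name once per partition, wherever its rows fall. If the name should instead be blanked only when it repeats the row directly above it in id order, LAG is a possible alternative; note that LAG requires SQL Server 2012 or later. In this sketch, #yourTable is a hypothetical temp table loaded with some of the question's rows.

```sql
-- Requires SQL Server 2012+ for LAG. Hypothetical sample data from the question.
IF OBJECT_ID('tempdb..#yourTable') IS NOT NULL
    DROP TABLE #yourTable;

CREATE TABLE #yourTable (id INT, name VARCHAR(20), mch INT);
INSERT INTO #yourTable (id, name, mch)
VALUES (127664, 'ML', 2),
       (127665, 'ML', 2),
       (127666, 'ML', 2),
       (127678, 'ML', 1),
       (127680, 'ML', 1);

SELECT id,
       -- Blank the name when it equals the previous row's name (by id).
       -- The first row has no previous row, so LAG returns NULL and the name is kept.
       CASE WHEN LAG(name) OVER (ORDER BY id) = name THEN '' ELSE name END AS name,
       mch
FROM #yourTable
ORDER BY id;
```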

Your query should look like this:
WITH t AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id) counter
FROM yourTable -- name of your table
)
SELECT
id, CASE WHEN counter = 1 THEN name ELSE '' END AS name, mch
FROM t ORDER BY
t.name, -- qualified: an unqualified "name" would sort by the blanked-out alias
id;

Sql Server Weird CASE Statement

I am attempting to do something, but I am not sure if it is possible. I don't really know how to look up something like this, so I'm asking a question here.
Say this is my table:
Name | Group
-----+--------
John | Alpha
Dave | Alpha
Dave | Bravo
Alex | Bravo
I want to do something like this:
SELECT TOP 1 CASE
WHEN Group = 'Alpha' THEN 1
WHEN Group = 'Bravo' THEN 2
WHEN Group = 'Alpha' AND
Group = 'Bravo' THEN 3
ELSE 0
END AS Rank
FROM table
WHERE Name = 'Dave'
I understand why this won't work, but this was the best way that I could explain what I am trying to do. Basically, I just need to know when one person is a part of both groups. Does anyone have any ideas that I could use?

Create a column that holds the value to add for each group, then sum it per name; this is probably easiest to do via a subquery:
Select Name, SUM(Val) as Rank
FROM (SELECT Name, CASE WHEN [Group] = 'Alpha' THEN 1
WHEN [Group] = 'Bravo' THEN 2
ELSE 0 END AS Val
FROM [table]
WHERE Name = 'Dave') T
GROUP BY Name
You can add TOP 1 and ORDER BY SUM(Val) DESC to get the top-ranked row if required.
After reading your comment, it could be simplified further to:
Select Name, COUNT(DISTINCT [GROUP]) GroupCount
FROM [table]
GROUP BY Name
HAVING COUNT(DISTINCT [GROUP]) > 1
This returns every name that belongs to more than one distinct group.
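If you still want the 0/1/2/3 value from the question's CASE (1 = Alpha only, 2 = Bravo only, 3 = both), conditional aggregation can compute it per name in one pass. Unlike a plain SUM, taking MAX per group means a duplicated row for the same group is not counted twice. This is a sketch: #People is a hypothetical stand-in for the question's table, and [Group] must be bracketed because GROUP is a reserved word.

```sql
-- Hypothetical sample data from the question.
IF OBJECT_ID('tempdb..#People') IS NOT NULL
    DROP TABLE #People;

CREATE TABLE #People (Name VARCHAR(20), [Group] VARCHAR(20));
INSERT INTO #People (Name, [Group])
VALUES ('John', 'Alpha'),
       ('Dave', 'Alpha'),
       ('Dave', 'Bravo'),
       ('Alex', 'Bravo');

SELECT Name,
       -- First MAX is 1 if the person has any Alpha row, second is 2 if any Bravo row.
       MAX(CASE WHEN [Group] = 'Alpha' THEN 1 ELSE 0 END)
     + MAX(CASE WHEN [Group] = 'Bravo' THEN 2 ELSE 0 END) AS [Rank]
FROM #People
GROUP BY Name;
-- John = 1, Alex = 2, Dave = 3
```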

SQL Server | Finding out count and category

Table A
Owner row_no category
-------------------------
A 1 U
B 1 T
B 2 T
C 1 U
C 2 T
C 3 U
C 4 U
I'm looking for a solution that stores values into another table. For row_no, it should return 1 if the value is 1, and MAX(row_no) - 1 otherwise.
category should be T, U, or Both, depending on whether an owner has opted for only T, only U, or both in table A.
The expected output table should look something like this:
Owner row_no category
---------------------------
A 1 U
B 1 T
C 3 Both
I tried the approach below, but it throws an error:
SELECT *
INTO B
FROM A
WHERE ROW_NO LIKE CASE
WHEN ROW_NO = 1 THEN ROW_NO
ELSE MAX(ROW_NO) - 1
END
I haven't figured out yet how to retrieve the category!
Could you please help with the correct approach?

Your logic is not completely clear to me. In particular, I assume here that your logic for reporting the row_no is to return 1 when the max value for an owner is 1, otherwise to return that max value minus 1.
We can try doing a simple aggregation query here to generate what you want.
SELECT
Owner,
CASE WHEN MAX(row_no) = 1 THEN 1 ELSE MAX(row_no) - 1 END AS row_no,
CASE WHEN COUNT(DISTINCT category) > 1 THEN 'Both' ELSE MAX(category) END AS category
FROM tableA
GROUP BY
Owner;
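The question also asks for the result to be stored in another table. In SQL Server, the same aggregation can create that table directly with SELECT ... INTO. This sketch uses hypothetical temp tables #tableA and #B in place of the question's A and B, and assumes the target does not exist yet (otherwise, use INSERT INTO B (...) SELECT ...).

```sql
-- Hypothetical copies of the question's tables.
IF OBJECT_ID('tempdb..#tableA') IS NOT NULL
    DROP TABLE #tableA;
IF OBJECT_ID('tempdb..#B') IS NOT NULL
    DROP TABLE #B;

CREATE TABLE #tableA (Owner CHAR(1), row_no INT, category CHAR(1));
INSERT INTO #tableA (Owner, row_no, category)
VALUES ('A', 1, 'U'), ('B', 1, 'T'), ('B', 2, 'T'),
       ('C', 1, 'U'), ('C', 2, 'T'), ('C', 3, 'U'), ('C', 4, 'U');

-- SELECT ... INTO creates #B with the aggregated result.
SELECT Owner,
       CASE WHEN MAX(row_no) = 1 THEN 1 ELSE MAX(row_no) - 1 END AS row_no,
       CASE WHEN COUNT(DISTINCT category) > 1 THEN 'Both'
            ELSE MAX(category) END AS category
INTO #B
FROM #tableA
GROUP BY Owner;

SELECT Owner, row_no, category FROM #B ORDER BY Owner;
-- A | 1 | U,  B | 1 | T,  C | 3 | Both
```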

One method would be to use a GROUP BY:
WITH VTE AS(
SELECT *
FROM(VALUES ('A',1,'U'),
('B',1,'T'),
('B',2,'T'),
('C',1,'U'),
('C',2,'T'),
('C',3,'U'),
('C',4,'U')) V([Owner], Row_no, Category))
SELECT [Owner],
ISNULL(NULLIF(MAX(Row_no) - 1,0),1) AS Row_no,
CASE WHEN MIN(Category) = MAX(Category) THEN MAX(Category) ELSE 'Both' END AS Category --Assumes Category cannot have a value of NULL
FROM VTE
GROUP BY [Owner];

How do I exclude rows when an incremental value starts over?

I am a newbie poster but have spent a lot of time researching answers here. I can't quite figure out how to create a SQL result set using SQL Server 2008 R2, for a problem that would probably call for LEAD/LAG in more modern versions. I am trying to aggregate data based on the sequencing of one column, but there can be varying numbers of instances in each sequence. The only way I know a sequence has ended is when the next row has a lower sequence number. So it may go 1-2, 1-2-3-4, 1-2-3, and I have to figure out how to make 3 aggregates out of that.
The source data comes from joined tables and looks like this:
recordID | instanceDate   | moduleID | iResult | interactionNum
---------+----------------+----------+---------+---------------
1356     | 10/6/15 16:14  | 1        | 68      | 1
1357     | 10/7/15 16:22  | 1        | 100     | 2
1434     | 10/9/15 16:58  | 1        | 52      | 1
1435     | 10/11/15 17:00 | 1        | 60      | 2
1436     | 10/15/15 16:57 | 1        | 100     | 3
1437     | 10/15/15 16:59 | 1        | 100     | 4
I need to find a way to separate the first 2 rows from the last 4 rows in this example, based on values in the last column.
What I would love to ultimately get is a result set that looks like this, which averages the iResult column based on the grouping and takes the first instanceDate from the grouping:
instanceDate moduleID iResult
10/6/15 1 84
10/9/15 1 78
I can aggregate to get this result using MIN and AVG if I can just find a way to separate the groups. The data is ordered by instanceDate (please ignore the date formatting here) and then interactionNum, and the group separation should happen when the query finds a row whose interactionNum is less than or equal to that of the previous row. (It will usually start over at 1, but not always, so I'd prefer to split on any lower or equal value.)
Here is the query I have so far (includes the joins that give the above data set):
SELECT
X.*
FROM
(SELECT TOP 100 PERCENT
instanceDate, b.ModuleID, iResult, b.interactionNum
FROM
(firstTable a
INNER JOIN
secondTable b ON b.someID = a.someID)
WHERE
a.someID = 2
AND b.otherID LIKE 'xyz'
AND a.ModuleID = 1
ORDER BY
instanceDate) AS X
OUTER APPLY
(SELECT TOP 1
*
FROM
(SELECT
instanceDate, d.ModuleID, iResult, d.interactionNum
FROM
(firstTable c
INNER JOIN
secondTable d ON d.someID = c.someID)
WHERE
c.someID = 2
AND d.otherID LIKE 'xyz'
AND c.ModuleID = 1
AND d.interactionNum = X.interactionNum
AND c.instanceDate < X.instanceDate) X2
ORDER BY
instanceDate DESC) Y
WHERE
NOT EXISTS (SELECT Y.interactionNum INTERSECT SELECT X.interactionNum)
But this is returning an interim result set like this:
instanceDate ModuleID iResult interactionNum
10/6/15 16:10 1 68 1
10/6/15 16:14 1 100 2
10/15/15 16:57 1 100 3
10/15/15 16:59 1 100 4
and the problem is that interactionNum 3, 4 do not belong in this result set. They would go in the next result set when I loop over this query. How do I keep them out of the result set in this iteration? I need the result set from this query to just include the first two rows, 'seeing' that row 3 of the source data has a lower value for interactionNum than row 2 has.

I'm not sure how ModuleID was supposed to be used, but I guess you're looking for something like this:
select min(instanceDate) as instanceDate, [moduleID], avg([iResult]) as iResult
from (
select *, row_number() over (partition by [moduleID] order by instanceDate) as RN
from Table1
) X
group by [moduleID], RN - [interactionNum]
The idea here is to create a running number with row_number for each moduleID, and then use the difference between that and interactionNum as the grouping criterion.
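To see why RN - interactionNum works as a grouping key, it helps to look at it on the question's sample rows. Within a run where interactionNum climbs by exactly 1, RN and interactionNum both increase by 1, so the difference stays constant; when the sequence restarts at a lower or equal value, RN still increases, so the difference jumps to a new, larger value. This relies on each run increasing by exactly 1 (a gap such as 1, 2, 4 would break the grouping). #Table1 is a hypothetical temp table loaded with the sample rows.

```sql
IF OBJECT_ID('tempdb..#Table1') IS NOT NULL
    DROP TABLE #Table1;

CREATE TABLE #Table1 (recordID INT, instanceDate DATETIME, moduleID INT,
                      iResult INT, interactionNum INT);
-- ISO 8601 literals avoid depending on the session's DATEFORMAT setting.
INSERT INTO #Table1 (recordID, instanceDate, moduleID, iResult, interactionNum)
VALUES (1356, '2015-10-06T16:14:00', 1,  68, 1),
       (1357, '2015-10-07T16:22:00', 1, 100, 2),
       (1434, '2015-10-09T16:58:00', 1,  52, 1),
       (1435, '2015-10-11T17:00:00', 1,  60, 2),
       (1436, '2015-10-15T16:57:00', 1, 100, 3),
       (1437, '2015-10-15T16:59:00', 1, 100, 4);

-- Show the grouping key next to each row: 0 for the first run, 2 for the second.
SELECT recordID, interactionNum,
       ROW_NUMBER() OVER (PARTITION BY moduleID ORDER BY instanceDate) AS RN,
       ROW_NUMBER() OVER (PARTITION BY moduleID ORDER BY instanceDate)
           - interactionNum AS GroupKey
FROM #Table1
ORDER BY instanceDate;
```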

Here is my solution, although I think @JamesZ's answer is cleaner.
I created a new field called newinstance, which is 1 wherever interactionNum is 1, and then a running SUM(newinstance), called rollinginstance, to group on. Note that this only starts a new group when interactionNum resets to exactly 1.
Change the last SELECT to SELECT * FROM cte2 to show all the fields I added.
IF OBJECT_ID('tempdb..#tmpData') IS NOT NULL
DROP TABLE #tmpData
CREATE TABLE #tmpData (recordID INT, instanceDate DATETIME, moduleID INT, iResult INT, interactionNum INT)
INSERT INTO #tmpData
SELECT 1356,'10/6/15 16:14',1,68,1 UNION
SELECT 1357,'10/7/15 16:22',1,100,2 UNION
SELECT 1434,'10/9/15 16:58',1,52,1 UNION
SELECT 1435,'10/11/15 17:00',1,60,2 UNION
SELECT 1436,'10/15/15 16:57',1,100,3 UNION
SELECT 1437,'10/15/15 16:59',1,100,4
;WITH cte1 AS
(
SELECT *,
CASE WHEN interactionNum=1 THEN 1 ELSE 0 END AS newinstance,
ROW_NUMBER() OVER(ORDER BY recordID) as rowid
FROM #tmpData
), cte2 AS
(
SELECT *,
(select SUM(newinstance) from cte1 b where b.rowid<=a.rowid) as rollinginstance
FROM cte1 a
)
SELECT MIN(instanceDate) AS instanceDate, moduleID, AVG(iResult) AS iResult
FROM cte2
GROUP BY moduleID, rollinginstance
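The question targets SQL Server 2008 R2, where LAG and ordered window aggregates are not available. On SQL Server 2012 or later, the asker's actual rule (start a new group whenever interactionNum is less than or equal to the previous row's value) could be written directly with LAG and a running SUM. This sketch orders rows by instanceDate and then interactionNum, as the question describes; #seq is a hypothetical temp table with the sample rows.

```sql
-- Requires SQL Server 2012+ (LAG, and SUM ... OVER with ORDER BY and a ROWS frame).
IF OBJECT_ID('tempdb..#seq') IS NOT NULL
    DROP TABLE #seq;

CREATE TABLE #seq (recordID INT, instanceDate DATETIME, moduleID INT,
                   iResult INT, interactionNum INT);
INSERT INTO #seq (recordID, instanceDate, moduleID, iResult, interactionNum)
VALUES (1356, '2015-10-06T16:14:00', 1,  68, 1),
       (1357, '2015-10-07T16:22:00', 1, 100, 2),
       (1434, '2015-10-09T16:58:00', 1,  52, 1),
       (1435, '2015-10-11T17:00:00', 1,  60, 2),
       (1436, '2015-10-15T16:57:00', 1, 100, 3),
       (1437, '2015-10-15T16:59:00', 1, 100, 4);

WITH flagged AS (
    SELECT *,
           -- 1 marks the first row of a group: there is no previous row
           -- (LAG returns NULL), or interactionNum did not increase.
           CASE WHEN LAG(interactionNum) OVER (PARTITION BY moduleID
                                               ORDER BY instanceDate, interactionNum)
                     < interactionNum
                THEN 0 ELSE 1 END AS newGroup
    FROM #seq
), numbered AS (
    SELECT *,
           -- Running total of group starts gives each row its group number.
           SUM(newGroup) OVER (PARTITION BY moduleID
                               ORDER BY instanceDate, interactionNum
                               ROWS UNBOUNDED PRECEDING) AS groupNo
    FROM flagged
)
SELECT MIN(instanceDate) AS instanceDate, moduleID, AVG(iResult) AS iResult
FROM numbered
GROUP BY moduleID, groupNo
ORDER BY moduleID, groupNo;
-- Two rows: iResult 84 (starting 2015-10-06) and 78 (starting 2015-10-09).
```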

More efficient alternative to subquery

I have many "actual" tables, each with a "history companion table" (so to speak); the history tables are structured like this:
values| date_deal | type_deal | num (autoinc)
value1| 01.01.2012 | i | 1
value1| 02.01.2012 | u | 2
value2| 02.01.2012 | i | 3
value2| 03.01.2012 | u | 4
value1| 04.01.2012 | d | 5
value2| 05.01.2012 | u | 6
value2| 08.01.2012 | u | 7
When I insert, update, or delete a record in an "actual" table, a trigger puts the affected record into the "history" table with date_deal = GETDATE(), type_deal = i, u, or d (for the insert, update, and delete triggers respectively), and num as an auto-incremented unique value.
So the question is how to get the last record for each distinct value that was valid on a certain date, excluding from the final result any records whose type_deal = 'd' (since that record had been deleted from the actual table by that time, and we don't want anything associated with it).
The way I do it most of the time:
SELECT *
FROM t_table1 t1
WHERE t1.num = ( SELECT MAX(num)
FROM t_table1 t2
WHERE t2.[values] = t1.[values]
AND t2.[date_deal] < @dt)
AND t1.[type_deal] <> 'D'
But that is sometimes very slow. I'm looking for a more efficient alternative. Please help.
Update:
Thanks for the replies, friends.
I've done some testing on both the production and test servers.
To put these approaches on an equal footing, I selected all fields from the source table in every query.
The test server has fewer than 200K records, and there I also had the luxury of running DBCC FREEPROCCACHE and DBCC DROPCLEANBUFFERS. The production server has over 2.3M records and no option to drop buffers or the cache since, well, it is in use by real users. So I dropped them only once there and took the results right after that.
Here are the actual queries and the time each one took on both servers:
Original:
DECLARE @dt datetime = CONVERT(datetime, '01.08.2013', 104)
SELECT *
FROM [CLIENTS_HISTORY].[dbo].[Clients_all_h] c
WHERE c.num = ( SELECT MAX(num)
FROM [CLIENTS_HISTORY].[dbo].[Clients_all_h] c2
WHERE c2.[AccountSys] = c.[AccountSys]
AND date_deal <= @dt)
AND c.type_deal <> 'D'
61 sec at 2,316,890 records on production; 4 sec at 191,533 records on test
Rahul's:
SELECT *
FROM [CLIENTS_HISTORY].[dbo].[Clients_all_h] c
GROUP BY [all_fields]
HAVING c.num = ( SELECT MAX(num)
FROM [CLIENTS_HISTORY].[dbo].[Clients_all_h] c2
WHERE c2.[AccountSys] = c.[AccountSys]
AND date_deal <= @dt)
AND c.type_deal <> 'D'
62 sec at 2,316,890 records on production; 4 sec at 191,533 records on test
Almost equal.
George's (with some major changes):
SELECT * FROM
(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY accountsys ORDER BY num desc) AS aa
FROM [CLIENTS_HISTORY].[dbo].[Clients_all_h] c
WHERE c.date_deal < @dt) as a
WHERE aa=1
AND type_deal <> 'D'
76 sec at 2,316,890 records on production; 5 sec at 191,533 records on test
So far, the original and Rahul's are the fastest, and George's is somewhat slower.

Try using a GROUP BY ... HAVING clause:
SELECT *
FROM t_table1 t1
GROUP BY [column_names]
HAVING t1.num = ( SELECT MAX(num)
FROM t_table1 t2
WHERE t2.[values] = t1.[values]
AND t2.[date_deal] < @dt)
AND t1.[type_deal] <> 'D'

I think ROW_NUMBER() could be useful for you, as follows:
select
*
from
(
select
*,
row_number() over( partition by date_deal order by num) as aa
from
t_table1 t1
where
t1.[type_deal] <> 'D'
) as a
where
aa=1
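As written, the query above numbers rows per date_deal (one row per date rather than per value), does not apply the @dt cut-off, and drops 'D' rows before numbering, so a deleted value can still appear through one of its older rows. Partitioning by the value column instead, applying the date cut-off before numbering and the delete check after it, matches what the question describes. A sketch on the question's sample history rows, with #t_table1 as a hypothetical temp copy and @dt fixed at 6 January 2012:

```sql
IF OBJECT_ID('tempdb..#t_table1') IS NOT NULL
    DROP TABLE #t_table1;

CREATE TABLE #t_table1 ([values] VARCHAR(20), date_deal DATETIME,
                        type_deal CHAR(1), num INT);
-- The question's sample rows; its dd.mm.yyyy dates are written as unambiguous yyyymmdd.
INSERT INTO #t_table1 ([values], date_deal, type_deal, num)
VALUES ('value1', '20120101', 'i', 1),
       ('value1', '20120102', 'u', 2),
       ('value2', '20120102', 'i', 3),
       ('value2', '20120103', 'u', 4),
       ('value1', '20120104', 'd', 5),
       ('value2', '20120105', 'u', 6),
       ('value2', '20120108', 'u', 7);

DECLARE @dt DATETIME = '20120106';

SELECT [values], date_deal, type_deal, num
FROM (
    SELECT *,
           -- Latest history row per value, among rows dated before @dt.
           ROW_NUMBER() OVER (PARTITION BY [values] ORDER BY num DESC) AS aa
    FROM #t_table1
    WHERE date_deal < @dt
) AS a
WHERE aa = 1
  AND type_deal <> 'd';  -- drop values whose latest change before @dt was a delete
-- Only value2 (num = 6) remains; value1's latest row before @dt is the delete (num = 5).
```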
