SQL Select Only the Most Common Results

SQL Select Only the Most Common Results - sql-server

I have a table with IDs and Items where sometimes the associated Item has a variation from the other Items associated with the same ID. I need a query that selects the most common Item and assigns it to that ID.
The below query works, but I'm hoping to optimize it to avoid having to join two separate CTEs at the end, and rather have one slick SELECT statement:
IF OBJECT_ID('tempdb..#Test') IS NOT NULL
DROP TABLE #Test
CREATE TABLE #Test
(
[ID] INT
,[Item] VARCHAR(20)
)
INSERT #Test
VALUES
(100, 'Apple'),
(100, 'Apple'),
(100, 'Apples'),
(200, 'Orange'),
(200, 'Orange'),
(200, 'Orange'),
(200, 'Oranges'),
(300, 'Grape');
WITH cteOne AS (SELECT
[ID]
,[Item]
,COUNT(*) [Count]
FROM #Test
GROUP BY [ID]
,[Item]
),
cteTwo AS (SELECT
[ID]
,MAX([Count]) [Max]
FROM cteOne
GROUP BY [ID])
SELECT
C1.[ID]
,C1.[Item]
FROM cteOne C1
INNER JOIN cteTwo C2 ON C2.[ID] = C1.[ID]
AND C2.[Max] = C1.[Count]
ORDER BY [ID]
Any help is appreciated!

You can use TOP 1 WITH TIES ordered by ROW_NUMBER(). Every row numbered 1 within its ID ties for first place, so one row per ID is returned:
select
top 1 with ties [ID], [Item]
from (
SELECT
[ID], [Item], COUNT(*) [Count]
FROM #Test
GROUP BY [ID], [Item]
) t
order by row_number() over (partition by [ID] order by [Count] desc)

This is even better:
;WITH
cteOne AS (
SELECT [ID],[Item] ,COUNT(*) [Count]
FROM #Test
GROUP BY [ID],[Item]
),
cteTwoo as (
select *, ROW_NUMBER() over (partition by [ID] order by [Count] desc) idx
from cteOne
)
select ID, Item
from cteTwoo
where idx = 1
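
One caveat with the ROW_NUMBER() approaches above: they return exactly one item per ID, whereas the two-CTE query in the question returns every item tied for the highest count. If that tie behaviour matters, a sketch using RANK() instead might look like the following. The #Fruit table and its rows are made up for illustration.

```sql
-- Hypothetical sample data: ID 100 has a clear winner, ID 200 has a tie.
CREATE TABLE #Fruit ([ID] INT, [Item] VARCHAR(20));

INSERT #Fruit VALUES
(100, 'Apple'), (100, 'Apple'), (100, 'Apples'),
(200, 'Orange'), (200, 'Oranges');

WITH Counted AS (
    SELECT [ID], [Item], COUNT(*) AS [Count]
    FROM #Fruit
    GROUP BY [ID], [Item]
),
Ranked AS (
    -- RANK() assigns 1 to every item tied for the top count,
    -- whereas ROW_NUMBER() would pick one of them arbitrarily.
    SELECT [ID], [Item],
           RANK() OVER (PARTITION BY [ID] ORDER BY [Count] DESC) AS rnk
    FROM Counted
)
SELECT [ID], [Item]
FROM Ranked
WHERE rnk = 1
ORDER BY [ID], [Item];

DROP TABLE #Fruit;
```

With this sample data, ID 100 returns only Apple, while ID 200 returns both Orange and Oranges because they are tied.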

Related

How to select only first ROW_NUMBER combined with SUM

I like to group my table by [ID] while using SUM and also bring back
[Product_Name] of the top ROW_NUMBER - not sure if I should use ROW_NUMBER, GROUPING SETS or loop through everything with FETCH... this is what I tried:
CREATE TABLE #SampleTable
(
[ID] INT,
[Price] MONEY,
[Product_Name] VARCHAR(50)
)
INSERT INTO #SampleTable
VALUES (1, 100, 'Product_1'), (1, 200, 'Product_2'),
(1, 300, 'Product_3'), (2, 500, 'Product_4'),
(2, 200, 'Product_5'), (2, 300, 'Product_6');
SELECT
[ID],
[Product_Name],
[Price],
SUM([Price]) OVER (PARTITION BY [ID]) AS [Price_Total],
ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [ID]) AS [Row_Number]
FROM
#SampleTable T1
My desired results - only two records:
1 Product_1 100.00 600.00 1
2 Product_4 500.00 1000.00 1
Any help or guidance is highly appreciated.
UPDATE:
I ended up using what Prateek Sharma suggested in his comment: simply wrapping the query in another SELECT with WHERE [Row_Number] = 1:
SELECT * FROM
(
SELECT
[ID]
,[Product_Name]
,[Price]
,SUM([Price]) OVER (PARTITION BY [ID]) AS [Price_Total]
,ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [ID]) AS [Row_Number]
FROM #SampleTable
) MultipleRows
WHERE [Row_Number] = 1

ROW_NUMBER() needs a column to ORDER BY. If you are happy to rely only on the table's own index order, it is fine to use the ID column for the ORDER BY.
Hence your query is correct and you can go with it.
Another option is the WITH TIES clause. However, if you use WITH TIES with the ORDER BY on the ID column, performance will be very poor. WITH TIES only performs well when you have a well-defined index and use that indexed column in the ORDER BY:
SELECT TOP 1 WITH TIES *
FROM (
SELECT [ID]
,[Product_Name]
,[Price]
,SUM([Price]) OVER (PARTITION BY [ID]) AS [Price_Total]
FROM #SampleTable
) TAB
ORDER BY ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY <IndexedColumn> DESC)
This query may help you a bit. But remember, it will not perform better than the query you wrote; it only reduces the number of lines of code.

One option is the WITH TIES clause, which avoids the extra RN column.
Hopefully you have a proper sequence number or date that can be used in either the SUM() OVER or the final ROW_NUMBER() OVER.
Example:
SELECT Top 1 with ties *
From (
Select [ID]
,[Product_Name]
,[Price]
,SUM([Price]) OVER (PARTITION BY [ID]) AS [Price_Total]
FROM #SampleTable T1
) A
Order By ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [Price_Total] Desc)
Returns
ID Product_Name Price Price_Total
1 Product_1 100.00 600.00
2 Product_4 500.00 1000.00

There is no "top ROW_NUMBER" unless you have a column that defines ordering.
If you just want an arbitrary row per ID, you can use the query below. To pick one deterministically, you would need to order by criteria that are deterministic and unique.
CREATE TABLE #SampleTable
(
ID INT,
Price MONEY,
Product_Name VARCHAR(50),
INDEX cix CLUSTERED (ID)
);
INSERT INTO #SampleTable
VALUES (1,100,'Product_1'),
(1,200,'Product_2'),
(1,300,'Product_3'),
(2,500,'Product_4'),
(2,200,'Product_5'),
(2,300,'Product_6');
WITH T AS
(
SELECT *,
OrderingColumn = ROW_NUMBER() OVER (ORDER BY (SELECT 0))
FROM #SampleTable
)
SELECT ID,
SUBSTRING(MIN(CONCAT(STR(OrderingColumn), Product_Name)), 11, 50) AS Product_Name,
CAST(SUBSTRING(MIN(CONCAT(STR(OrderingColumn), Price)), 11, 50) AS MONEY) AS Price,
SUM(Price) AS Price_Total
FROM T
GROUP BY ID
The plan for this is pretty efficient as it is able to use the index ordered by id and has no additional sorts, spools, or passes through the table.
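
For the deterministic case mentioned above, a minimal sketch is to order ROW_NUMBER() by a column combination that is unique within each ID. Ordering by Price and then Product_Name is only an assumption for illustration; use whatever actually defines "first" in your data. The #Products table is likewise made up.

```sql
-- Hypothetical copy of the sample data, so the example runs on its own.
CREATE TABLE #Products (ID INT, Price MONEY, Product_Name VARCHAR(50));

INSERT INTO #Products VALUES
(1, 100, 'Product_1'), (1, 200, 'Product_2'), (1, 300, 'Product_3'),
(2, 500, 'Product_4'), (2, 200, 'Product_5'), (2, 300, 'Product_6');

SELECT ID, Product_Name, Price, Price_Total
FROM (
    SELECT ID, Product_Name, Price,
           SUM(Price) OVER (PARTITION BY ID) AS Price_Total,
           -- Assumed ordering: cheapest product first, name as tie-breaker.
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Price, Product_Name) AS rn
    FROM #Products
) AS p
WHERE rn = 1
ORDER BY ID;

DROP TABLE #Products;
```

Note that with this ordering ID 2 returns Product_5 (the cheapest) rather than Product_4, because the result now follows the stated ordering instead of insertion order.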

SQL query that gets the difference between the 2 most recent rows for every item that occurs more than once in a table

I need a SQL query that returns the difference between the two most recent rows for every value that occurs more than once in a table.
For example:
book value date
A 4 2017-07-17 09:16:44.480
A 2 2017-08-15 10:05:58.273
B 3 2017-04-15 10:05:58.273
C 2 2017-08-15 10:05:58.273
B 3 2017-04-13 10:05:58.273
B 3 2017-04-12 10:05:58.273
should return
A 2
B 0

Here is a solution:
SELECT book, MAX(value) - MIN(value) AS difference FROM (
SELECT book, value, ROW_NUMBER() OVER (PARTITION BY book ORDER BY date DESC) AS rownum FROM t
) AS a WHERE rownum <= 2 GROUP BY book HAVING MAX(rownum) >= 2

SELECT id_pk FROM [table] GROUP BY [fields you want to compare by] HAVING COUNT(*) > 1
This select returns the list of primary keys of the elements that are repeated.
So, in an outer select, you might use it like this:
Select * from [table] where id_pk in (
SELECT id_pk FROM [table] GROUP BY [fields you want to compare by] HAVING COUNT(*) > 1) limit 2
This works, but it may not be efficient, as I have not analysed its complexity.

Add a rownumber before calculating:
create table #test ([book] char(1), [value] int, [date] datetime)
insert into #test values ('A', 4, '2017-07-17 09:16:44.480')
insert into #test values ('A', 2, '2017-08-15 10:05:58.273')
insert into #test values ('B', 3, '2017-04-15 10:05:58.273')
insert into #test values ('C', 2, '2017-08-15 10:05:58.273')
insert into #test values ('B', 3, '2017-04-13 10:05:58.273')
insert into #test values ('B', 3, '2017-04-12 10:05:58.273')
;with cte as(
Select ROW_NUMBER () OVER (order by [book], [date] ) as rownumber, *
from #test)
select distinct [1].book, abs(first_value([1].[Value]) over (partition by [1].book order by [1].rownumber desc) - [2].val2) as [Difference]
from cte [1]
inner join
(select rownumber, book, first_value([Value]) over (partition by book order by rownumber desc) as val2
from cte) [2] on [1].book = [2].book and [1].rownumber < [2].rownumber

I would use analytic functions:
;with CTE as (
SELECT book
,value
,LAG(value) OVER (PARTITION BY book ORDER BY date) last_value
,ROW_NUMBER() OVER (PARTITION BY book ORDER BY date DESC) rn
FROM MyTable
)
SELECT book
,value - last_value as value_change
FROM CTE
WHERE rn = 1
AND last_value IS NOT NULL
LAG() was added in SQL Server 2012, but even on a later version your database must have its compatibility level set to 110 or higher for it to be available. Here's an alternative that should work on SQL Server 2005 or later, or with a database compatibility level of 90 or higher.
;with CTE as (
SELECT book
,value
,ROW_NUMBER() OVER (PARTITION BY book ORDER BY date DESC) rn
FROM MyTable
)
SELECT c1.book
,c1.value - c2.value as value_change
FROM CTE c1
INNER JOIN CTE c2
ON c1.book = c2.book
WHERE c1.rn = 1
AND c2.rn = 2

Repeat the first date within a group

I would like the first date of each group to repeat for the rest of the rows within each group.

You could use a window function such as FIRST_VALUE (Transact-SQL).
You would need to partition by your first column to get the split between A and B.
For example:
with cteTempData
(
[Code]
, [Date]
)
as
(
select 'A',cast('2015-9-4' as date)
union all select 'A','2015-9-4'
union all select 'A','2015-9-4'
union all select 'A','2015-9-16'
union all select 'B','2015-9-16'
union all select 'B','2015-9-22'
union all select 'B','2015-9-22'
union all select 'B','2015-10-26'
union all select 'B','2015-10-30'
)
select
[Code]
, [Date]
, FIRST_VALUE([Date]) over (partition by [Code] order by [Date]) as [First_Date]
from cteTempData
Using FIRST_VALUE also lets you return other columns from that first ordered row:
with cteTempData
(
[Code]
, [Date]
, [Comment]
)
as
(
select 'A',cast('2015-9-4' as date),'One'
union all select 'A','2015-9-4','Two'
union all select 'A','2015-9-4','Three'
union all select 'A','2015-9-16','Four'
union all select 'B','2015-9-16','Five'
union all select 'B','2015-9-22','Six'
union all select 'B','2015-9-22','Seven'
union all select 'B','2015-10-26','Eight'
union all select 'B','2015-10-30','Nine'
)
select
[Code]
, [Date]
, FIRST_VALUE([Date]) over (partition by [Code] order by [Date]) as [First_Date]
, FIRST_VALUE([Comment]) over (partition by [Code] order by [Date]) as [First_Comment]
from cteTempData

Use MIN() Over ()
Create table #Table (Grp varchar(25),Date date)
Insert into #Table values
('A','2015-09-04'),
('A','2015-09-05'),
('A','2015-09-10'),
('B','2015-10-04'),
('B','2015-10-05'),
('B','2015-10-10')
Select *
,GrpDate = min(Date) over (Partition By Grp)
From #Table
Returns
Grp Date GrpDate
A 2015-09-04 2015-09-04
A 2015-09-05 2015-09-04
A 2015-09-10 2015-09-04
B 2015-10-04 2015-10-04
B 2015-10-05 2015-10-04
B 2015-10-10 2015-10-04

You could use MIN with the OVER-clause
SELECT t.ColumnA,
DateCol = MIN( t.DateCol ) OVER ( PARTITION BY t.ColumnA ),
OtherColumns
FROM dbo.TableName t

You can go with a CROSS JOIN or FIRST_VALUE.
Create table #Yourtable (groupCol varchar(25),firstDate date)
Insert into #Yourtable values
('A','2015-09-04'),
('A','2015-09-05'),
('A','2015-09-10'),
('B','2015-10-04'),
('B','2015-10-05'),
('B','2015-10-10')
SELECT a.*,b.firstDate
FROM #Yourtable a
CROSS JOIN (SELECT groupCol,MIN(firstDate) firstDate
FROM #Yourtable b
GROUP BY groupCol)b
WHERE a.groupCol =b.groupCol
OR
SELECT a.*,FIRST_VALUE(a.firstDate) OVER (PARTITION BY groupCol ORDER BY a.firstDate ASC) AS firstDate
FROM #Yourtable a

Getting the last row from a ROW_NUMBER using SQL

I am thinking there is a better way to grab the last row from a row_number instead of doing multiple nesting using T-SQL.
I need the total number of orders and the last ordered date. Say I have the following:
CREATE TABLE #T (PERSON_ID INT, ORDER_DATE DATE)
INSERT INTO #T VALUES(1, '2016/01/01')
INSERT INTO #T VALUES(1, '2016/01/02')
INSERT INTO #T VALUES(1, '2016/01/03')
INSERT INTO #T VALUES(2, '2016/01/01')
INSERT INTO #T VALUES(2, '2016/01/02')
INSERT INTO #T VALUES(3, '2016/01/01')
INSERT INTO #T VALUES(3, '2016/01/02')
INSERT INTO #T VALUES(3, '2016/01/03')
INSERT INTO #T VALUES(3, '2016/01/04')
What I want is:
PERSON_ID ORDER_DATE ORDER_CNT
1 2016-01-03 3
2 2016-01-02 2
3 2016-01-04 4
Is there a better way to do this besides the following:
SELECT *
FROM (
SELECT *
, ROW_NUMBER() OVER (PARTITION BY PERSON_ID ORDER BY ORDER_CNT DESC) AS LAST_ROW
FROM (
SELECT *
, ROW_NUMBER () OVER (PARTITION BY PERSON_ID ORDER BY ORDER_DATE) AS ORDER_CNT
FROM #T
) AS A
) AS B
WHERE LAST_ROW = 1

Yes, you can use this:
SELECT
PERSON_ID,
MAX(ORDER_DATE) AS ORDER_DATE,
COUNT(*) AS ORDER_CNT
FROM #T
GROUP BY PERSON_ID

SELECT a.PERSON_ID
, a.ORDER_DATE
, a.ORDER_CNT
FROM
(
SELECT PERSON_ID
, ORDER_DATE
, rn = ROW_NUMBER () OVER (PARTITION BY PERSON_ID ORDER BY ORDER_DATE DESC)
, ORDER_CNT = COUNT(ORDER_DATE) OVER (PARTITION BY PERSON_ID)
FROM #T
) AS a
WHERE rn = 1
ORDER BY a.PERSON_ID;

Pagination for a SQL Query Template

For my application I created a SQL query builder which has got Where and Order By clauses. I would like to know how to paginate through the results i.e. I would like to get a template on how to paginate through the results of a SQL query. This description may be a bit confusing, so it may be easier with an example:
Consider the Test Table
CREATE TABLE [dbo].[TestTable](
[RecordID] [int] NOT NULL,
[ID] [nvarchar](1000) NULL,
[Name] [nvarchar](1000) NULL,
[Dept] [nvarchar](1000) NULL
)
INSERT [dbo].[TestTable]
SELECT 1, N'1', N'Andy', N'IT'
UNION ALL
SELECT 2, N'2', N'Bob', N'IT'
UNION ALL
SELECT 3, N'3', N'Camila', N'Sales'
UNION ALL
SELECT 4, N'4', N'Drew', N'IT'
UNION ALL
SELECT 5, N'5', N'Elsie', N'Sales'
UNION ALL
SELECT 6, N'6', N'Frank', N'IT'
UNION ALL
SELECT 7, N'7', N'Gaby', N'Sales'
UNION ALL
SELECT 8, N'8', N'Hank', N'IT'
UNION ALL
SELECT 9, N'9', N'Iris', N'Sales'
UNION ALL
SELECT 10, N'8', N'John', N'IT'
Let us say that I have a Where Clause as:
WHERE ([Dept] = 'IT')
And an Order By Clause as:
ORDER BY [Name] DESC
I am attempting to do the pagination by using something like:
SELECT [RECORDID], [ID], [Name], [Dept], RowNum
FROM (
SELECT [RECORDID], [ID], [Name], [Dept],
ROW_NUMBER() OVER (ORDER BY [RecordID]) AS RowNum
FROM [TestTable] WHERE ([Dept] = 'IT')
) AS [TestTable_DerivedTable]
WHERE [TestTable_DerivedTable].RowNum BETWEEN 3 AND 6 ORDER BY [Name] DESC
This does not work because I cannot get the ORDER BY [Name] DESC into [TestTable_DerivedTable].
If I just had the WHERE clause, it would return the names:
Andy, Bob, Drew, Frank, Hank, and John.
If I put in the pagination i.e. BETWEEN 3 AND 6, I correctly get:
Drew, Frank, Hank, and John
How do I add the ORDER BY [Name] DESC so that I get (first the reversal, then the pagination):
Frank, Drew, Bob, and Andy

If you move the ORDER BY [Name] DESC into the Window function, you will get what you want:
SELECT [RECORDID], [ID], [Name], [Dept], RowNum
FROM
(
SELECT [RECORDID], [ID], [Name], [Dept]
, ROW_NUMBER() OVER (ORDER BY [Name] DESC) AS RowNum
FROM [TestTable] WHERE ([Dept] = 'IT')
) AS [TestTable_DerivedTable]
WHERE [TestTable_DerivedTable].RowNum BETWEEN 3 AND 6
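
On SQL Server 2012 or later, OFFSET ... FETCH is another way to express the same page without a derived table. The following is a sketch against the TestTable from the question rather than a drop-in replacement. The @PageStart and @PageSize variables are hypothetical names, and RecordID is added to the ORDER BY only as an assumed tie-breaker so that page boundaries stay stable when names repeat.

```sql
DECLARE @PageStart INT = 3;  -- first row of the page (1-based), as in BETWEEN 3 AND 6
DECLARE @PageSize  INT = 4;  -- number of rows per page

SELECT [RecordID], [ID], [Name], [Dept]
FROM [dbo].[TestTable]
WHERE [Dept] = 'IT'
ORDER BY [Name] DESC, [RecordID]   -- RecordID is an assumed tie-breaker
OFFSET (@PageStart - 1) ROWS       -- skip the rows before the page
FETCH NEXT @PageSize ROWS ONLY;    -- return one page of rows
```

For a query builder, the generated WHERE and ORDER BY clauses can be dropped in unchanged, and only the two paging values vary per request.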
