ROW_NUMBER vs COUNT(1)? - sql-server

Which of the two alternatives is better?
ROW_NUMBER() OVER (PARTITION BY...)
or
COUNT(1) OVER (PARTITION BY ...)
I could not find any such question.
Edit:
DBMS: SQL-SERVER (version >= 2008)
In my case the partition is defined by a single field:
ROW_NUMBER() OVER (PARTITION BY ELEMENT ORDER BY EMPLOYEE)
COUNT(1) OVER (PARTITION BY ELEMENT ORDER BY EMPLOYEE)
ELEMENT EMPLOYEE ROW_NUMBER COUNT
0000001 00000003 1 1
0000001 00000004 2 2
0000001 00000005 3 3
0000003 00000045 1 1
0000003 00000046 2 2

COUNT(1) behaves differently when the same values in the ORDER BY columns are repeated within a partition.
The following is an example in SQL Server:
IF OBJECT_ID('tempdb..#Example') IS NOT NULL
DROP TABLE #Example
CREATE TABLE #Example (
Number INT,
GroupNumber INT)
INSERT INTO #Example (
Number,
GroupNumber)
VALUES
(NULL, 1),
(100, 1),
(101, 1),
(102, 1),
(103, 1),
(NULL, 2),
(NULL, 2),
(NULL, 2),
(200, 2),
(201, 2),
(202, 2),
(300, 3),
(301, 3),
(301, 3),
(301, 3),
(302, 3)
SELECT
E.*,
RowNumber = ROW_NUMBER() OVER (PARTITION BY E.GroupNumber ORDER BY E.Number ASC),
CountOver = COUNT(1) OVER (PARTITION BY E.GroupNumber ORDER BY E.Number ASC)
FROM
#Example AS E
Result:
Number      GroupNumber RowNumber CountOver
----------- ----------- --------- ---------
NULL        1           1         1
100         1           2         2
101         1           3         3
102         1           4         4
103         1           5         5
NULL        2           1         3         <-- here
NULL        2           2         3
NULL        2           3         3
200         2           4         4
201         2           5         5
202         2           6         6
300         3           1         1
301         3           2         4         <-- here
301         3           3         4
301         3           4         4
302         3           5         5
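This is because it's a count and not a row number. With an ORDER BY and no explicit frame, the window defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so rows that tie on the ORDER BY value are all counted together. You should use the one that's appropriate to your needs.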
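To make the difference concrete, here is a minimal sketch that reruns group 3 of the example with three window expressions side by side. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption; it needs SQLite 3.25 or later), and the table name `example` and the column aliases are made up for illustration. On SQL Server itself, ORDER BY inside an aggregate's OVER clause and the ROWS frame need version 2012 or later.

```python
import sqlite3

# SQLite stands in for SQL Server here; both use the standard default frame
# (RANGE UNBOUNDED PRECEDING .. CURRENT ROW) for an ordered window.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (number INTEGER, group_number INTEGER)")
conn.executemany(
    "INSERT INTO example VALUES (?, ?)",
    [(300, 3), (301, 3), (301, 3), (301, 3), (302, 3)],
)

rows = conn.execute(
    """
    SELECT number,
           ROW_NUMBER() OVER (PARTITION BY group_number ORDER BY number) AS rn,
           -- default frame: tied rows (peers) are counted together
           COUNT(1) OVER (PARTITION BY group_number ORDER BY number) AS count_default,
           -- explicit ROWS frame: counts physical rows up to the current one
           COUNT(1) OVER (PARTITION BY group_number ORDER BY number
                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS count_rows
    FROM example
    ORDER BY number, rn
    """
).fetchall()

for row in rows:
    print(row)
```

Here `count_default` repeats 4 for the three tied 301 rows, while `count_rows` runs from 1 to 5 like `rn`. Which of the tied rows gets which number is arbitrary for both `rn` and `count_rows`.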

Related

Snowflake cumulative sum for multiple entry in same date for a given partition

I have a table with the data set below. I want the cumulative sum per PK1 and PK2 as of each TXN_DATE. I have tried cumulative window frame functions and they give the expected running total, but I want the output in a format that is grouped by TXN_DATE.
SELECT
PK1
,PK2
,TXN_DATE
,QTY
,SUM(QTY) OVER (PARTITION BY PK1,PK2 ORDER BY TXN_DATE ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) SUM_QTY
FROM MY_TABLE
ORDER BY TXN_DATE;
The query above gives the result below:
[result image not preserved]
I want the result in either of the two formats shown below. Can someone help me get the desired result?
[two desired-output images not preserved]
Just remove `rows between unbounded preceding and current row` from your window function. With the default frame, the sum() window function gives every row on the same day the same running total.
with SOURCE_DATA as
(
select COLUMN1::string as PK1
,COLUMN2::string as PK2
,COLUMN3::date as TXN_DATE
,COLUMN4::int as QTY
from (values
('P001', 'XYZ', '2022-11-03', 15),
('P001', 'XYZ', '2022-11-08', -1),
('P001', 'XYZ', '2022-11-12', -4),
('P002', 'ABZ', '2022-11-03', 10),
('P002', 'ABZ', '2022-11-03', 1), -- This was listed as ABC in the photo
('P002', 'ABZ', '2022-11-05', -5),
('P002', 'ABZ', '2022-11-10', -1),
('P002', 'ABZ', '2022-11-10', -1),
('P002', 'ABZ', '2022-11-10', 1)
)
)
select *
,sum(QTY) over (partition by PK1, PK2 order by TXN_DATE) QUANTITY
from SOURCE_DATA
order by PK1, TXN_DATE
;
Output:
PK1  PK2 TXN_DATE   QTY QUANTITY
P001 XYZ 2022-11-03 15  15
P001 XYZ 2022-11-08 -1  14
P001 XYZ 2022-11-12 -4  10
P002 ABZ 2022-11-03 10  11
P002 ABZ 2022-11-03 1   11
P002 ABZ 2022-11-05 -5  6
P002 ABZ 2022-11-10 -1  5
P002 ABZ 2022-11-10 -1  5
P002 ABZ 2022-11-10 1   5
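If the other desired format was one row per TXN_DATE (an assumption, since the images from the question are not available), the quantities can be summed per date first and the running total taken over those daily sums. The sketch below shows this with Python's sqlite3 module standing in for Snowflake (SQLite 3.25 or later). The table name `my_table` follows the question; the `daily` subquery alias is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (pk1 TEXT, pk2 TEXT, txn_date TEXT, qty INTEGER)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        ("P001", "XYZ", "2022-11-03", 15),
        ("P001", "XYZ", "2022-11-08", -1),
        ("P001", "XYZ", "2022-11-12", -4),
        ("P002", "ABZ", "2022-11-03", 10),
        ("P002", "ABZ", "2022-11-03", 1),
        ("P002", "ABZ", "2022-11-05", -5),
        ("P002", "ABZ", "2022-11-10", -1),
        ("P002", "ABZ", "2022-11-10", -1),
        ("P002", "ABZ", "2022-11-10", 1),
    ],
)

rows = conn.execute(
    """
    SELECT pk1, pk2, txn_date, qty,
           -- running total over the per-date sums within each (pk1, pk2)
           SUM(qty) OVER (PARTITION BY pk1, pk2 ORDER BY txn_date) AS sum_qty
    FROM (
        -- collapse to one row per key and date first
        SELECT pk1, pk2, txn_date, SUM(qty) AS qty
        FROM my_table
        GROUP BY pk1, pk2, txn_date
    ) AS daily
    ORDER BY pk1, txn_date
    """
).fetchall()

for row in rows:
    print(row)
```

Each (PK1, PK2, TXN_DATE) combination appears once, so there are no ties within a date and the choice between a ROWS and a RANGE frame no longer affects the result.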

Running total in SQL Server based on condition [duplicate]

I am creating a running total for a specific group in a sequence. When a zero value occurs in the sequence, the running total has to restart from that zero record.
select
    Sno,
    [Group],
    Value,
    sum(Value) over (partition by [Group] order by Sno) Cum_Value
from
    [Table]
Output:
Sno Group Value Cum_Value
-------------------------------
1 A 5 5
2 A 10 15
3 A 25 40
4 A 0 40
5 A 10 50
6 A 5 55
7 A 0 55
8 A 20 75
Expected output:
Sno Group Value Cum_Value
------------------------------
1 A 5 5
2 A 10 15
3 A 25 40
4 A 0 0 --> zero occurs [starts running total again]
5 A 10 10
6 A 5 15
7 A 0 0 --> zero occurs [starts running total again]
8 A 20 20
You may try with the following approach:
Input:
CREATE TABLE #Data (
Sno int,
[Group] varchar(1),
[Value] int
)
INSERT INTO #Data
(Sno, [Group], [Value])
VALUES
(1, 'A', 5),
(2, 'A', 10),
(3, 'A', 25),
(4, 'A', 0),
(5, 'A', 10),
(6, 'A', 5),
(7, 'A', 0),
(8, 'A', 20)
Statement:
SELECT
Sno,
[Group],
[Value],
Changed,
SUM([Value]) OVER (PARTITION BY [Group], Changed ORDER BY Sno) AS Cum_Value
FROM
(
SELECT
Sno,
[Group],
[Value],
SUM (CASE
WHEN [Value] = 0 THEN 1
ELSE 0
END) OVER (PARTITION BY [Group] ORDER BY Sno) AS Changed
FROM #Data
) t
Output:
Sno Group Value Cum_Value
1 A 5 5
2 A 10 15
3 A 25 40
4 A 0 0
5 A 10 10
6 A 5 15
7 A 0 0
8 A 20 20
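The approach above works in two steps: a running count of zeros labels each stretch between resets, and the running total is then restarted per label. The sketch below replays that idea with two groups, assuming the reset should happen independently in each group, which is why the outer window partitions by both the group and the label. It uses Python's sqlite3 module as a stand-in for SQL Server (SQLite 3.25 or later); the table `data` and the column name `grp` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (sno INTEGER, grp TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [
        (1, "A", 5), (2, "A", 10), (3, "A", 0), (4, "A", 20),
        (1, "B", 7), (2, "B", 0), (3, "B", 3),
    ],
)

rows = conn.execute(
    """
    SELECT sno, grp, value,
           -- step 2: running total restarted for every (grp, changed) island
           SUM(value) OVER (PARTITION BY grp, changed ORDER BY sno) AS cum_value
    FROM (
        -- step 1: number of zeros seen so far in the group labels the island
        SELECT sno, grp, value,
               SUM(CASE WHEN value = 0 THEN 1 ELSE 0 END)
                   OVER (PARTITION BY grp ORDER BY sno) AS changed
        FROM data
    ) AS t
    ORDER BY grp, sno
    """
).fetchall()

for row in rows:
    print(row)
```

Partitioning the outer window by `changed` alone would put the first stretch of group A and the first stretch of group B in the same partition, because both carry the label 0.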

Separate a group of values in to A or B (1 or 2) SQL

Note: I have already asked a similar question, but I omitted a key point: a tool has many components.
I have a list of tools and their components, and every tool has a model number. Within each model, I want to assign every second tool to an alternating group.
DerivedColumn is the value I want the query to return.
declare @t table (Model int, toolID INT ,Component INT,DerivedColumn int);
insert into @t values (1,1,1,1),(1,1,2,1),(1,1,3,1),(1,2,1,2),(1,2,2,2),(1,2,3,2),(1,3,1,1),(1,3,2,1),(1,3,3,1),(1,4,1,2),(1,4,2,2),(1,4,3,2),(1,5,1,1),(1,5,2,1),(1,5,3,1),(2,1,1,1),(2,1,2,1),(2,2,1,2),(2,2,2,2),(2,3,1,1),(2,3,2,1)
SELECT * FROM @t
Model toolID Component DerivedColumn
1 1 1 1
1 1 2 1
1 1 3 1
1 2 1 2
1 2 2 2
1 2 3 2
1 3 1 1
1 3 2 1
1 3 3 1
1 4 1 2
1 4 2 2
1 4 3 2
1 5 1 1
1 5 2 1
1 5 3 1
2 1 1 1
2 1 2 1
2 2 1 2
2 2 2 2
2 3 1 1
2 3 2 1
Every second tool belonging to a model should get the alternate group number.
I believe I have to use a window function but haven't been able to solve it.
You could use dense_rank() with the modulo operator (% 2) to calculate it:
DECLARE @SampleData AS TABLE
(
    Model int,
    ToolId int,
    Component int
)
INSERT INTO @SampleData
(
    Model, ToolId, Component
)
VALUES
(1, 1, 1),(1, 1, 2),(1, 1, 3),(1, 2, 1),
(1, 2, 2),(1, 2, 3),(1, 3, 1),(1, 3, 2),
(1, 3, 3),(1, 4, 1),(1, 4, 2),(1, 4, 3),
(1, 5, 1),(1, 5, 2),(1, 5, 3),(2, 1, 1),
(2, 1, 2),(2, 2, 1),(2, 2, 2),(2, 3, 1),
(2, 3, 2)
-- odd dense_rank values map to 1, even values map to 2
SELECT *,
    CASE (dense_rank() OVER(PARTITION BY sd.Model ORDER BY sd.ToolId) + 1) % 2
        WHEN 1 THEN 2
        WHEN 0 THEN 1
    END as DerivedColumn
FROM @SampleData sd
ORDER BY sd.Model, sd.ToolId
Demo link: http://rextester.com/LIQL79881
Hope this helps.
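DENSE_RANK is what makes the alternation follow each tool's position within its model rather than the tool ID itself, which matters when the IDs are not consecutive. The sketch below illustrates this with made-up tool IDs 10, 12 and 15, using the shorter but equivalent expression (rank - 1) % 2 + 1. Python's sqlite3 module stands in for SQL Server (SQLite 3.25 or later), and the table `sample_data` and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample_data (model INTEGER, tool_id INTEGER, component INTEGER)"
)
conn.executemany(
    "INSERT INTO sample_data VALUES (?, ?, ?)",
    [
        (1, 10, 1), (1, 10, 2),   # 1st tool of model 1
        (1, 12, 1), (1, 12, 2),   # 2nd tool of model 1
        (1, 15, 1),               # 3rd tool of model 1
        (2, 10, 1), (2, 12, 1),   # 1st and 2nd tool of model 2
    ],
)

rows = conn.execute(
    """
    SELECT model, tool_id, component,
           -- all components of a tool share one rank; odd ranks -> 1, even -> 2
           (DENSE_RANK() OVER (PARTITION BY model ORDER BY tool_id) - 1) % 2 + 1
               AS derived_column
    FROM sample_data
    ORDER BY model, tool_id, component
    """
).fetchall()

for row in rows:
    print(row)
```

With these IDs, `tool_id % 2` would yield 0, 0, 1 for the three tools of model 1, whereas the rank-based expression alternates 1, 2, 1.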
DECLARE @T TABLE (Model INT, toolID INT ,Component INT,DerivedColumn INT);
INSERT INTO @T VALUES (1,1,1,1),(1,1,2,1),(1,1,3,1),(1,2,1,2),(1,2,2,2),(1,2,3,2),(1,3,1,1),(1,3,2,1),(1,3,3,1),(1,4,1,2),(1,4,2,2),(1,4,3,2),(1,5,1,1),(1,5,2,1),(1,5,3,1),(2,1,1,1),(2,1,2,1),(2,2,1,2),(2,2,2,2),(2,3,1,1),(2,3,2,1)
SELECT Model
    ,toolID
    ,ROW_NUMBER() OVER (PARTITION BY toolID ORDER BY Model) AS AlternativetoolID
    ,Component
    ,DerivedColumn
FROM @T;

Running total/ID groups based on specific value in TSQL

I have data that looks like ID and Col1, where the value 01 in Col1 denotes the start of a related group of rows lasting until the next 01.
Sample Data:
ID Col1
1 01
2 02
3 02
---------
4 01
5 02
6 03
7 03
----------
8 01
9 03
----------
10 01
I need to calculate GroupTotal, which provides a running total of '01' from Col1, and also GroupID, which is an increment ID that resets at every instance of '01' in Col 1. Row order must be preserved with ID.
Desired Results:
ID Col1 GroupTotal GroupID
1 01 1 1
2 02 1 2
3 02 1 3
----------------------------
4 01 2 1
5 02 2 2
6 03 2 3
7 03 2 4
----------------------------
8 01 3 1
9 03 3 2
----------------------------
10 01 4 1
I've been messing with OVER, PARTITION BY etc. and cannot crack either.
Thanks
I believe what the OP is saying is that the only data available is a table with the id and col1 columns, and that the desired results are what is currently posted in the question.
If that is the case, you just need the following.
Sample Data Setup:
declare @grp_tbl table (id int, col1 int)
insert into @grp_tbl (id, col1)
values (1, 1),(2, 2),(3, 2),(4, 1),(5, 2),(6, 3),(7, 3),(8, 1),(9, 3),(10, 1)
Answer:
declare @max_id int = (select max(id) from @grp_tbl)
; with grp_cnt as
(
    --getting the range of ids that are in each group
    --and ranking them
    select gt.id
    , lead(gt.id - 1, 1, @max_id) over (order by gt.id asc) as id_max --max id in the group
    , row_number() over (order by gt.id asc) as grp_ttl
    from @grp_tbl as gt
    where 1=1
    and gt.col1 = 1
)
--ranking the range of ids inside each group
select gt.id
, gt.col1
, gc.grp_ttl as group_total
, row_number() over (partition by gc.grp_ttl order by gt.id asc) as group_id
from @grp_tbl as gt
left join grp_cnt as gc on gt.id between gc.id and gc.id_max
Final Results:
id col1 group_total group_id
1 1 1 1
2 2 1 2
3 2 1 3
4 1 2 1
5 2 2 2
6 3 2 3
7 3 2 4
8 1 3 1
9 3 3 2
10 1 4 1
If I understood correctly, this is what you want:
CREATE TABLE #tmp
([ID] int, [Col1] int, [GroupTotal] int, [GroupID] int)
;
INSERT INTO #tmp
([ID], [Col1], [GroupTotal], [GroupID])
VALUES
(1, 01, 1, 1),
(2, 02, 1, 2),
(3, 02, 1, 3),
(4, 01, 2, 1),
(5, 02, 2, 2),
(6, 03, 2, 3),
(7, 03, 2, 4),
(8, 01, 3, 1),
(9, 03, 3, 2),
(10, 01, 4, 1)
;
select *, row_number() over (partition by Grp order by ID) as GrpID From (
    select ID, Col1, [GroupTotal],
    sum(case when Col1 = '01' then 1 else 0 end) over (Order by ID) as Grp,
    [GroupID]
    from #tmp
) X
The SUM with CASE builds the groups: 1 is added every time Col1 = 01, and that running total is then used in ROW_NUMBER to partition the groups.
I'm not really sure what you are after, but you are on the right track with partitioning functions. The following calculates a running total of GroupID within each GroupTotal. I'm sure that's not what you want, but it shows you how you can achieve it.
select *, SUM(GroupId) over (partition by grouptotal order by id)
from #tmp
order by grouptotal, id

Return inserted rows in specified order for Sql Server

I need to return the rows from a table-valued insert in the same order they were specified.
I can use OUTPUT INTO
DECLARE @generated1 TABLE ([Id] varbinary(8), [OwnerId] [int]);
INSERT INTO [Blog] ([OwnerId])
OUTPUT INSERTED.*
INTO @generated1
VALUES ('1'),
('2'),
('1'),
('2'),
('2'),
('3'),
('3'),
('3'),
('3');
SELECT * FROM @generated1;
This usually works and returns
Id OwnerId
===========================
0x418B6EC7C6AC864D 1
0x6D0B89E56AB3EC48 2
0xE1B86C6A3C64AB42 1
0x51B8D9D1FCDE1647 2
0xB5AD578020CBCE4C 2
0x56CD3FF610080841 3
0x1D0D5B370A732C43 3
0x0B71CDB5CE6E0445 3
0x6A8AE3A2BD19924E 3
But if there is an FK defined on OwnerId and more than 125 rows are inserted the order in which they are inserted is different from the specified order.
One way this could be accomplished is by adding a sequential value to each row to be inserted, joining the generated table with the values that were specified and ordering by the added sequential value:
DECLARE @inserted1 TABLE ([Order] [int], [OwnerId] [int]);
INSERT INTO @inserted1
VALUES ('1', '1'),
('2', '2'),
('3', '1'),
('4', '3'),
('5', '2'),
('6', '3'),
('7', '3'),
('8', '2'),
('9', '3');
DECLARE @generated1 TABLE ([Id] varbinary(8), [OwnerId] [int]);
INSERT INTO [Blog] ([OwnerId])
OUTPUT INSERTED.[Id], INSERTED.[OwnerId]
INTO @generated1
SELECT [OwnerId] FROM @inserted1;
SELECT *
FROM (SELECT [g].[Id], [g].[OwnerId], [i].[Order]
    FROM @generated1 [g]
    INNER JOIN @inserted1 [i]
    ON [g].[OwnerId] = [i].[OwnerId]) t
ORDER BY [Order];
But since OwnerId is non-unique this will produce more rows than inserted:
Id OwnerId Order
0x2557DCF354F9CD4E 1 1
0x3A265F70A2018249 1 1
0xA21503CD2F928144 2 2
0xE8C593480FCEAF41 2 2
0xC3E3C969BEA87641 2 2
0x2557DCF354F9CD4E 1 3
0x3A265F70A2018249 1 3
0x3F7EBD8EE702B44B 3 4
0xA3F09A3A612ACF41 3 4
0xA45D8F6FF779A74C 3 4
0x7BA9521290232D43 3 4
0xA21503CD2F928144 2 5
0xE8C593480FCEAF41 2 5
0xC3E3C969BEA87641 2 5
0x3F7EBD8EE702B44B 3 6
0xA3F09A3A612ACF41 3 6
0xA45D8F6FF779A74C 3 6
0x7BA9521290232D43 3 6
0x3F7EBD8EE702B44B 3 7
0xA3F09A3A612ACF41 3 7
0xA45D8F6FF779A74C 3 7
0x7BA9521290232D43 3 7
0xA21503CD2F928144 2 8
0xE8C593480FCEAF41 2 8
0xC3E3C969BEA87641 2 8
0x3F7EBD8EE702B44B 3 9
0xA3F09A3A612ACF41 3 9
0xA45D8F6FF779A74C 3 9
0x7BA9521290232D43 3 9
There are still only 9 unique values in the Id and Order columns. How they are combined shouldn't matter, since the only value that identifies a row is OwnerId. The trick is to filter the result so that only 9 rows are returned, with each Id and each Order appearing exactly once. Numbering the rows within each Id and within each Order provides a deterministic way to pair them:
SELECT *
FROM (SELECT [g].[Id], [g].[OwnerId], [i].[Order],
    ROW_NUMBER() OVER (PARTITION BY [g].[Id] ORDER BY [i].[Order]) AS RowNumber,
    ROW_NUMBER() OVER (PARTITION BY [i].[Order] ORDER BY [g].[Id]) AS RowNumber2
    FROM @generated1 [g]
    INNER JOIN @inserted1 [i]
    ON [g].[OwnerId] = [i].[OwnerId]) t
WHERE RowNumber = RowNumber2
ORDER BY [Order];
This returns the rows in the expected order:
Id OwnerId Order RowNumber RowNumber2
======================================================
0x2A51E4E35D2FA040 1 1 1 1
0x787E303904EC764C 2 2 1 1
0x778CE142E9760248 1 3 2 2
0xC056C57F1729E643 3 4 1 1
0xC0706FF6A8890E40 2 5 2 2
0x0E2058F3F142DF42 3 6 2 2
0x4690B24BE196374B 3 7 3 3
0x9F70CA6011ECD449 2 8 3 3
0xF35D87D1BDB2C34F 3 9 4 4
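A variation on the same pairing idea, not taken from the post above, is to number each side per OwnerId before joining and then join on OwnerId plus that number. This pairs the n-th Order of an owner with the n-th Id of that owner without first producing every combination. The sketch below uses Python's sqlite3 module as a stand-in for SQL Server (SQLite 3.25 or later); the tables `inserted` and `generated`, the column `ord`, and the text Ids are hypothetical stand-ins for the table variables and varbinary values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inserted (ord INTEGER, owner_id INTEGER)")
conn.execute("CREATE TABLE generated (id TEXT, owner_id INTEGER)")
conn.executemany(
    "INSERT INTO inserted VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 1), (4, 3), (5, 2)],
)
# Generated ids arrive in no particular order, as with the OUTPUT clause.
conn.executemany(
    "INSERT INTO generated VALUES (?, ?)",
    [("g5", 1), ("g2", 2), ("g9", 1), ("g1", 3), ("g7", 2)],
)

rows = conn.execute(
    """
    WITH i AS (
        -- position of each requested row among the rows of its owner
        SELECT ord, owner_id,
               ROW_NUMBER() OVER (PARTITION BY owner_id ORDER BY ord) AS rn
        FROM inserted
    ),
    g AS (
        -- position of each generated id among the ids of its owner
        SELECT id, owner_id,
               ROW_NUMBER() OVER (PARTITION BY owner_id ORDER BY id) AS rn
        FROM generated
    )
    SELECT g.id, g.owner_id, i.ord
    FROM g
    JOIN i ON g.owner_id = i.owner_id AND g.rn = i.rn
    ORDER BY i.ord
    """
).fetchall()

for row in rows:
    print(row)
```

As in the original query, which Id ends up with which Order inside one owner is determined only by the sort order of the Ids, which is acceptable here because OwnerId is the only value that distinguishes the rows.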
