Why does the TOP function not function as I expect in SQL? - sql-server

I thought I knew how the TOP function works, but with this code below I'm not sure.
Please can someone tell me why it returns fruit:b ordinal:9, instead of the expected fruit:b ordinal:8?
;WITH CTE
AS (
SELECT fruit, ordinal, row_number() OVER (
ORDER BY (
SELECT 1
)
) AS rn
FROM (
VALUES (1, 'a'), (2, 'b'), (3, 'b'), (4, 'c'), (5, 'c'), (6, 'a'), (7, 'a'), (8, 'b'), (9, 'b')
) fruits(ordinal, fruit)
), CTE2
AS (
SELECT fruit, ordinal
FROM cte AS cteouter
WHERE rn = 1
OR fruit != (
SELECT fruit
FROM cte AS cteinner
WHERE cteinner.rn = cteouter.rn - 1
)
)
--SELECT * FROM CTE2
SELECT TOP 1 *
FROM cte2
ORDER BY ordinal DESC

row_number() OVER (ORDER BY (SELECT 1)) (as well as over (order by 1) or over (order by 1/0)) does not guarantee stable, reproducible numbering of the incoming rows. Quite the opposite: it effectively switches off the ORDER BY clause and leaves the numbering arbitrary.
When you run the query with TOP 1 you get one execution plan, and without TOP you get another. These plans happen to order the rows in CTE differently, which in turn changes which rows are returned from CTE2.
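One way to make the result deterministic, sketched here on the assumption that ordinal is the intended row order, is to number the rows by ordinal instead of by a constant:

```sql
-- Sketch: same query as in the question, but rn now follows ordinal,
-- so the numbering no longer depends on the execution plan.
WITH CTE AS (
    SELECT fruit, ordinal,
           ROW_NUMBER() OVER (ORDER BY ordinal) AS rn
    FROM (
        VALUES (1, 'a'), (2, 'b'), (3, 'b'), (4, 'c'), (5, 'c'),
               (6, 'a'), (7, 'a'), (8, 'b'), (9, 'b')
    ) fruits(ordinal, fruit)
), CTE2 AS (
    -- keep the first row and every row whose fruit differs from the previous row
    SELECT fruit, ordinal
    FROM CTE AS cteouter
    WHERE rn = 1
       OR fruit != (
            SELECT fruit
            FROM CTE AS cteinner
            WHERE cteinner.rn = cteouter.rn - 1
       )
)
SELECT TOP 1 *
FROM CTE2
ORDER BY ordinal DESC;
```

With a deterministic rn, CTE2 keeps ordinals 1, 2, 4, 6 and 8, so the TOP 1 query returns fruit:b ordinal:8 regardless of which plan is chosen.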

Related

CTE - LEFT OUTER JOIN Performance Problem

Using SQL Server 2017.
CREATE TABLE [TABLE_1]
(
PLAN_NR decimal(28,6) NULL,
START_DATE datetime NULL,
);
CREATE TABLE [TABLE_2]
(
PLAN_NR decimal(28,6) NULL,
PERIOD_NR decimal(28,6) NULL,
);
INSERT INTO TABLE_1 (PLAN_NR, START_DATE)
VALUES (1, '2020-05-01'), (2, '2020-08-05');
INSERT INTO TABLE_2 (PLAN_NR, PERIOD_NR)
VALUES (1, 1), (1, 2), (1, 5), (1, 6), (1, 5), (1, 6), (1, 17),
(2, 2), (2, 3), (2, 5), (2, 2), (2, 17), (2, 28);
CREATE VIEW ALL_PERIODS
AS
WITH rec_cte AS
(
SELECT
PLAN_NR, START_DATE,
1 period_nr, DATEADD(day, 7, START_DATE) next_date
FROM
TABLE_1
UNION ALL
SELECT
PLAN_NR, next_date,
period_nr + 1, DATEADD(day, 7, next_date)
FROM
rec_cte
WHERE
period_nr < 100
),
cte1 AS
(
SELECT
PLAN_NR, period_nr, START_DATE
FROM
rec_cte
UNION ALL
SELECT
PLAN_NR, period_nr, DATEADD(DAY, 1, EOMONTH(next_date, -1))
FROM
rec_cte
WHERE
MONTH(START_DATE) <> MONTH(next_date)
),
cte2 AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY PLAN_NR ORDER BY START_DATE) rn
FROM cte1
)
SELECT PLAN_NR, rn PERIOD_NR, START_DATE
FROM cte2
WHERE rn <= 100
Table_1 lists plans (PLAN_NR) and their start date (START_DATE).
Table_2 lists plan numbers (PLAN_NR) and periods (1 - X). Per plan number periods can appear several times but can also be missing.
A period lasts seven days, unless the period includes a change of month. Then the period is divided into a part before the end of the month and a part after the end of the month.
The view ALL_PERIODS lists 100 periods per plan according to this system.
My problem is the performance of the following select which I would like to use in a view:
SELECT
t2.PLAN_NR
, t2.PERIOD_NR
, a_p.START_DATE
from TABLE_2 as t2
left outer join ALL_PERIODS a_p on t2.PERIOD_NR = a_p.PERIOD_NR and t2.PLAN_NR = a_p.PLAN_NR
From about 4000 entries in TABLE_2 onwards, the select becomes incredibly slow.
The join alone does not slow the query down; it only becomes slow once a_p.START_DATE is added to the select list.
I read the view into a temporary table, did the join against that, and had no performance issues (2 seconds for the 4000 entries).
So I assume that the CTE used in the view is the reason for the slow performance.
Unfortunately I can't use temporary tables in views, and I would hate to write the data to a normal table.
Is there a way in SQL Server to improve the performance of the CTE?
Instead of a recursive CTE, generate ALL_PERIODS with a CROSS JOIN between the plan table and a "number table", either persisted or built as a non-recursive CTE.
E.g.:
WITH N As
(
select top 100 row_number() over (order by (select null)) i
from (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10) ) v1(i),
(values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10) ) v2(i)
),
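plan_period AS
(
SELECT
PLAN_NR, DATEADD(day, 7*(N.i-1), START_DATE) START_DATE, -- start of period i, as in the recursive version
N.i period_nr, DATEADD(day, 7*N.i, START_DATE) next_date
FROM TABLE_1 CROSS JOIN N
),
-- cte1 and cte2 follow as in the original view, reading from plan_period instead of rec_cte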
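Put together, the whole view might look roughly like the sketch below. It assumes the cte1 and cte2 steps from the question are kept unchanged, and it shifts START_DATE by 7 days per period so that plan_period yields the same rows as the recursive rec_cte did. CREATE OR ALTER VIEW needs SQL Server 2016 SP1 or later; whether this actually fixes the slowdown would have to be measured on the real data.

```sql
CREATE OR ALTER VIEW ALL_PERIODS
AS
WITH N AS
(
    -- 10 x 10 rows, numbered 1..100, without any recursion
    SELECT TOP (100) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS i
    FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) v1(i)
    CROSS JOIN (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) v2(i)
),
plan_period AS
(
    -- one row per plan and 7-day period
    SELECT t.PLAN_NR,
           DATEADD(day, 7 * (N.i - 1), t.START_DATE) AS START_DATE,
           N.i AS period_nr,
           DATEADD(day, 7 * N.i, t.START_DATE) AS next_date
    FROM TABLE_1 AS t
    CROSS JOIN N
),
cte1 AS
(
    SELECT PLAN_NR, period_nr, START_DATE
    FROM plan_period
    UNION ALL
    -- extra row starting on the 1st of the month when a period crosses a month boundary
    SELECT PLAN_NR, period_nr, DATEADD(DAY, 1, EOMONTH(next_date, -1))
    FROM plan_period
    WHERE MONTH(START_DATE) <> MONTH(next_date)
),
cte2 AS
(
    SELECT *, ROW_NUMBER() OVER (PARTITION BY PLAN_NR ORDER BY START_DATE) AS rn
    FROM cte1
)
SELECT PLAN_NR, rn AS PERIOD_NR, START_DATE
FROM cte2
WHERE rn <= 100;
```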
If you are able to modify the view, I would recommend the following.
Add a table containing numbers from 0 up to whatever you think you will need in the database, using the command below:
create table numbers ( id int)
go
;with cte as (
select 0 num
union all
select num + 1
from cte
where num < 2000 -- change this
)
insert into numbers (id)
select num from cte
option (maxrecursion 0) -- the default limit of 100 recursion levels is too low here
Then change the first CTE in the view to this:
WITH rec_cte AS
(
SELECT
PLAN_NR
, DATEADD(DAY, 7* id, START_DATE) START_DATE
, id +1 period_nr
, DATEADD(DAY, 7*( id+1), START_DATE) next_date
FROM
TABLE_1 t
CROSS JOIN numbers i
WHERE i.id <100
),...
Also consider using a temp table instead of the CTE; it might help.

Get the Average of a Datediff function using a partition by in Snowflake

I am looking to understand the average number of days between transactions for each of the customers in my database, using Snowflake.
select Customer_ID,Day_ID,
datediff(Day,lag(Day_ID) over (Partition by Customer_ID ORDER BY DAY_ID), DAY_ID) as Time_Since
from Table
order by Customer_ID, Day_ID
The code above works to get me the time_elapsed but when I try to add an average function I get an error:
select Customer_ID,
avg(datediff(Day,lag(Day_ID) over (Partition by Customer_ID ORDER BY DAY_ID), DAY_ID)) as AVG_Time_Since
from Table
group by Customer_ID
order by Customer_ID
The error reads:
SQL compilation error: Window function [LAG(TABLE.DAY_ID) OVER (PARTITION BY TABLE.CUSTOMER_ID ORDER BY TABLE.DAY_ID ASC NULLS LAST)] may not appear inside an aggregate function.
Any ideas?
You can nest them and get the answer you're seeking.
Note: the cte at the beginning only supplies sample data; you can delete it and replace FROM cte with FROM YourTable.
WITH cte as
(SELECT column1 customer_id, column2::date day_id
FROM
VALUES (1, '2019-01-01'), (1, '2019-01-06'), (1, '2019-01-15'), (1, '2019-01-25'), (1, '2019-01-27'), (1, '2019-01-31'), (2, '2019-01-01'), (2, '2019-01-08'), (2, '2019-01-13'), (2, '2019-01-17'), (2, '2019-01-21'), (2, '2019-01-25'), (2, '2019-02-02'), (3, '2019-02-12'), (3, '2019-02-14'), (3, '2019-02-18'), (3, '2019-02-23'), (3, '2019-03-04'), (3, '2019-03-10'))
SELECT customer_id,
avg(time_since) AVG_Time_Since
FROM
(SELECT Customer_ID,
Day_ID,
datediff(DAY, lag(Day_ID) OVER (PARTITION BY Customer_ID
ORDER BY DAY_ID), DAY_ID) AS Time_Since
FROM cte
ORDER BY Customer_ID,
Day_ID)
GROUP BY customer_id ;
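As an alternative sketch that avoids the window function altogether: the gaps between consecutive dates add up to the distance between the first and last date, so the average gap is that distance divided by the number of gaps. The sample rows below are a subset of the ones above; nullif makes customers with a single transaction return NULL instead of dividing by zero.

```sql
WITH cte AS (
    SELECT column1 AS customer_id, column2::date AS day_id
    FROM VALUES (1, '2019-01-01'), (1, '2019-01-06'), (1, '2019-01-15'),
                (2, '2019-01-01'), (2, '2019-01-08')
)
SELECT customer_id,
       -- (last date - first date) / number of gaps between transactions
       datediff(day, min(day_id), max(day_id)) / nullif(count(*) - 1, 0) AS avg_time_since
FROM cte
GROUP BY customer_id;
```

For customer 1 the gaps are 5 and 9 days, so both approaches should give an average of 7.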

how to select and join same table in mssql [duplicate]

I have a simple categories table with the following columns:
Id
Name
ParentId
So, any number of categories can be the child of a category. Take for example the following hierarchy:
I want a simple query that returns the category "Business Laptops" together with a column listing all its parents, comma-separated or similar:
Or take the following example:
Recursive cte to the rescue....
Create and populate sample table (Please save us this step in your future questions):
CREATE TABLE #T
(
id int,
name varchar(100),
parent_id int
)
INSERT INTO #T VALUES
(1, 'A', NULL),
(2, 'A.1', 1),
(3, 'A.2', 1),
(4, 'A.1.1', 2),
(5, 'B', NULL),
(6, 'B.1', 5),
(7, 'B.1.1', 6),
(8, 'B.2', 5),
(9, 'A.1.1.1', 4),
(10, 'A.1.1.2', 4)
The cte:
;WITH CTE AS
(
SELECT id, name, name as path, parent_id
FROM #T
WHERE parent_id IS NULL
UNION ALL
SELECT t.id, t.name, cast(cte.path +','+ t.name as varchar(100)), t.parent_id
FROM #T t
INNER JOIN CTE ON t.parent_id = CTE.id
)
The query:
SELECT id, name, path
FROM CTE
Results:
id  name     path
1   A        A
5   B        B
6   B.1      B,B.1
8   B.2      B,B.2
7   B.1.1    B,B.1,B.1.1
2   A.1      A,A.1
3   A.2      A,A.2
4   A.1.1    A,A.1,A.1.1
9   A.1.1.1  A,A.1,A.1.1,A.1.1.1
10  A.1.1.2  A,A.1,A.1.1,A.1.1.2
See online demo on rextester

How to find max of one column of all child in SQL

I have a chart like the one in the picture, stored in a table with KID and ParentID columns.
How can I get the max MR over all children under a parent?
Example: for node C ----> max ( MR(D) , MR(E) , MR(F) )
How can I find Max(MR) for all children of a node?
CREATE TABLE #a
(
KID INT PRIMARY KEY,
ParentID INT,
MR INT
)
INSERT INTO #a (KID, ParentID, MR)
VALUES
(1, 0, 3), (2, 1, 1), (3, 1, 3),
(4, 3, 3), (5, 3, 5), (6, 5, 3)
;WITH cte AS
(
SELECT *
FROM #a
WHERE ParentID = 3
UNION ALL
SELECT t2.*
FROM cte t1
JOIN #a t2 ON t2.ParentID = t1.KID -- walk down from each node to its children
)
SELECT MAX(MR)
FROM cte
OPTION (MAXRECURSION 0)
Result:
5
Maybe you can use the OVER clause:
SELECT
ParentID,
MAX(MR) OVER(PARTITION BY ParentID)
FROM
Table
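If the maximum is needed for every node at once rather than for a single ParentID, one possible sketch is to pair each node with all of its descendants and then aggregate. The names nodes, descendants, AncestorID and MaxMR are made up for illustration; the sample rows are the ones from the first answer.

```sql
WITH nodes AS (
    -- sample data: (KID, ParentID, MR)
    SELECT KID, ParentID, MR
    FROM (VALUES (1, 0, 3), (2, 1, 1), (3, 1, 3),
                 (4, 3, 3), (5, 3, 5), (6, 5, 3)) v(KID, ParentID, MR)
),
descendants AS (
    -- anchor: every node is a direct child of its parent
    SELECT ParentID AS AncestorID, KID, MR
    FROM nodes
    UNION ALL
    -- recursion: children of a descendant belong to the same ancestor
    SELECT d.AncestorID, n.KID, n.MR
    FROM descendants d
    JOIN nodes n ON n.ParentID = d.KID
)
SELECT AncestorID, MAX(MR) AS MaxMR
FROM descendants
GROUP BY AncestorID
ORDER BY AncestorID;
-- trees deeper than 100 levels would need OPTION (MAXRECURSION ...)
```

For the sample data this should return 5 for ancestors 0, 1 and 3, and 3 for ancestor 5.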

Having trouble with grouping rows - MS SQL 2008

I have a product table which has some duplicate records.
I need to get the primary keys after grouping the rows by name and type.
CREATE TABLE #Products
(
pkProductId INT,
productName NVARCHAR(500),
productType INT
)
INSERT INTO #Products (pkProductId, productName, productType)
VALUES
(1, 'iphone', 0),
(2, 'iphone', 0),
(3, 'iphone', 1),
(4, 'iphone', 1),
(5, 'iphone', 1)
After I run T-SQL like this:
SELECT pr.pkProductId FROM #Products pr
GROUP BY pr.productName, pr.productType
HAVING COUNT(pr.productName) > 1
I want to get these IDs:
pkProductId
---------------
2
4
5
Thank you for your help :)
You could use row_number() to get the result:
select pkProductId
from
(
select pkProductId,
productName,
productType,
row_number() over(partition by productName, productType order by pkproductId) rn
from #Products
) d
where rn >1;
See SQL Fiddle with Demo
