Strange query plan on max(date) query on a View - sql-server

I have a view which comprises 4 yearly tables:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE VIEW [dbo].[BGT_BETWAYDETAILS]
WITH SCHEMABINDING
AS
SELECT [bwd_BetTicketNr],
[bwd_LineID],
[bwd_ResultID],
[bwd_DateModified],
[bwd_DateModifiedTrunc],
[bwd_LineMaxPayout]
FROM [dbo].[BGT_BETWAYDETAILS_2020]
UNION ALL
SELECT [bwd_BetTicketNr],
[bwd_LineID],
[bwd_ResultID],
[bwd_DateModified],
[bwd_DateModifiedTrunc],
[bwd_LineMaxPayout]
FROM [dbo].[BGT_BETWAYDETAILS_2019]
UNION ALL
SELECT [bwd_BetTicketNr],
[bwd_LineID],
[bwd_ResultID],
[bwd_DateModified],
[bwd_DateModifiedTrunc],
[bwd_LineMaxPayout]
FROM [dbo].[BGT_BETWAYDETAILS_2018]
UNION ALL
SELECT [bwd_BetTicketNr],
[bwd_LineID],
[bwd_ResultID],
[bwd_DateModified],
[bwd_DateModifiedTrunc],
[bwd_LineMaxPayout]
FROM [dbo].[BGT_BETWAYDETAILS_2017];
GO
Each table has the following structure:
CREATE TABLE [dbo].[BGT_BETWAYDETAILS_2020]
(
[bwd_BetTicketNr] [bigint] NOT NULL,
[bwd_LineID] [int] NOT NULL,
[bwd_ResultID] [bigint] NOT NULL,
[bwd_DateModified] [datetime] NULL,
[bwd_DateModifiedTrunc] [date] NULL,
[bwd_LineMaxPayout] [decimal](18, 4) NULL,
CONSTRAINT [CSTR__BGT_BETWAYDETAILS_2020_CKEY]
PRIMARY KEY CLUSTERED ([bwd_BetTicketNr] ASC, [bwd_LineID] ASC, [bwd_ResultID] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
I have added a non-clustered index on bwd_DateModifiedTrunc:
CREATE NONCLUSTERED INDEX [NCI__DATEMODIFIED]
ON [dbo].[BGT_BETWAYDETAILS_2020] ([bwd_DateModifiedTrunc] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF,
ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
I am running the following 3 queries:
SELECT COALESCE(MAX([bwd_DateModifiedTrunc]), '2019-01-01') AS next_date
FROM [dbo].[BGT_BETWAYDETAILS_2020]
SELECT COALESCE(MAX([bwd_DateModifiedTrunc]), '2019-01-01') AS next_date
FROM [dbo].[BGT_BETWAYDETAILS]
SELECT COALESCE(CAST(MAX([bwd_DateModified]) AS date), '2019-01-01') AS next_date
FROM [dbo].[BGT_BETWAYDETAILS]
The first one, when run against each yearly table, returns instantly.
The second one seems to take forever, and its query plan looks very strange.
The plan shows two index scans on each yearly table.
The plan for each yearly table on its own is what I expected to see.
Finally, the plan for the query on the non-indexed date column is also what I expected to see: a clustered index scan on each table. This query runs in about 3 minutes, which is expected.
What is the issue here? Is there some anti-pattern I am missing? Why is the non-clustered index scanned twice per table according to the live plan? I expected the view to respond as fast as the individual tables.
For the record, I am running this on SQL Server 2017.

This just looks like an optimiser limitation. I have submitted a suggestion that this should be improved.
A simpler example is
CREATE TABLE T1(X INT NULL UNIQUE CLUSTERED);
CREATE TABLE T2(X INT NULL UNIQUE CLUSTERED);
INSERT INTO T1
OUTPUT INSERTED.X INTO T2
SELECT TOP 100000 NULLIF(ROW_NUMBER() OVER (ORDER BY 1/0),1)
FROM sys.all_objects o1,
sys.all_objects o2;
And then
WITH CTE AS
(
SELECT X FROM T1
UNION ALL
SELECT X FROM T2
)
SELECT MAX(X)
FROM CTE
OPTION (QUERYRULEOFF ScalarGbAggToTop)
This disables the query optimizer rule ScalarGbAggToTop and the query plan does a MAX on each individual table then computes a MAX of the MAX-es - so the same as
SELECT MAX(MaxX)
FROM
(
SELECT MAX(X) AS MaxX FROM T1
UNION ALL
SELECT MAX(X) AS MaxX FROM T2
) T
With the ScalarGbAggToTop rule enabled the plan now looks like this
It is effectively doing the following...
SELECT MAX(MaxX)
FROM (SELECT MAX(X) AS MaxX
FROM (SELECT TOP 1 X
FROM T1
WHERE X IS NULL
UNION ALL
SELECT TOP 1 X
FROM T1
WHERE X IS NOT NULL
ORDER BY X DESC) T1
UNION ALL
SELECT MAX(X) AS MaxX
FROM (SELECT TOP 1 X
FROM T2
WHERE X IS NULL
UNION ALL
SELECT TOP 1 X
FROM T2
WHERE X IS NOT NULL
ORDER BY X DESC) T2) T0
... but in a very inefficient way. Running the SQL above would give a plan with seeks and each branch only reading a single row.
The plan produced by ScalarGbAggToTop makes only minimal changes to the stream aggregate plan. It looks like it takes the scan from that plan, applies a backwards ordering to it, and then uses the backwards-ordered scan for both the NOT NULL and NULL branches, without any additional exploration to see whether there is a more efficient access path.
This means that in the pathological case where all of the rows are NULL, or all are NOT NULL, one of the scans will end up reading every row in the table (5 billion in your case if applicable to all 4 tables). Even if there is a mix of NULL and NOT NULL, having the IS NULL branch do a backwards scan is suboptimal, because NULL sorts first in SQL Server and so sits at the beginning of the index.
The addition of the IS NULL branch in the first place seems largely unnecessary, as the query would return the same results without it. I imagine it is only needed so that it knows whether or not to display the message
Warning: Null value is eliminated by an aggregate or other SET operation.
but I doubt you care about that. In which case adding an explicit WHERE ... IS NOT NULL resolves the issue.
WITH CTE AS
(
SELECT X FROM T1
UNION ALL
SELECT X FROM T2
)
SELECT MAX(X)
FROM CTE
WHERE X IS NOT NULL
;
It now has a seek into the NOT NULL part of the index and reads backwards (stopping after the first row is read from each table)
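Applied to the view from the question, the same rewrite might look like the sketch below. It assumes every yearly table has a non-clustered index on bwd_DateModifiedTrunc like the one shown for the 2020 table; the resulting plan should still be checked against your own data.

```sql
-- Filtering out NULLs explicitly removes the IS NULL branch, so each
-- yearly table can seek to the end of the NOT NULL range of its index
-- and read a single row backwards.
SELECT COALESCE(MAX([bwd_DateModifiedTrunc]), '2019-01-01') AS next_date
FROM [dbo].[BGT_BETWAYDETAILS]
WHERE [bwd_DateModifiedTrunc] IS NOT NULL;
```

The result is unchanged because MAX ignores NULL values anyway; only the "Null value is eliminated" warning is lost.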

Related

Azure SQL Database - Indexing 10+ millions rows

I have a database hosted on Azure SQL Database; below is the schema for a single table:
CREATE TABLE [dbo].[Article](
[ArticleHash] [bigint] NOT NULL,
[FeedHash] [bigint] NOT NULL,
[PublishedOn] [datetime] NOT NULL,
[ExpiresOn] [datetime] NOT NULL,
[DateCreated] [datetime] NOT NULL,
[Url] [nvarchar](max) NULL,
[Title] [nvarchar](max) NULL,
[Summary] [nvarchar](max) NULL,
CONSTRAINT [PK_dbo.Article] PRIMARY KEY CLUSTERED
(
[ArticleHash] ASC,
[FeedHash] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
)
I have a few queries which I'm executing that are really slow since this table contains over 10 million records:
SELECT *
FROM (SELECT ROW_NUMBER() OVER (ORDER BY PublishedOn DESC) page_rn, *
FROM Article
WHERE (FeedHash = -8498408432858355421 AND ExpiresOn > '2016-01-18 14:18:04.970')
) paged
WHERE page_rn>0 AND page_rn<=21
And one more:
SELECT ArticleHash
FROM Article
WHERE (FeedHash = -8498408432858355421
AND ArticleHash IN (-1776401574438488264,996871668263687248,-5186412434178204433,6410875610077852481,-5428137965544411137,-5326808411357670185,2738089298373692963,9180394103094543689,8120572317154347382,-369910952783360989,1071631911959711259,1187953785740614613,6665010324256449533,3720795027036815325,-5458296665864077096,-5832860214011872788,-2941009192514997875,334202794706549486,-5579819992060984166,-696086851747657853,-7466754676679718482,-1461835507954240474,9021713212273098604,-6337379666850984216,5502287921912059432)
AND ExpiresOn >= '2016-01-18 14:28:25.883')
What is the best way to index this table so that the queries execute in under 300 ms? Is it even possible on such a big table? The Azure SQL Database edition is S3.
Also, a lot of DELETE/INSERT actions are performed on this table, so any indexes should not hurt the performance of those.
The first query would benefit from native pagination with OFFSET and FETCH:
SELECT *
FROM Article
WHERE FeedHash = -8498408432858355421 AND ExpiresOn > '2016-01-18 14:18:04.970'
ORDER BY PublishedOn DESC
OFFSET 0 ROWS FETCH NEXT 21 ROWS ONLY -- 21 rows, matching page_rn <= 21 in the original query
The second query might benefit from replacing the IN list with an INNER JOIN to a table variable:
DECLARE @ArticleHashList AS TABLE (ArticleHashWanted bigint PRIMARY KEY);
INSERT INTO @ArticleHashList (ArticleHashWanted) VALUES
(-1776401574438488264),
( 996871668263687248),
(-5186412434178204433),
( 6410875610077852481),
(-5428137965544411137),
(-5326808411357670185),
( 2738089298373692963),
( 9180394103094543689),
( 8120572317154347382),
( -369910952783360989),
( 1071631911959711259),
( 1187953785740614613),
( 6665010324256449533),
( 3720795027036815325),
(-5458296665864077096),
(-5832860214011872788),
(-2941009192514997875),
( 334202794706549486),
(-5579819992060984166),
( -696086851747657853),
(-7466754676679718482),
(-1461835507954240474),
( 9021713212273098604),
(-6337379666850984216),
( 5502287921912059432);
SELECT ArticleHash
FROM Article
INNER JOIN @ArticleHashList ON ArticleHash = ArticleHashWanted
WHERE FeedHash = -8498408432858355421 AND ExpiresOn >= '2016-01-18 14:28:25.883';
Creating indexes on dates should help a lot:
CREATE INDEX idx_Article_PublishedOn ON Article (PublishedOn);
CREATE INDEX idx_Article_ExpiresOn ON Article (ExpiresOn);
For the first query I recommend this index:
create index ix_Article_FeedHash_ExpiresOn_withInclude on Article(FeedHash,ExpiresOn) include ( DateCreated, PublishedOn, Url, Title, Summary)
The second query should use a clustered index seek; look at the actual execution plan to see what happens. I also suspect the clustered index is a poor choice: the key values do not look ever-increasing but random, so the index is probably very fragmented. You can check this with the query
select * from sys.dm_db_index_physical_stats(db_id(), object_id('Article'), null, null, 'DETAILED');
If avg_fragmentation_in_percent is between 5 and 30, you can fix it with
alter index [clustered index name] on Article reorganize;
If avg_fragmentation_in_percent is higher than 30, you can fix it with
alter index [clustered index name] on Article rebuild;
(If nothing changes after the reorganize, you could try a rebuild.)

t-sql 2012 update foreign key value in primary table

In a special request run, I need to update the Locker and Lock tables in a SQL Server 2012 database. I have the following 2 table definitions:
CREATE TABLE [dbo].[Locker](
[lockerID] [int] IDENTITY(1,1) NOT NULL,
[schoolID] [int] NOT NULL,
[number] [varchar](10) NOT NULL,
[lockID] [int] NULL,
CONSTRAINT [PK_Locker] PRIMARY KEY NONCLUSTERED
(
[lockerID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY =
OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 97)
ON [PRIMARY]
) ON [PRIMARY]
CREATE TABLE [dbo].[Lock](
[lockID] [int] IDENTITY(1,1) NOT NULL,
[schoolID] [int] NOT NULL,
[comboSeq] [tinyint] NOT NULL,
CONSTRAINT [PK_Lock] PRIMARY KEY NONCLUSTERED
(
[lockID] ASC
)
)
The Locker table is the main table and the Lock table is the secondary table. I need to add 500 new locker numbers that the user has given me to the Locker table, where each row is uniquely identified by lockerID. I also need to add 500 corresponding new rows to the Lock table, where each row is uniquely identified by lockID.
Since lockID is the key of the Lock table and each value appears only once in the Locker table, I would like to know how to insert the 500 new rows into the Lock table. I would then like to take the lockID values generated for those 500 new rows and assign each one to exactly one of the 500 new rows in the Locker table.
I have sql that looks like the following so far:
declare @SchoolID int = 999
insert into test.dbo.Locker ( [schoolID], [number])
select distinct LKR.schoolID, A.lockerNumber
FROM [InputTable] A
JOIN test.dbo.School SCH
ON A.schoolnumber = SCH.type
and A.schoolnumber = @SchoolNumber
JOIN test.dbo.Locker LKR
ON SCH.schoolID = LKR.schoolID
AND A.lockerNumber not in (select number
from dbo.Locker
where schoolID = @SchoolID)
order by LKR.schoolID, A.lockerNumber
I am not certain how to complete the rest of the task, that is, assigning each new lockID uniquely across the Lock and Locker tables. Can you either modify the SQL listed above or come up with some new SQL that shows me how to accomplish my goal?
You should use the OUTPUT clause. First add the rows to the Lock table, then grab the generated lockID values and use them to prepare the insert into the Locker table. This should meet your expectations:
-- @SchoolID and @SchoolNumber are the variables from the question.
DECLARE @tmp TABLE (lockid INT);
-- Create 500 Lock rows and capture the generated identity values.
INSERT dbo.Lock
( schoolID, comboSeq )
OUTPUT Inserted.lockID INTO @tmp ( lockid )
SELECT
999,
1
FROM master..spt_values sv WHERE sv.type = 'P' AND sv.number BETWEEN 1 AND 500;
-- Pair each new locker number with exactly one new lockID by row number.
INSERT INTO dbo.Locker( schoolID, number, lockID )
SELECT x.schoolID, x.lockerNumber, y.lockid
FROM
(
SELECT d.schoolID, d.lockerNumber, ROW_NUMBER() OVER (ORDER BY d.lockerNumber) AS rn
FROM
(
-- De-duplicate before numbering; DISTINCT has no effect once rn is added.
SELECT DISTINCT LKR.schoolID, A.lockerNumber
FROM [InputTable] A
JOIN test.dbo.School SCH ON A.schoolnumber = SCH.type
and A.schoolnumber = @SchoolNumber
JOIN test.dbo.Locker LKR ON SCH.schoolID = LKR.schoolID
AND A.lockerNumber not in (select number from dbo.Locker
WHERE schoolID = @SchoolID)
) d
) x
JOIN (SELECT t.lockid, ROW_NUMBER() OVER (ORDER BY t.lockid) AS rn FROM @tmp t
) y
ON x.rn = y.rn;

join multiple nvarchar columns

I have a table like this:
CREATE TABLE [dbo].[Table](
[Id] [INT] IDENTITY(1,1) NOT NULL,
[A] [NVARCHAR](150) NULL,
[B] [NVARCHAR](150) NULL,
[C] [NVARCHAR](150) NULL,
[D] [NVARCHAR](150) NULL,
[E] [NVARCHAR](150) NULL,
CONSTRAINT [con] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
and I am looking for performance improvements when joining to this table.
Option 1 - Combine all strings into an nvarchar primary key and then do:
Source.[A] + Source.[B] + Source.[C] + Source.[D] + Source.[E] = Table.PKString
To my knowledge this is bad practice.
Option 2 - Use:
Source.[A] + Source.[B] + Source.[C] + Source.[D] + Source.[E] = Target.[A] + Target.[B] + Target.[C] + Target.[D] + Target.[E]
Option 3 - Use:
Source.[A] = Target.[A] And
...
Source.[E] = Target.[E]
Your option 1 won't work correctly as it will treat ('ab','c') as equal to ('a','bc').
Also your columns are nullable and concatenating null yields null.
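Both problems can be seen with literals alone. The snippet below is only an illustrative sketch (the column aliases are made up) and assumes the default CONCAT_NULL_YIELDS_NULL ON setting:

```sql
-- ('ab','c') and ('a','bc') both concatenate to 'abc', so the first
-- column reports 'equal' even though the value pairs differ.
-- Concatenating a NULL makes the whole result NULL.
SELECT CASE WHEN N'ab' + N'c' = N'a' + N'bc'
            THEN 'equal' ELSE 'different' END AS ConcatComparison,
       N'ab' + CAST(NULL AS NVARCHAR(150))    AS ConcatWithNull;
```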
You can't combine all columns into an nvarchar primary key due to nullability and even without that you would still be at risk of failure as the max length would be 1,500 bytes which is well over the max index key column size.
For similar reasons of length a composite index using all columns also wouldn't work.
You could, however, create a computed column that uses all 5 column values as input to calculate a checksum or hash value, and index that.
ALTER TABLE [dbo].[Table]
ADD HashValue AS CAST(hashbytes('SHA1', ISNULL([A], '') + ISNULL([B], '')+ ISNULL([C], '')+ ISNULL([D], '')+ ISNULL([E], '')) AS VARBINARY(20));
CREATE INDEX ix
ON [dbo].[Table](HashValue)
INCLUDE ([A], [B], [C], [D], [E])
Then use that in the join with a residual predicate on the other 5 columns in case of hash collisions.
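A minimal sketch of such a join, assuming both [dbo].[Table1] and [dbo].[Table2] carry the HashValue computed column and index defined above. With plain equality, a row that has NULL in any of the five columns never matches:

```sql
SELECT *
FROM [dbo].[Table1] source
JOIN [dbo].[Table2] target
  ON source.HashValue = target.HashValue  -- can be satisfied by the index on the hash
 AND source.A = target.A                  -- residual predicates: reject rows that
 AND source.B = target.B                  -- only share a hash value (collisions,
 AND source.C = target.C                  -- or different splits of the same
 AND source.D = target.D                  -- concatenated string)
 AND source.E = target.E;
```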
If you want NULL to compare equal you could use
SELECT *
FROM [dbo].[Table1] source
JOIN [dbo].[Table2] target
ON source.HashValue = target.HashValue
AND EXISTS(SELECT source.A,
source.B,
source.C,
source.D,
source.E
INTERSECT
SELECT target.A,
target.B,
target.C,
target.D,
target.E)
Note that the index created above basically reproduces the whole table, so you might want to consider creating it as the clustered index instead if your queries need it to be covering.

SQL Sub-Select using primary Key

I'm looking to retrieve values from a sub-select query on a rowset where I only want values from the current row of the main query (SQL Server versions 2005-2012 have shown this same behavior). I've written a sub-select query which is returning multiple rows (how, when I'm matching on the primary key?!).
The following example code illustrates what I'm trying to accomplish:
CREATE TABLE [dbo].[TestTable]
(
[TestID] [int] IDENTITY(1,1) NOT NULL,
[TestValue] [nvarchar](255) NULL,
[TestValue2] [nvarchar](255) NULL,
[TestValue3] [nvarchar](255) NULL,
[TestValue4] [nvarchar](255) NULL,
PRIMARY KEY CLUSTERED ([TestID] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
INSERT INTO [TESTDB].[dbo].[TestTable] ([TestValue], [TestValue2], [TestValue3],[TestValue4])
VALUES('01234', '56789', '98765', '43210')
GO
INSERT INTO [TESTDB].[dbo].[TestTable] ([TestValue], [TestValue2], [TestValue3],[TestValue4])
VALUES('01234', '98765', '56789', '43210')
GO
INSERT INTO [TESTDB].[dbo].[TestTable] ([TestValue], [TestValue2], [TestValue3],[TestValue4])
VALUES('01234', '43210', '56789' ,'98765')
GO
INSERT INTO [TESTDB].[dbo].[TestTable] ([TestValue], [TestValue2], [TestValue3],[TestValue4])
VALUES('01234', '98765', '43210', '56789')
GO
SELECT TOP 1000
[TestID]
,[TestValue]
,[TestValue2]
,[TestValue3]
,(SELECT TestValue + TestValue2 AS CompositeValue
FROM [TESTDB].[dbo].TestTable AS foo
WHERE foo.TestID = TestID)
FROM
[TESTDB].[dbo].[TestTable]
Error being returned is:
Msg 512, Level 16, State 1, Line 2
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
I would appreciate an alternate way of performing this query, e.g. using ROW_NUMBER or some other method (without performing a select into a temporary table, and without declaring individual variables).
Thanks in advance!
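Try aliasing the base table. In your subquery the unqualified TestID resolves to foo.TestID (the innermost scope), so the predicate becomes foo.TestID = foo.TestID, which is true for every row: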
SELECT TOP 1000 [TestID]
,[TestValue]
,[TestValue2]
,[TestValue3]
,(SELECT foo.TestValue + foo.TestValue2 FROM [TESTDB].[dbo].TestTable AS foo WHERE foo.TestID=B.TestID) AS CompositeValue
FROM [TESTDB].[dbo].[TestTable] AS B
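If the composite value is only ever built from columns of the current row, the subquery may not be needed at all. A possible simplification, assuming nothing else in the real query depends on the correlated lookup:

```sql
-- Both inputs come from the row being returned, so the expression
-- can be computed directly in the select list.
SELECT TOP 1000 [TestID]
,[TestValue]
,[TestValue2]
,[TestValue3]
,[TestValue] + [TestValue2] AS CompositeValue
FROM [TESTDB].[dbo].[TestTable]
```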

SQL Server Views - Index Usage [duplicate]

I have an indexed view, but when I run queries against it the index built on the view is not used and the query runs without it. Below is my dummy script:
Tables + View+ Index on View
CREATE TABLE P_Test
(
[PID] INT IDENTITY,
[TID] INT,
[StatusID] INT
)
CREATE TABLE T_Test
(
[TID] INT IDENTITY,
[FID] INT,
)
CREATE TABLE F_Test
(
[FID] INT IDENTITY,
[StatusID] INT
)
GO
INSERT INTO F_Test
SELECT TOP 1000 ABS(CAST(NEWID() AS BINARY(6)) %10) --below 10
FROM master..spt_values
INSERT INTO T_Test
SELECT TOP 10000 ABS(CAST(NEWID() AS BINARY(6)) %1000) --below 1000
FROM master..spt_values,
master..spt_values v2
INSERT INTO P_Test
SELECT TOP 100000 ABS(CAST(NEWID() AS BINARY(6)) %10000) --below 10000
,
ABS(CAST(NEWID() AS BINARY(6)) %10)--below 10
FROM master..spt_values,
master..spt_values v2
GO
CREATE VIEW [TestView]
WITH SCHEMABINDING
AS
SELECT P.StatusID AS PStatusID,
F.StatusID AS FStatusID,
P.PID
FROM dbo.P_Test P
INNER JOIN dbo.T_Test T
ON T.TID = P.TID
INNER JOIN dbo.F_Test F
ON T.FID = F.FID
GO
CREATE UNIQUE CLUSTERED INDEX [PK_TestView]
ON [dbo].[TestView] ( [PStatusID] ASC, [FStatusID] ASC, [PID] ASC )
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
Now when I run the following queries the [PK_TestView] index is not being applied:
SELECT PStatusID ,
FStatusID ,
PID FROM [TestView]
SELECT PStatusID ,
FStatusID ,
PID FROM [TestView]
WHERE [PStatusID]=1
SELECT COUNT(PStatusID) FROM [TestView]
WHERE [PStatusID]=1
Can you help me fix this?
You need to use the NOEXPAND hint. SQL Server will not consider matching indexed views without it (even if the view name is referenced in the query) unless you are on the Enterprise Edition engine.
SELECT COUNT(PStatusID)
FROM [TestView]
WITH (NOEXPAND) -- this line
WHERE [PStatusID]=1
This should give you the first, much cheaper, plan
